Gryphyn Media

News & Announcements

20th June 2007

Trouble with Gryphynmedia.com domain email

Filed under: — Tracy @ 9:36 pm

We are experiencing a network issue this evening, causing some email to gryphynmedia.com addresses to be rejected with a “550 Error.” This affects only gryphynmedia.com addresses, such as support@gryphynmedia.com. Please use the helpdesk to contact us while the problem is resolved.

Update: The problem was found and fixed - not network; it was a configuration problem. A cpanel update messed up the DNS recursion, which breaks email, since the server can’t look up any hostnames. We apologize for any trouble contacting us. Email is now delivering normally.
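
For the technically curious, here is a minimal sketch in Python of the kind of check that would catch this failure: ask the server's own resolver to look up a few hostnames and report the ones that fail. The function names and the sample hostnames are assumptions for illustration, not part of our actual monitoring.

```python
import socket

def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the local resolver can look up `hostname`."""
    try:
        # The port is irrelevant here; we only care whether the lookup succeeds.
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        return False

def unresolved(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames that failed to resolve, in the order given."""
    return [h for h in hostnames if not can_resolve(h, resolver)]

if __name__ == "__main__":
    # Hypothetical sample names; a mail server needs lookups like these to work.
    print(unresolved(["example.com", "example.org"]))
```

If every name comes back unresolved while the network itself is fine, the resolver configuration is a likely suspect.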

22nd May 2007

Champion attacked by phishing script

Filed under: — Tracy @ 6:08 pm

Email and server speed are being affected on Champion by a script sending botnet phishing emails. Over 15K messages went out in the past hour, mostly to invalid hotmail addresses, creating a huge number of bounces and tying up Exim. Several people are working on identifying the PHP script - not always easy. We expect normal server speed to be restored shortly.

Update: 8:55pm - Server speed is improved and the phishing process has been killed. We are still investigating the source of the script.

27th March 2007

Attack on domain on Champion

Filed under: — Tracy @ 3:28 pm

A website on Champion is being attacked with large amounts of web traffic and email. It is not clear why anyone would attack an otherwise ordinary real estate site. We are blocking what we can, but the attack was causing some connection trouble earlier this afternoon. It appears to be dying off. We are watching, and consulting with a security expert.

SSL trouble on Crown

Filed under: — Tracy @ 3:20 pm

Last night’s cpanel update on Crown apparently caused SSL to freeze. The problem didn’t trigger any monitors on our end, so unfortunately we did not know about it until tickets came in. Restarting Apache appears to have resolved the problem.

23rd March 2007

Cpanel restarted on Champion

Filed under: — Tracy @ 12:12 pm

A larger-than-usual lunchtime email surge choked cpanel on the Champion server, and it froze for about half an hour. The server itself did not go down, so websites stayed up, but email was sluggish and you would not have been able to log into webmail or cpanel. It has been restarted and things have returned to normal.

21st March 2007

Bumpy Apache update on Crown

Filed under: — Tracy @ 1:44 am

HTTP may appear to be up and down in the wee hours as we work on a problem with PHP modules.

21st February 2007

Deluxe users moved to a new server

Filed under: — Tracy @ 9:13 am

All Deluxe users were moved to a new server at about 2:30 am Eastern. No client IPs changed, nothing should seem different to anyone - except that the server will stay up!

20th February 2007

Deluxe down again

Filed under: — Tracy @ 3:17 pm

Deluxe slowly ground to a halt over the past hour, then the whole node went down at the datacenter. They have engineers working on it - we don’t know yet if this is a recurrence of the problem from yesterday morning, but we think it is likely. We will continue to post as we get news.

Back up at 3:35pm Eastern, after an hour and a half. This is not acceptable to us (nor, we are sure, to you). We are pressing to have the server replaced, since they cannot give us a definitive explanation for the long outages. This server’s performance was great, until yesterday.

19th February 2007

Deluxe server down

Filed under: — Tracy @ 5:46 am

The Deluxe server is down with hardware problems at the datacenter, since 5:14 AM Eastern. Engineers are working on it. We apologize for the outage and will report back with news as soon as we know more.

The server came back up at 9:10 AM, after almost four hours of down time. We are still waiting for details on the hardware issue. Parts and entire replacement servers are stocked onsite to keep hardware outages to a minimum.

2nd February 2007

Champion email laboring again

Filed under: — Tracy @ 12:20 pm

We seem to be getting hit mid-morning Eastern every day this week with a wave of botnet spam. We are watching for it, and working on blocking it, but there is no simple solution. This is a growing problem in the whole industry. We are trying to identify accounts that may be unusually vulnerable to it, and we are contacting some of you about tweaking your spam, forwarding, and other email settings.

30th January 2007

Champion server: possible botnet attack

Filed under: — Tracy @ 10:44 am

The Champion server is possibly experiencing a botnet attack. We are working hard to get it under control and block the sources. More news as soon as possible.

Update: It wasn’t a botnet DDOS attack, but it was a huge spike of botnet spam. Most of it was handled by the RBL filters, but the server was laboring mightily, causing it to fail and be restarted several times. This botnet stuff is really serious, and most people are completely unaware. We frequently have clients complain about the rising level of spam, and we try to explain the botnet issue:

It’s almost as if Windows shipped with bundled botnet software: Of the approximately 600 million computers connected to the Internet, 150 million are likely participants in a botnet (of these, half are apparently in China). This according to Vint “founding father of the Internet” Cerf, who warned attendees of the World Economic Forum in Davos, Switzerland, last week that these networks of compromised PCs pose a serious threat to the stability of the Internet. Cerf likened the situation to a pandemic, and if his estimate is accurate, that’s certainly a good word for it. One quarter of the machines connected to the Net, infected. “With new levels of sophistication this has reached a real milestone,” Mark Sunner, chief security analyst at MessageLabs, told News.com. “Botnets are getting smaller, more stealthy and more discreet and yet the volumes of spam are going up. Without a hint of scaremongering, will this get a lot worse throughout 2007 in terms of botnet sending? Absolutely, yes.”

Botnet army massing in China: China is now the most infected country in the world and Asia contains half the world’s infected computers. Countries in Asia account for five of the six most infected territories around the world, with the US in second place.

Criminals ‘may overwhelm the web’:
Criminals controlling millions of personal computers are threatening the internet’s future, experts have warned.

13th January 2007

Verizon email delivery trouble on all servers

Filed under: — Tracy @ 9:31 am

We have reports of Verizon email delivery problems on all servers. Attempts to email Verizon users result in “450 Requested action not taken - try later” errors. Verizon says we are not blacklisted, but their security group is not able to tell us what else might be going on. We are working with our engineers and other consultants in the hosting industry to resolve the problem. We are acutely aware that there seems to be no upper-tier support at Verizon on weekends, and that this is a 3-day holiday weekend. We have asked a few Verizon customers to try calling support, but they are just told that there is no problem at Verizon. If anyone has an insider contact in Verizon support, they are asked to open a helpdesk ticket and tell us about it.

Solved! We finally heard back from Verizon with an explanation of what was going on. Their new sender verification process was causing the hold-up. We made some changes to our security (which has been in place for years), and now Verizon email seems to be delivering without delay. We wish it could have been faster, but we needed Verizon to answer us. Apparently, they have no weekend support.

4th January 2007

Deluxe Server Emergency Move

Filed under: — Tracy @ 4:48 pm

A developing problem with the drives on the Deluxe box made it necessary to quickly migrate the entire server to a new box. The move started at 4:05 PM Eastern and was done at 4:35 PM. All IPs, email, databases, etc. were moved intact. No one should experience any difference. Please open a helpdesk ticket if you see an ongoing problem.

19th December 2006

Champion down

Filed under: — Tracy @ 11:37 pm

Champion suddenly crashed at 11:30 PM Eastern. We are working on getting it back up and finding the cause.

Server and email back up, but still not quite right. Looking for the problem.

Apparently, BIND started running a lot of processes and locked up. Things look normal now. We regret any inconvenience.

29th November 2006

Local Comcast network problem

Filed under: — Tracy @ 11:29 pm

A number of customers reported that they could not get to their websites starting at about 10:20 PM Eastern. All the affected clients are in the Philadelphia area. Since nothing is wrong at the server, and the support techs could see all the sites, we suspect it is a local network issue, possibly involving Comcast.

Update: Everyone having a problem reports that they use Comcast. We are able to verify independently that the sites are up and email is being received. Other people can see your sites, and send you email, and the email will be waiting for you when the problem clears.

Update: Some people report that the trouble started as early as 7 PM. We were not able to verify the problem with Comcast, but it was definitely limited to Philly-area Comcast users. One pair of users that live near each other could see each other’s sites but not their own. We believe it has resolved now, for everyone, but if you are still having trouble, open a helpdesk ticket and we will make sure you are not having some other problem.

27th November 2006

Crown server crashed

Filed under: — Tracy @ 1:12 pm

The Crown server just crashed and we are investigating.

Update: A very large email attachment made Exim choke. The server is now back to normal.

21st November 2006

Champion server sluggish

Filed under: — Tracy @ 10:10 am

The Champion server is running higher loads this morning, making it seem sluggish. We have identified the problem script and should have it cleaned up shortly.

Update: All fixed. A process was frozen.

19th October 2006

Heavy Web Traffic on Champion

Filed under: — Tracy @ 1:57 pm

The Champion server is seeing heavy web traffic right now, possibly because it hosts a website involved in a controversy that hit the news today. We are taking steps to reduce the load and possibly move the controversial site to another server. Thank you for your patience - you will get the same help and attention if your site becomes a focus of attention.

Update: It was not the controversial site (hint: cheesesteak) - it is an election-related site that may be under attack.

23rd August 2006

Champion experiencing heavy server load

Filed under: — Tracy @ 4:05 pm

We are investigating the problem. It is likely that users will not be able to get email right now. We suspect an exploited script or email form.

Update: We have the server back up and are still investigating the problem that caused it. It was a week ago that we had the extended outage, and this may be related, giving us more clues to the real source of the resource usage spike.

17th August 2006

Champion Cpanel Outage Wednesday

Filed under: — Tracy @ 12:59 am

There was a significant cpanel outage on the Champion server while I (Tracy) was out of phone and internet range Wednesday. The server was up, but cpanel was not. That disabled many services, while still “reporting” to the remote monitors that the server was up. It took a long time for someone to notice the issue and restart cpanel. That is NOT what was supposed to happen when I was out of contact; helpdesk coverage was arranged well in advance. It is unclear why helpdesk tickets were not read and answered, which would quickly have caused a technician to check the server. We will be investigating, and making individual responses Thursday AM Eastern, when I can get back to a real broadband connection and call more people for reports. We take this incident very seriously and apologize for the terrible inconvenience suffered by many of you.

Update Thursday Noon: The original problem was caused when the server resource limit was greatly exceeded yesterday. Today, we continue to see surges of resource usage. We suspect that someone’s PHP-powered website has either been exploited or is seeing a large sudden jump in activity. That is harder to identify than you might think… PHP scripts on this server all run as user “nobody.” We will be considering changing that policy in the future, a discussion for another day. But you may experience short periods of difficulty connecting to cpanel and email today, until we nail down the problem.

The other issue was the failure at the helpdesk yesterday, resulting in a huge backlog of unanswered tickets, email, and voicemail. There was a misunderstanding about helpdesk coverage with a person we hired to help cover support while I (Tracy) am on vacation. That person and another have been fired. I am doing live support coverage today from Disney World until my next shift support supervisor comes on. Believe me, I am as unhappy as those of you that experienced the outage. The vacation was planned in March, and I believed my support arrangements were in good shape when I left for a week of much-needed R&R. Obviously, I was wrong and I am profoundly sorry.

New support systems are about to be launched. See further announcements on this blog today.
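
On the monitoring gap described above (the server answered, so the remote monitors reported it "up," while cpanel and the services behind it were not): a rough sketch in Python of a check that probes each service port separately instead of only asking whether the machine responds. The service list and port numbers are assumptions for illustration and are not our actual monitor configuration.

```python
import socket

# Hypothetical service list; the port numbers are assumptions, not the
# real monitor configuration for any of our servers.
SERVICES = {"http": 80, "smtp": 25, "pop3": 110, "cpanel": 2082}

def port_open(host, port, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = connect((host, port), timeout)
    except OSError:
        return False
    conn.close()
    return True

def failed_services(host, services=SERVICES, connect=socket.create_connection):
    """Return the sorted names of services whose ports refused a connection."""
    return sorted(name for name, port in services.items()
                  if not port_open(host, port, connect=connect))
```

A successful TCP connection only shows that something is listening, so a stricter monitor would also log in or fetch a page. Even this much would flag a machine that is up while one of its services is down.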

9th August 2006

Brief email outage on Champion

Filed under: — Tracy @ 1:09 pm

We had an email outage of a few minutes on the Champion server. Someone tried to mail a very large message to over 400 people, and the mail server choked. It has been cleared now.

5th August 2006

Deluxe Server crashed

Filed under: — Tracy @ 12:36 pm

Something just happened on the Deluxe server. Engineers are investigating now and we will report ASAP.

Update: The problem is at the datacenter; all the servers on that node crashed. We will post more news as soon as we have it.

15th June 2006

Crown is down

Filed under: — Tracy @ 5:00 pm

We are talking to the datacenter… we are hoping that this is a result of the accelerated migration process, not an electrical failure. More news as soon as we have it.

Update: 25 minutes down. The server copy was already done when this occurred, and is being loaded to a new server right now… the old server is being revived, so that we can do the sync. Obviously, the late-night migration plan is history. We DO have back-ups, no matter what the outcome… but this is definitely a bad day for all of us.

Update: 33 minutes down. We are at 75% of the sync process. The new server should come up within 15 minutes, likely less. We will be madly checking accounts. DO check your accounts and let us know about any problems you see.

Update: UP! 55 minutes down… much more than expected. Please check your accounts.

Emergency hardware upgrade on CROWN

Filed under: — Tracy @ 4:37 pm

Users on Crown should have received emails and phone calls (or will in the next few minutes). We have a serious hardware risk to the Crown server. The rack it is on at the datacenter was physically damaged.

The Crown server itself is not damaged, but the electrical maintenance to the rack could cause unpredictable downtime… we don’t want that. We have agreed to be moved off the rack in a more orderly manner. As a bonus, we will get bigger, faster AMD Opteron processors. To minimize downtime, the server will be copied to a new server, instead of being physically moved.

That means we are moving NOW, tonight. There will be brief downtime, later tonight, about 15 minutes, maybe less. We felt this was the BEST thing we can do to fix it quickly, with certainty, and with minimized downtime.

– We are copying all of the files for every account, and transferring them to the new server. Then, we will sync the old server with the new one, updating anything that has changed during the copying process. THAT is when the downtime will occur, during the sync.

– Nothing will change but the hardware and some server management software… your accounts, nameservers and IPs will remain the same. Nothing will be lost… we have backups of everything.

– We are making every effort to have the downtime occur late at night, when US business traffic is lowest. But it could happen FASTER, depending on the time it takes to copy the files. We cannot be absolutely certain. If we had a choice, we would never have chosen this timing.

We are VERY sorry for the disruption. We are doing everything we can to keep it from being a big disruption.
We would try to wait for Friday night, but doing it tonight means there will be top-tier engineers in the datacenter tonight and tomorrow, instead of the smaller weekend crew. I know that some of you will want us to have tried to wait, but the risk of major downtime in the middle of the Friday workday was too scary.

IF YOU NEED HELP or have an urgent concern about this migration, call 610-724-8967 and leave a message. You will be answered by EMAIL. Or, open a helpdesk ticket.
http://gryphynmedia.com/helpdesk/

Again, we apologize for the short notice, but this was the fastest, safest, least disruptive way to deal with the problem. We appreciate your patience.

1st June 2006

AOL had email delays today

Filed under: — Tracy @ 6:28 pm

Users of AOL and Netscape email addresses likely experienced delays today. They are not related to any problem with our servers. It is an AOL problem.

26th May 2006

Apache/PHP Update Sun May 28th at 1 AM EDT

Filed under: — Tracy @ 8:17 am

We are planning to carry out an Apache update on all servers at about 1 AM EDT this Sunday 5/28/06. We do not anticipate any downtime, but the servers may be sluggish at times until the update is complete. We will also upgrade PHP from 4.4.1 to 4.4.2, which should have minimal impact. (We are not yet upgrading to PHP5, which we know will break some client scripts…. that will require much more preparation.)

17th May 2006

Early Warning: Hardware upgrades coming

Filed under: — Tracy @ 11:45 am

Early Warning: We are planning hardware upgrades for all of our servers in the next few months. We will be moving from Intel Pentium4’s and Xeon’s to faster AMD Opteron servers. The datacenter that houses most of our shared users has offered us a good deal and a lot of support for the upgrade.

The upgrade will involve copying everything to the new server, then shutting the old one down for a few minutes to sync them up. There will be a short period of downtime. We plan to do these upgrades late at night and on weekends. But there is a certain amount of guesswork involved… each server is going to take a different amount of time to copy and sync. As we plan each server move, we will be notifying the users on that server. We will make every effort to have the downtime occur in the least busy time of day, both to reduce impact and to reduce the sync time.
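
To illustrate the copy-then-sync idea, here is a small sketch in Python, assuming both servers' files are reachable as local directory trees. It is an illustration only, not the datacenter's actual migration tooling, and the function names are made up for the example. The first pass copies everything; the second pass, run while the old server is stopped, copies only files that are missing or have changed since the first pass.

```python
import os
import shutil

def changed_files(src_root, dst_root):
    """Yield relative paths that are missing or stale on the destination."""
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.exists(dst):
                yield rel
                continue
            s, d = os.stat(src), os.stat(dst)
            # A different size or a newer source timestamp means "changed".
            if s.st_size != d.st_size or s.st_mtime > d.st_mtime:
                yield rel

def sync(src_root, dst_root):
    """Copy only the changed files; return the sorted list of what was copied."""
    copied = []
    for rel in changed_files(src_root, dst_root):
        dst = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(src_root, rel), dst)  # copy2 keeps timestamps
        copied.append(rel)
    return sorted(copied)
```

The fewer files that change between the two passes, the shorter the second pass, which is why a quiet time of day keeps the sync window small.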

15th May 2006

Large spam load on Champion

Filed under: — Tracy @ 1:37 pm

There have been high incoming email loads on Champion today, especially high right now. Large amounts of spam are coming in, which is straining the servers, even with filters in place. Some users will have trouble receiving email while we try to identify sources.

Update: Loads are coming down, but still not optimal. We are contacting a few of you to discuss possible email exploits of your email programs in your homes or office. There are some users with very high outgoing email rates.

3rd May 2006

MYSQL loads high on Champion

Filed under: — Tracy @ 4:19 pm

We are investigating the cause. Loads usually drop at this time of day, not rise.

Update: It was an exploited guestbook, now fixed.

18th April 2006

Huge post-vacation email load slowing Champion

Filed under: — Tracy @ 9:28 am

Everyone went back to work today after the holiday weekend. The email load on the servers is huge, incoming and outgoing, particularly on Champion. We are watching closely for specific problems, but this is basically like being on the Schuylkill Expressway during morning rush hour. It’s all volume.

The Champion server is full of mature websites and long-time email users. The problem is not so much that the server has too many accounts, as that the accounts have grown over the years. We have upgraded this server several times and done a lot of work to ease the email load. It may be that some accounts will have to be moved, and/or that some users may have to consider changes to their email habits.

For instance, folks that send out email to 100-200 people from Outlook as an informal “list” may have to consider making it a real list with real list management tools. And people with 3 years of saved webmail may have to archive some of it. We are developing an IMAP tutorial that will help webmail users set up Thunderbird to manage archiving.

We ask your patience as we work out ways to provide the least-disruptive solutions for everyone. If you suspect that you are a user that may have to make some changes, we welcome a helpdesk ticket to start the discussion.

29th March 2006

Loads are high on CROWN server

Filed under: — Tracy @ 10:57 am

Something is driving up the server loads on Crown. We are investigating. You may have trouble connecting to email and websites until we get it under control.

20th March 2006

AOL limiting senders on CROWN and DELUXE

Filed under: — Tracy @ 11:43 am

AOL is limiting the amount of email it will allow from Crown and Deluxe at the moment. It is difficult to predict how much email will be delivered, and how long the limits will last, but experience says that they will block most email for 1-2 days. Both servers had incidents where contact forms were exploited and used to send spam to AOL last week. Even though we quickly saw and stopped the exploitations, it only takes a few pieces of spam to trigger the AOL limit. A typical contact form exploit sends out hundreds in a matter of minutes.

If you need to contact AOL users, we suggest contacting them by phone or from another email address. If you forward email to AOL from your site or your contact forms, we suggest diverting them to another address for now.

We continuously work with all of our clients to keep forms up-to-date and to combat new exploitations. If you have concerns about your own forms, or the contact forms commonly installed with shopping carts and other software, please contact us for help. Users of OScommerce (and custom versions like CREloaded) should be especially wary right now.

7th March 2006

Blacklist filtering being tested on CHAMPION

Filed under: — Tracy @ 12:43 pm

Today at 12:30 PM we began testing of RBL (Realtime Black List) filtering on the Champion server. Users of Champion are aware that we have been laboring under increasing spam loads that sometimes threaten to take down the server, especially around lunchtime during the business week. It is our hope that the RBL filters will knock down some of that spam load. So far, it appears to be working!

If you feel that some of your desired email is being filtered, we DO have ways to make sure it is received by whitelisting domains. This will be useful if one of your associates’ websites is hosted on a server that is blacklisted (which may have nothing to do with your associate). And we CAN exclude your domain(s) from this filtering if you prefer to receive unfiltered email.

Do please let us know how this is working for you, positively or negatively, by opening a helpdesk ticket and telling us what you see happening. We are testing a number of domains, but our clients can provide us with a wider range of observations.
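
For those wondering how an RBL check works in general: the mail server reverses the octets of the sending machine's IP address, appends the blacklist's DNS zone, and looks that name up. An answer means "listed"; no such name means "not listed." Below is a minimal sketch in Python. The zone `rbl.example.org` is a placeholder, and the whitelist here is keyed by sender IP for brevity, whereas our whitelisting is by domain. None of this is the actual filter configuration on Champion.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name a blacklist lookup asks about: reversed octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone, whitelist=(), resolver=socket.gethostbyname):
    """Return True if `ip` is listed in `zone` and not locally whitelisted."""
    if ip in whitelist:
        return False  # hypothetical local override, checked before any lookup
    try:
        resolver(dnsbl_query_name(ip, zone))  # any answer means "listed"
        return True
    except socket.gaierror:
        return False  # the name does not exist: not listed
```

Because the whole check is one DNS lookup per connection, it can turn away a listed sender before the message body is ever accepted, which is where the load savings would come from.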

1st March 2006

Network Maintenance 6 AM March 2

Filed under: — Tracy @ 6:56 pm

There will be network maintenance that affects Champion, Deluxe, and Crown tonight. It is not expected to cause any downtime. Additional fiber connections will be introduced at the datacenter. (Which will address today’s network issue on Champion.) The maintenance window is 6:00AM EST - 6:15AM EST on March 2, 2006.

Network Issue Affected Champion

Filed under: — Tracy @ 6:12 pm

Late this afternoon we experienced network issues that affected email delivery on Champion. Many of you may have had trouble retrieving and sending email, or lags in delivery. We apologize for the lag in posting to the blog. Only one technician was covering the helpdesk, and he was busy working on the problem with the datacenter personnel. The difficulty does appear to have cleared now. Do open or update your helpdesk ticket if you are still having trouble.

Update: Network Maintenance to add more fiber connections is scheduled for early tomorrow morning.

24th February 2006

Crown is down

Filed under: — Tracy @ 7:21 pm

Crown suddenly went down a few minutes ago. We are investigating now. More shortly.

Update 7:41 PM - The problem is at the datacenter… the rack is down. They were immediately at work on it. We are wringing our hands and weeping supportively.

Update 8:18 PM - Good news! It should be coming back up in a few minutes.

Update 8:24 PM - Yay! We are back up! I will get details about what happened shortly.

16th February 2006

Good news: email is better on Champion

Filed under: — Tracy @ 2:26 pm

As those of you on Champion know, we have had a really rough email week. Suddenly, email loads were through the roof during the business day, often resulting in people not being able to send or receive email for a couple of their busiest hours. Ugh.

But we are definitely making headway. Today was better, and the mail server stayed up through the noon crush:

- There was a new Denial of Service scheme adding to the load, exploiting the cpanel configuration for recursive DNS lookups. We changed the configuration and stopped that effect.

- We have a handful of users with high email usage that cannot be tracked… nothing is wrong with their home/office computers or their websites, yet there is unaccounted-for email going in and out. We have moved a number of those users to a less populated server while we figure out what is going on. Each, by themselves, would not have much of an effect on a server, but having a cluster on a busy server was contributing to the congestion.

- We fixed a possible conflict between cpanel and SpamAssassin that might have been making SpamAssassin work harder but filter less. Long-term, we have been looking into ways to replace SpamAssassin altogether. Our major concern is that we do not want to start blocking legitimate email (false positives). We hope to have a new spam/virus filtering plan in place shortly, that doesn’t add to the server load (as SpamAssassin and ClamAV do).

An analogy is traffic on the Schuylkill Expressway. Sometimes, traffic jams due to “volume,” sometimes there is a big truck blocking two lanes, and sometimes there is a fender-bender with rubberneckers. In this case, I think we had some of each. We cleared the accidents, banned wide-loads, and found alternate routes for some of the volume. Today’s volume was high, but there were only minor delays on the Expressway at rush hour.

Monday 2/20 Update: Email traffic today was very smooth. I think the changes we made were successful, and there are still more to come. Those of you that have told us, “We noticed some connection problems for the past few weeks, but didn’t say anything,” you must tell us in the future! Some early signs of trouble are not visible to our monitors until they get bad. Our customers are our most sensitive monitoring system.

13th February 2006

Email crippled by huge spam load on Champion

Filed under: — Tracy @ 8:07 pm

Tracy reports: I had to suddenly run a family member to the emergency room this afternoon, and left the helpdesk unstaffed. I apologize, but I had no choice.

While I was gone, we were hit on Champion by a huge spam load, enough to function like a Denial of Service attack, and the email server choked. Jeff is on now, and I am back, and we are unraveling the problem as quickly as we can. We will identify the largest sources and block them. Some of you will be contacted directly about managing spam better. Those of you with helpdesk tickets opened are also being contacted. I apologize profoundly for the problem… my cell phone was blocked inside the hospital and I didn’t know what was going on until it was big enough to take down the mail server. An unlucky coincidence.

Update: The server is up and running well… queued email is being delivered. We seriously appreciate that most of you used the helpdesk as a first line of inquiry. It really does help an awful lot.

Note: The family member is doing well. What looked like a serious medical emergency turned out to be a temporary neurological problem that will resolve itself in a few weeks.

11th January 2006

Champion was rate-limited by AOL today

Filed under: — Tracy @ 8:00 pm

Once again, the Champion server has been rate-limited by AOL. We do have a feedback loop and whitelist relationship with AOL, but our ability to send email to AOL is currently “throttled” so that only small amounts of email go through. AOL cannot tell us the specific problem that resulted in the current rate limit, or when they might be able to lift it. The postmaster technician I talked to could not answer my questions and eventually dumped me off with another technician that had nothing to do with the issue.

There are three general problems, we believe:

1) People who forward their email addresses to AOL and then report spam that has come to them through our server from other sources;
2) People who send out lists, even small lists, and have high bounce rates;
3) Exploited forms and other scripts that may send out spam until we can disable them.

Another problem may be autoresponders. If you have an autoresponder set up for an information address, and it gets spammed, you will be sending replies to the spam addresses, some of which may be at AOL. Unfortunately, AOL’s filters are not able to discern that AOL was the source of the original email.
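
As an illustration of how an autoresponder can avoid answering spam and other automatic mail, here is a hedged sketch in Python. The function name and the one-reply-per-sender rule are assumptions for the example, and this is not how the autoresponders on our servers are implemented. It skips bounces, messages marked as automatic or bulk, and messages an upstream filter has already tagged with SpamAssassin's `X-Spam-Flag: YES` header.

```python
from email.message import EmailMessage

def should_autorespond(msg, already_replied):
    """Decide whether an autoresponder should answer `msg`."""
    sender = (msg.get("Return-Path") or msg.get("From") or "").strip().lower()
    if not sender or sender == "<>":
        return False  # bounces carry an empty return path
    if (msg.get("Auto-Submitted") or "no").strip().lower() != "no":
        return False  # never answer another automatic message
    if (msg.get("Precedence") or "").strip().lower() in {"bulk", "junk", "list"}:
        return False  # mailing-list or bulk mail
    if (msg.get("X-Spam-Flag") or "").strip().upper() == "YES":
        return False  # already tagged as spam by an upstream filter
    if sender in already_replied:
        return False  # one reply per sender is enough
    already_replied.add(sender)
    return True
```

A caller would keep the `already_replied` set between messages, for example one set per mailbox per day, so that a flood from one forged address produces at most one reply.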

I will be contacting some of you about changes we may have to make. Changes to autoresponders, to forwarder email accounts, and to list management. Over the past 2 months, we have already been contacting people that generate a lot of AOL spam reports, and that problem has been dramatically reduced. Before she dumped me, she “guessed” that it might be 72 hours before the limit was lifted.

Please remember: AOL is not a business service. It is a family entertainment platform. If you are using AOL for business, we will be happy to teach you to use your own domain’s POP3 accounts instead. You are already paying for that service.

To those of you having trouble emailing contacts with AOL addresses, we apologize, but AOL is very difficult to deal with.

Champion running a bit slow with an exploit

Filed under: — Tracy @ 2:37 pm

We are investigating the exploit of a PHP script on the Champion server. It is driving up the loads, so you may have some trouble connecting to the email and mysql servers. We hope to track it down and have it resolved quickly.

3rd January 2006

Email load is high on Champion this morning

Filed under: — Tracy @ 11:41 am

We are seeing a high email load on Champion this morning. The mail server is laboring, which might make it difficult to connect when picking up email. It may partially be the crush of people returning to work after the holiday, picking up and sending email in higher volumes than normal. But we are also looking for scripts or forms that may have been exploited to send spam.

12th December 2005

vBulletin updates 3.5.2 & 3.0.11

Filed under: — Tracy @ 2:02 pm

Updates for the 3.0 and 3.5 trees have been released. Bugfixes, but also security patches for cross-site scripting vulnerabilities. Read more.

21st November 2005

Champion blacklisted by AOL

Filed under: — Tracy @ 12:02 am

Due to two exploited contact forms last week, the CHAMPION server is being mostly blocked by AOL. We disabled the form script, but not before it was used to send several thousand pieces of email. We are working with AOL, but they are not particularly cooperative. We hope to have this resolved as quickly as possible.

Update Monday 11:45 AM: We heard from AOL, and the temporary restriction on this server will likely be lifted in the next 3 business days. That isn’t exactly instant. If anyone is experiencing significant difficulties due to this AOL situation, we can move you to a different server. Open a helpdesk ticket if you need a move. Be aware that moving is not a trivial process, and may cause other temporary email disruptions, so we suggest only requesting a move if you are impacted severely. (Most AOL users are accustomed to the fact that AOL regularly cuts off some or all of their email.)

Update: Tuesday 6:00 PM: We seem to be delisted. We still need to work on reducing the number of spam messages forwarded by folks that direct their unfiltered email to AOL. And we need to get AOL users to stop reporting those emails as spam, because AOL considers Gryphyn the source. We will be contacting some of you about that.

Again, we must emphasize that you are responsible for maintaining your website scripts. If a script is exploited, it has consequences for everyone on a shared server. If a script is exploited, we will immediately disable it. If we are not able to disable it easily, we will be forced to suspend your account. We will contact you as soon as possible if a suspension is necessary.

14th November 2005

Webmail down on Champion

Filed under: — Tracy @ 8:58 am

Neomail and Squirrelmail are not working right on Champion. Horde seems to mostly work, with some glitches. We are working on the problem now.

Update: Resolved. We moved the cache for a security process that was hogging all the space in /tmp.

9th November 2005

Three minute outage on CROWN server

Filed under: — Tracy @ 12:16 pm

We just had a 3-minute outage on the Crown server, from 12:46 to 12:49. A poorly-coded PHP script was exploited, driving the server load through the ceiling, and the server locked up. We rebooted and disabled the script. We will be closely auditing all of the PHP scripts on the server, in an effort to head off other problems. You may be contacted about scripts and databases on your account(s).

Exploitation of vulnerable out-dated scripts is THE biggest threat to the smooth operation of a shared server. Again, please do not upload a script you do not know how to secure or update. Do not have a developer install something you do not know how to administer if you stop using the developer’s services. We are forced to suspend service to accounts that cause these problems.

8th November 2005

Fantastico is messy today

Filed under: — Tracy @ 12:55 am

If you are trying to install or update a Fantastico script today, you will have trouble, on all servers. Fantastico issued an upgrade, and their servers are apparently so slammed that they cannot handle the upgrade traffic. Our upgrades did not complete properly. We will continue to try to load new master files tomorrow, and until we can get them all.

Already-installed scripts are not affected. They still work fine.

6th November 2005

Upgrades on Champion

Filed under: — Tracy @ 2:09 am

Early Sunday morning, we upgraded MySQL to 4.1, upgraded PHP to 4.4.0, and installed eAccelerator on Champion. PHP scripts will run faster.

1st November 2005

DOS attack on Champion

Filed under: — Tracy @ 7:02 pm

Someone has been hammering a handful of domains on Champion with email for the past half an hour. We have had to suspend two accounts temporarily, and are working to identify the IPs and block them. Loads will be high on the server until we get a handle on it.

Update: Loads are back to normal. We found that the incoming email was mostly bounces from 3 exploited mail forms. It looked like an incoming DOS attack until we looked at it. The owners of the exploited forms will be contacted about their suspended accounts.

Folks, you cannot install programs on your sites and ignore them for years. You also should not install things to “test” them and then leave them to be exploited later. Uninstall unused scripts. We are forced to come down harder on users that endanger the smooth function of the servers, even when it is caused by well-meaning ignorance. We must suspend your account first, and then come back to you to figure out the problem after the server is restored to function. Our Acceptable Use Policy tells you this, and always has.

18th October 2005

Site hacked on Champion

Filed under: — Tracy @ 8:00 am

Loads have been high on Champion this morning, while we tracked down a hacked site. An abusive script had been installed. Loads are coming back down, and the victim is being contacted. We are investigating how the script came to be hacked.

16th October 2005

Emergency maintenance on Deluxe

Filed under: — Tracy @ 11:18 pm

Deluxe has been having trouble. It was down off and on for 2 hours late Saturday night, then again for another half hour this morning. We believe it to be a hardware issue… you may recall that the server is fairly new. At 2 AM Eastern today (Monday) we will swap out the hard drive and mount a new box. There will be 20-30 minutes of downtime. We apologize for the inconvenience.

Update: This went well, and this server is now back up with drives in a different chassis.

7th October 2005

RAM upgrade on Champion today (went badly)

Filed under: — Tracy @ 5:20 pm

The Champion server has been getting a little slow lately. It hosts some sites that had been getting dramatically busier since September. We moved a few of those sites to other servers. But we are also upgrading the RAM on Champion, and everyone should see a nice uptick in performance. There will be no downtime associated with the upgrade.

UPDATE: We apologize. The “fastsync” that should have accomplished this with no downtime, is not “fast” enough. We are doing all we can to speed it up.

UPDATE: We are sorry for the rocky upgrade. It should not have taken as long as it did… we were eager to improve the server speed, but if we had expected any downtime, we would have waited until late tonight. IT WILL result in much-improved server speed.

1st October 2005

Routing Switch Replaced

Filed under: — Tracy @ 9:48 am

Last night, we replaced a routing switch that had resulted in an httpd delay this week. When traffic peaked on one network, it was causing delays in the shoulda-been-instant re-routing to another network. That will no longer cause us problems. There was no downtime associated with the switch replacement.

28th September 2005

2 Minute Network Problem Today

Filed under: — Tracy @ 9:48 pm

We had a 2-minute traffic delay today at about 3:30 PM Eastern on Champion, Deluxe, and Crown. A portion of the network became overloaded, and it took 30-60 seconds for traffic routed across this to re-route. Two minutes is a long time, in this instance. We do not believe any data was lost, but connections dropped and sites were unreachable during the interval. These re-routings are routine and normally unnoticeable. The datacenter engineers are looking into why this particular event resulted in a problem.

23rd September 2005

22 Min Outage on 3 Servers

Filed under: — Tracy @ 10:51 am

As many of you know, we just had a 22 minute outage. It affected 3 of the servers that house mostly shared and nameserver accounts: Champion, Crown, and Deluxe (formerly Sterling).

It affected the entire Virginia datacenter where those servers are housed… and we still don’t know what it was. The datacenter phones and email are flooded with people just like us, looking for info. Their own info site/status site is still down. But our servers are back up. As soon as we have more info we will post it. It takes a very unusual circumstance to take down a whole datacenter.

Update: It was a core switch. They had it replaced in minutes, but it took time for everything to come back up.

We know that this comes hard on the heels of the Champion hardware outage on the 20th. Hardware-related outages are impossible to predict, but two outages in the middle of the business day within 3 days… not good at all. We are looking into how we can offer a premium level of redundant hosting for business-critical users.

20th September 2005

Champion is down

Filed under: — Tracy @ 11:47 am

The CHAMPION server went down at 12:21 PM Eastern … the datacenter engineers are working on it now. We will have news shortly.

Update 12:52 PM - The problem is not in the box itself… may be electrical or hardware. More as soon as I hear it. Affects more than just Gryphyn Media, and all hands are working on it.

Update 1:07 PM - Server is coming back up now. It will take a few minutes to restore all the accounts. It appears to have been a power problem that took out a rack. All the servers have been rebooted and are on the way back up. We will get more details after we get things working again.

Update 1:50 PM - We are back up. I am checking individual accounts. The repair and hard reboot took an excruciatingly long amount of time. Some email would have queued, but some incoming email may have bounced. No data was lost. We apologize for the trouble… hardware problems can’t really be anticipated. But replacement parts were immediately available.

When there is trouble like this, it is best to use a combination of this blog and a helpdesk ticket to stay informed. The phone is not likely to be answered quickly when “all hands” are manning the helpdesk and solving the problem.

9th September 2005

Sterling server being replaced

Filed under: — Tracy @ 2:46 pm

We have been having trouble with the Sterling server for months. The developers of cPanel issued a number of automated updates that were buggy, and a series of fixes seems to have left us with lingering glitches and annoyances.

We want to wipe it and start fresh, but without causing downtime. So, we will be moving everyone from Sterling to other servers. Many of you have been contacted individually, and we will contact you again before we move you. No one should experience any significant interruption of service, but we will talk to each of you, to ensure a minimum of disruption.

Maintenance on GryphynMedia.com this weekend

Filed under: — Tracy @ 2:12 pm

We will be moving Gryphynmedia.com and its affiliated domains to a new server this weekend. We like to keep our helpdesk and other support functions in a different geographic location than our clients, so we will be in the Equinix datacenter in Ashton, VA. You should not see any interruption in helpdesk function. In the next few weeks, you will see a new look and some new features on the GM website.

17th August 2005

STERLING is down

Filed under: — Tracy @ 10:24 am

During a routine update of mySQL, something went wrong… we are working on it now.

Update 10:29 AM: Just a hiccup while Apache was recompiling. Back to normal now. I figure it’s better to post news about these things immediately, than wait to see how it turns out and have you folks open a lot of helpdesk tickets. :-)

7th August 2005

Zend Optimizer updating on Crown

Filed under: — Tracy @ 1:22 am

Zend Optimizer is being updated on the CROWN server, to fix a PHP issue. We will have to reboot, and there will be an outage of a few minutes duration.

25th July 2005

Emergency Cpanel downgrade on CROWN

Filed under: — Tracy @ 10:28 am

We have been having intermittent email trouble on CROWN since the last cPanel automated upgrade. Some users’ email is down right now. The engineers at cPanel have not been able to identify the bug, so we are doing an emergency downgrade to the previous version of cPanel, which should correct the problem. There will be about half an hour of slow connection and email hiccups while we perform the downgrade. We apologize for the inconvenience.

14th July 2005

Crown is being rebooted

Filed under: — Tracy @ 10:54 pm

The CROWN server froze up at about 11:20 PM and was rebooted, but went right back down. We are investigating and will report back as soon as we have news.

13th July 2005

Httpd is down on CROWN

Filed under: — Tracy @ 11:50 am

For as-yet-unknown reasons, httpd (the web server) is down on the CROWN server. Mail and other services are up and we are actively working on the problem, which started at about 12:23 PM . Back with an update ASAP.

Update: 12:55 PM Httpd is back up. Earlier this morning we installed a PHP accelerator to improve the performance of PHP scripts. But it took down PHP and then Apache/httpd. We will wait until the wee hours of the night to fix it, so we don’t risk any more East Coast business-day interruptions. We profusely apologize. We expected the change to be a good thing, and completely invisible.

29th June 2005

Several kinds of trouble last night

Filed under: — Tracy @ 9:39 am

Last night, there was a very brief power outage during a routine power transfer in the datacenter. It caused all the servers to reboot… which would normally only result in a couple of minutes of downtime in the middle of the night. But this time, it happened in the middle of my nightly cpanel update, so when the servers came back up, cpanel and exim (the mail server) were not working properly and both had to be manually restored on all the servers. All websites were up, but email and cpanel were down.

Meanwhile, the email queue was building up… email was being received, but not distributed to individual inboxes. When the mail server came back up, it was immediately under a huge load, which made the servers run slowly until the queues emptied out.

A really long night… but things are back to normal now. If you are still having problems of any sort with your account, please open a helpdesk ticket.

21st June 2005

Script Hacked on Champion

Filed under: — Tracy @ 7:56 pm

MySQL has been making the Champion server very sluggish tonight… we finally pinned it down. An older version of phpBB was hacked on one domain. We have shut down that script until the owner can update it. Things will return to normal quickly, now. We will also be looking for other old versions of phpBB and notifying owners that they need to update to remove vulnerabilities.

7th June 2005

Champion server is being hammered with spam

Filed under: — Tracy @ 12:09 pm

The Champion server is being inundated with spam and virus email right now… we have several technicians working on it. Cppop, imap, and spamd went down… we have restored those services, but Exim is still laboring under an immense burden. We will find and block the source(s), but email will be moving slowly until we do. More news as soon as we have it.

24th May 2005

ALERT: Memorial Day Weekend Server Maintenance

Filed under: — Tracy @ 5:55 pm

Over the Memorial Day weekend, when US web traffic will be lowered, we will be doing a lot of server maintenance. Several older servers will be upgraded.

Those of you on Octopus will welcome this upgrade; it will cure the intermittent email trouble we have had over the past month. No one who currently has a Nameserver Account or a VPS account will be affected.

We expect the process to be invisible to most clients… only momentary downtime, if any. Occasional periods of slow response time when things are being reloaded and restarted. All of that should occur in the wee hours of the night for most of you in the Eastern time zone.

It will matter little to most, but the names of some servers will change. Some of you will get new IPs for your hosting accounts. Each of you will be notified of changes that affect you. If you use your domain name (rather than your IP) to log in to FTP and the control panel, you may never notice the changes. The versions of PHP and other software will remain the same, to minimize transitional issues.

If you have questions or concerns, do feel free to open a helpdesk ticket to ask. We STRONGLY recommend using the helpdesk, rather than phone or direct email. Tracy, Roberta, and other administrators will be on duty this weekend, and we will want to keep our ducks in a row, support-wise. All of us can see helpdesk tickets. Even if it is an emergency, use the helpdesk FIRST, after checking here at the Service Blog to see if your concern is being addressed.

Have a safe holiday weekend!

10th May 2005

Octopus Update: PERL being reinstalled

Filed under: — Tracy @ 12:00 pm

Perl was affected, and we are reinstalling it, which should cure the problem. CGI scripts will be down just a little while longer… we apologize for the long recovery period.

Cracked website on Octopus caused morning server failure

Filed under: — Tracy @ 9:41 am

Lots of services went down on the Octopus server this morning (nameservers ns5/ns6.gryphynmedia.com).

A website on that server was cracked. The client changed their password to a short dictionary word because it would be easier to remember. Unfortunately, it was also easy to guess. A spam engine was installed on the server, which spewed out spam fast enough to take down the server, clog the queue with bounces and admin messages, and generally make a heck of a mess.

We are cleaning it up and restoring services as fast as we can. We temporarily took down PHP and cgi services until we could identify the problem. Your websites did not actually go down, but email and many other services did.

Please, recall that on a shared server security is everyone’s business. Use mixed letters and numbers for your passwords. There are spambots scouring the internet every second, looking for easy-to-exploit web accounts with weak passwords. Do not use a dictionary word… there are cracking tools that try common dictionary words until they guess which one you used. If you are not sure how to reset your password, open a helpdesk ticket and we will do it for you.
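The password advice above can be sketched as a quick self-check. This is only an illustration, not a tool we run: the word list is a tiny hypothetical stand-in for a real dictionary file, and the 8-character minimum is an assumption, not a Gryphyn Media policy.

```python
import string

# Hypothetical stand-in for a real dictionary file; cracking tools
# try far longer lists than this.
COMMON_WORDS = {"password", "letmein", "dragon", "monkey", "welcome"}


def is_weak_password(password, min_length=8):
    """Return True if the password breaks the guidelines in this post.

    min_length is an assumed threshold, chosen only for illustration.
    """
    # Too short: easy to guess by brute force.
    if len(password) < min_length:
        return True
    # A plain dictionary word, regardless of capitalization.
    if password.lower() in COMMON_WORDS:
        return True
    # The post asks for mixed letters and numbers.
    has_letter = any(c in string.ascii_letters for c in password)
    has_digit = any(c in string.digits for c in password)
    return not (has_letter and has_digit)
```

A password that passes this check is not guaranteed to be strong; it has only avoided the specific mistakes described above.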

18th April 2005

Octopus being rebooted

Filed under: — Tracy @ 11:24 am

Octopus crashed under the load. We suspended the high-use account and rebooted to bring it back up. We will be clearing the queue and things should come back up normally in just a few minutes.

Update: We actually found three email-related problems on the server. A user with webmail files so large they choked the server every time they were accessed, an un-batched email list with 3000 addresses, and a user being emailed by so many spammers that the bounces were slowing the mail server. We have addressed each of these issues now, and Octopus is looking good again.
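For list owners wondering what “batched” means in practice, here is a minimal sketch of chopping a mailing list into smaller groups before sending. The function name and the batch size of 100 are assumptions for illustration; the right size and the pause between batches depend on your list software and the server.

```python
def batches(addresses, batch_size=100):
    """Yield successive slices of the address list, batch_size at a time.

    batch_size=100 is an assumed value, not a server limit.
    """
    for start in range(0, len(addresses), batch_size):
        yield addresses[start:start + batch_size]


# Usage: a 3000-address list goes out as 30 batches of 100
# instead of one burst of 3000.
mailing_list = ["user%d@example.com" % n for n in range(3000)]
for group in batches(mailing_list):
    pass  # hand each group to your mailer here, pausing between groups
```

Most list management scripts have a built-in setting for this, which is preferable to writing your own.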

Email trouble on Octopus

Filed under: — Tracy @ 10:25 am

We have been having email trouble on the Octopus server for the past few days. Connecting to POP mail and webmail has been intermittently difficult. This affects people using nameservers ns3/ns4.gryphynmedia.com.

We have found the problem. One of the accounts on the server is getting spurts of high volume of email, which is overtaxing the mail queue and slowing Exim to a crawl. We are moving the account to a higher-volume server as quickly as we can, which will resolve the problem permanently. We appreciate your patience. If you have a site that becomes very successful, we will also help you to scale up.

26th March 2005

Update on CROWN outage

Filed under: — Tracy @ 3:06 am

The new CROWN server has been fully live for hours and all of the nameserver and DNS changes were made Friday evening. Many domains are resolving… it appears to hinge upon which registrar the domain uses. Some are still pointing at the old server IP, and others switched almost immediately. We continue to watch for problems. A very few accounts did not restore fully, and we will fix that when the old server comes back up in the morning.

This has been a strange and grueling episode in our hosting experience. We apologize for the inconvenience.

25th March 2005

3 Servers down in Atlanta - serious problem

Filed under: — Tracy @ 9:22 pm

The three servers in Atlanta went down at 8:15 PM Eastern. There is a serious problem there, having to do with an upstream provider being sold and someone voiding someone’s contract. Thousands of servers just went down. This affects about 10% of Gryphyn Media clients, and I have contacted some of you personally, already. CROWN, CHAMPION, and STERLING went down.

We are already loading the backups onto new servers. I am not waiting to see what happens, and I am not putting up with someone playing lawyer games with my servers. I cannot apologize enough for this mess. I found out just hours ago that this might happen… or I would have made a move sooner. I knew about the sale of the provider, but I was told it would not affect us in terms of downtime.

Silver Lining (if you can call it that): This is Friday night on a holiday weekend, and one of the lowest traffic points we can get to have an outage. Also, the new datacenter in Virginia is really cool… fortunately, I already had an account established there.

I don’t want to minimize the potential for downtime. We are moving as fast as we can. CHAMPION and STERLING are already completely back up. IP changes are propagating *very* quickly. But some sites and some ISPs will not reflect the changes until they refresh their DNS caches… could be up to 4 hours. CROWN has more nameserver accounts on it, and thus more DNS changes.

DO open a helpdesk ticket if you need more details… I will be emailing affected customers individually with new IP information and other details. If I find I do not have an up-to-date emergency contact for you, I will attempt to telephone you.

23rd March 2005

Octopus Server down

Filed under: — Tracy @ 10:15 am

The server called Octopus just froze and is being rebooted…. we are investigating. It is likely to be a runaway script. This will affect domains with nameservers ns3/ns4.gryphynmedia.com. More news shortly.

10th March 2005

Catfish again

Filed under: — Tracy @ 9:26 am

We thought we found the script problem yesterday evening, but something else went awry this morning at 7:30 AM Eastern.

In case you wonder why we can’t instantly pinpoint a problem, basically… when we reboot the crashed server, we have to watch for the problem and try to nail it before it crashes again. If it happens fast, that can be difficult… we need to find the script, and see who owns it to disable it. If it happens as the result of an intermittent script that is triggered by unknown events… we can sit and watch the darn thing for hours, and then have it crash the minute we turn away for coffee.

Please, please do not use a shared server to test your brand new PHP scripts. A shared server is NOT a development server… hundreds of users are affected if you have made a mistake. If we find a developer client doing this, we would have to consider “firing” you as a client.

9th March 2005

Server Catfish crashed at 3 PM Eastern

Filed under: — Tracy @ 3:25 pm

The Catfish server (ns5/ns6.gryphynmedia.com nameservers) crashed at about 3 PM Eastern, after a brief period of high loads. Restarted and crashed again… very likely a runaway script… we are working to identify it and get the server back up quickly.

Update: The server was restored after about 20 minutes, but went down again at 5 PM… obviously, we didn’t find all of the trouble. This is a different problem. We apologize for the inconvenience and will get you back up as fast as possible.

15th February 2005

PHP down on Catfish

Filed under: — Tracy @ 4:38 pm

Sites running php pages on the Catfish server are showing “500 Internal Server Error” right now. Something just went wrong with suPHP…. we are working on it. Apologies for the inconvenience.

Update: Had it back up and now it is down again. I know it is very frustrating for everyone… we are working as fast as we can. We can’t just disable it without breaking everyone’s php scripts in a whole new way.

Update: I think we have it sorted now. If anyone is still experiencing problems with PHP sites, open a helpdesk ticket, please.

25th January 2005

Intermittent 3 AM Eastern lags today

Filed under: — Tracy @ 8:39 am

Between 3 and 4 AM Eastern, you may have noticed some intermittent lags and brief outages on the Sterling, Crown, and Champion servers. The Atlanta datacenter was running routing failover tests… without notifying me first. Maintenance is necessary, but should always be accompanied by prior notice… and the responsible parties have been sent to bed with no supper after a good spanking.

We apologize for any inconvenience. That’s normally a very slow time for most East Coast clients, so we hope no one was seriously discommoded.

23rd January 2005

Catfish server rebooting

Filed under: — Tracy @ 9:10 pm

We are rebooting the catfish server (the one with nameservers ns5/ns6.gryphynmedia.com). All the services crashed at 10 PM EST. We will do some diagnostics when it comes back up.

28th December 2004

Mod_Security Installed

Filed under: — Tracy @ 8:56 pm

In the wake of all the problems with the phpBB exploit, I have improved our security by installing Mod_Security on the shared servers. You can read more about it at http://www.modsecurity.org/.

21st December 2004

Security Update for PHP

Filed under: — Tracy @ 2:58 pm

All servers are currently being updated to php 4.3.10 not only because it’s festive, but due to serious security issues with versions prior to 4.3.10. This may cause some HTTP downtime while your server is being updated, but that shouldn’t be longer than 5-10 minutes, if at all. Please note: this *can* affect your php scripts, especially if they are using ZendOptimizer, since ZendOpt is also being updated.

29th November 2004

Scheduled maintenance on Crown, Champion, and Sterling

Filed under: — Tracy @ 11:39 pm

The Atlanta datacenter will be performing maintenance on Tuesday November 30th, 2004 from 2 am - 5 am EST. Engineering will be performing an IOS code upgrade on 2 of our Core Switches. During this maintenance window you may experience brief periods of service disruption. Engineers will work to minimize the disruption by upgrading and testing one switch at a time, but it is almost certain that there will be at least some interruption. We apologize for any inconvenience caused by this necessary preventative maintenance.

We scheduled it at a time when traffic is typically lower. Only the servers Crown, Champion, and Sterling are affected… nameservers ns7-ns8, ns9-ns10, and ns11-ns12.gryphynmedia.com. If you have a nameserver account, this may affect you.

15th November 2004

Crown server load is very high

Filed under: — Tracy @ 8:30 pm

The load is running very high on the Crown server right now… it may time out for some POP and HTTP requests. We are looking into it… this is often a runaway script, a big email list trying to go out without being chopped into batches, or a little DDoS attack. More news as soon as we have it.

Update: It was a blog script being attacked by a comment spammer. We have blocked the user, which seems to have resolved the problem. We recommend using blog scripts that allow you to turn off commenting, if you don’t need to actively use it. WordPress, for instance.

1st October 2004

Atlanta servers back up

Filed under: — Tracy @ 11:39 am

By the time I got done posting about it and emailing all the emergency email lists… it was back up. An explanation should be forthcoming.

Update: There was a problem with a group of routers. I am not entirely happy with the tech support explanation of what happened and why it took so long to fix. I will continue to investigate.

Further Update: Here is a more complete explanation. It was basically a hardware failure:

“The problem was with our routers. It looks like the memory stack in the main router became corrupted which caused problems. We had to reboot the router because of that and is also the reason the fail-over to the backup router did not complete. There was some trouble rebooting the router which we fixed by changing the flash card that the router’s operating system run off of. This fixed the boot problems and indicates that the original flash card was the source of the memory corruption as well.”

High-volume server is unreachable

Filed under: — Tracy @ 11:20 am

At about 11 AM, the Atlanta servers became unavailable. The whole datacenter is unavailable. My contact says there is a major network problem and all hands are hard at work on it. I will provide updates and explanations as I get them.

This affects the Sterling, Champion, and CodeGal servers.

5th September 2004

Scheduled Maintenance on high-volume server

Filed under: — Tracy @ 1:38 pm

Scheduled network maintenance will occur this Thursday morning, September 9th, between 2:00am and 4:00am EST. The network should not be down for more than 30 minutes, but there is always a possibility that the maintenance will take longer. The datacenter will be doing firmware upgrades to the router, and some power supply maintenance. We apologize for any inconvenience, and will try to keep the interruption to an absolute minimum.

12th August 2004

Spammer trouble this morning

Filed under: — Tracy @ 10:42 am

Early this morning, someone tried to send out 200,000 emails on the Catfish server (the one with nameservers ns5/ns6). That clogged up the mail queue and drove the server load through the roof. You were not able to get your email, and you may have had trouble getting websites to appear. We have now mopped up the mess and summarily executed the spammer.

Reminder: “Spam first and apologize later” is NOT a good business policy.

11th August 2004

Network Trouble in Atlanta

Filed under: — Tracy @ 2:24 am

There is network trouble at our Atlanta datacenter, which is affecting connectivity for clients with nameservers ns7/ns8.gryphynmedia.com and ns1/ns2.codegal.com. We expect it to be resolved, and an explanation produced, shortly. This is very rare at this datacenter. We apologize for any inconvenience.

15th July 2004

Security Upgrade of PHP

Filed under: — Tracy @ 12:32 am

We will upgrade all the servers to the latest 4.x stable version of PHP, 4.3.8 (a security release), over the next 24 hours. There is a flaw in 4.3.7 that could compromise data. Further details regarding the 4.3.8 release:
http://www.php.net/ChangeLog-4.php#4.3.8

You will experience no downtime. Perhaps a momentary hiccup in the PHP.

2nd July 2004

Spammer caused us trouble today

Filed under: — Tracy @ 12:45 pm

Starting sometime last night, we were seeing problems with MySQL and thought we had trouble with the database server. But it was actually a clever spammer exploiting someone’s script to send out bulk email. Some spammer discovered that an old, but less-used, marketing program was exploitable, and built a spider to search the web for installations. And it found one on the Octopus server.

The script in high volume use caused high loads on the server, and we thought it was a MySQL problem, but then Exim (the mail server) also went down, which is typical of a spam problem; multiple services fail. Then we just had to identify the source and stop it.

The Gryphyn Media client was more victim than villain here. They had purchased a script from an established vendor and run it for years without a problem… until last night. Unfortunately, their trouble also affected the rest of the hosting clients on Octopus.

Everything is now stable and quiet. If there is a lesson here, it would be to take a look at scripts you have been using for a long time… can they be used to send email? Then you might want to have someone examine them for exploits, especially Perl scripts. Particularly if you are a developer… might you have installed a program that will cause grief for a client?

28th June 2004

Statistics Reporting Problems

Filed under: — Tracy @ 11:55 pm

On the Octopus server (ns3/ns4.gryphynmedia.com nameservers), we have been experiencing an issue with stats reporting. It was reporting errors when trying to access AWStats. The problem has been fixed and the error will disappear during the next stats processing cycle.
If you are still experiencing errors more than 48 hours after this post, please contact me: support@gryphynmedia.com or through the helpdesk.

17th June 2004

Kernel Upgrade on High Volume Server

Filed under: — Tracy @ 12:25 am

I announced a kernel upgrade tonight for clients with nameservers ns6/ns7.gryphynmedia.com:

> We will need to do a bit of server maintenance tonight. Sorry for the
> short notice, but it is a security patch. We have been updating the Linux
> kernel on all the servers, and it’s time to do the server you are on.
> Around midnight Eastern time, you might experience a few minutes of downtime
> when we reboot… it could be slightly longer if we have to do a manual
> File System Check. Aside from the brief outage, you should notice no
> other change.

Update: The kernel upgrade is complete. While I was in there, I also upgraded Apache and Cpanel, rather than annoying you with further service interruptions later.

15th June 2004

Kernel Upgrade

Filed under: — Tracy @ 10:58 pm

Early this morning, I upgraded the kernel on most of the servers. There was a security flaw that could be exploited by a local user to instantly crash machines. The servers were rebooted. Downtime was less than 5 minutes. I did not announce it in advance, because you just don’t say, “Hey, the door is still open until I fix this thing.” Article about the flaw:
http://linuxreviews.org/news/2004-06-11_kernel_crash/

MySQL problem forced a reboot

Filed under: — Tracy @ 12:51 am

Late this evening, a runaway script slowed the Octopus server (nameservers ns3 and ns4). I found and disabled the script, and rebooted the server, which seems to have cured the problem. Sorry for the inconvenience.

31st May 2004

Network Maintenance Sunday 6th 4 AM GMT

Filed under: — Tracy @ 2:27 pm

Network Maintenance Sunday 6th June 04:00 AM UTC. Greenwich Mean Time (GMT) is now generally referred to as UTC (Coordinated Universal Time), the country-neutral international designation. Since we are on Daylight Saving Time, that will be midnight on the East Coast.

We will be reconfiguring a Cisco switch, which will require a reboot. There will be a couple of minutes downtime while the switch boots back up and tests all the ports.
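
For anyone who wants to double-check the time conversion above, here is a minimal Python sketch. It assumes a fixed UTC-4 offset for Eastern Daylight Time (which is what applies in June); the names `EDT` and `utc_to_edt` are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Eastern Daylight Time is a fixed 4 hours behind UTC.
EDT = timezone(timedelta(hours=-4), "EDT")


def utc_to_edt(utc_dt):
    """Convert a timezone-aware UTC datetime to Eastern Daylight Time."""
    return utc_dt.astimezone(EDT)


# The maintenance window from the notice: Sunday 6th June 2004, 04:00 UTC.
window_utc = datetime(2004, 6, 6, 4, 0, tzinfo=timezone.utc)
window_edt = utc_to_edt(window_utc)

print(window_edt.isoformat())  # 2004-06-06T00:00:00-04:00 (midnight Eastern)
```

If you are in a different time zone, swap in your own offset in place of the `hours=-4` value.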

20th May 2004

NAC Outage Explanation

Filed under: — Tracy @ 4:19 pm

I have the NAC explanation for this afternoon’s outage:

Thursday, May 20, 2004 at 2:02pm EDT Net Access Corporation began to experience network difficulties in Parsippany, NJ (node code: OCT). This outage affected dedicated access T1, T3, and Ethernet customers aggregated on gbr1.oct.nac.net.

By default we place a standard firewall filter on every interface. These filters help alleviate certain types of Denial of Service (DoS) attacks. In addition to these proactive filters, customers who purchase tiered bandwidth services from NAC also have a “policer” applied to their interface to rate-shape their bandwidth.

In cooperation with an engineer at Juniper’s TAC, we discovered a software bug in the version of JunOS that we are running on gbr1.oct.nac.net. In this version of JunOS when you have a firewall filter and a policer applied to an interface it inadvertently applies the policer to all interfaces that have the same firewall filter. This was the unfortunate culprit of today’s outage.

We sincerely apologize for any problems this outage caused you.


And they have announced a maintenance window:

Sunday, May 23rd, 2004 Net Access Corporation will be performing emergency service affecting maintenance in Parsippany, NJ (node code: OCT). The maintenance window will commence at 7.00am UTC, 2.00am EDT.

We have identified a serious software bug in the version of JunOS running on our gbr1.oct.nac.net router. During this maintenance window we will be upgrading the JunOS. Since we will be going from a 5.x to a 6.x release, the router will require two (2) reloads. We expect the total downtime to be less than 15 minutes. Dedicated access T1, T3, and Ethernet customers aggregated off gbr1.oct.nac.net will be affected by this emergency maintenance.

Network Outage

Filed under: — Tracy @ 2:11 pm

There is a network problem at NAC, the datacenter in NJ… connections may be very slow, or may time out. We are in contact with the datacenter to get a status report.

Update: Routing has been restored… I am still waiting for the explanation. The servers were not “down” at all, but no one could get to them.

13th May 2004

Catfish Server crashed

Filed under: — Tracy @ 10:31 am

If your domain’s nameservers are ns5/ns6.gryphynmedia.com, then you may have just experienced a 15-minute server outage. A new client uploaded a site with an untested script that went bananas, and crashed the server three times in 10 minutes. We have it under control now. The client apologizes, as do we.

Note: It is a good idea to test scripts in another environment, if you are a DIY programmer.

11th May 2004

Exim Security Update

Filed under: — Tracy @ 4:20 pm

Due to a critical vulnerability just found in Exim, I upgraded Exim on all servers. It is unlikely to have been noticeable.

8th May 2004

Emergency Maintenance

Filed under: — Tracy @ 11:23 pm

One of our upstream providers will be performing emergency maintenance on one of our peering edge routers. During this maintenance, it is likely that you will experience a brief (2-3 minute) disruption in Internet traffic. We apologize for any inconvenience, but it was unavoidable.

4th May 2004

Network Lags Caused by Sasser Worm

Filed under: — Tracy @ 9:37 pm

The four variants of the Sasser worm that have appeared since this weekend are slowing various parts of some networks. Our own servers are NOT affected… they do not run Windows. But some hosting clients are reporting difficult connections and slow email delivery. This is not a result of any problem with the servers or the datacenter; it is in the external networks.

One of many articles about Sasser: http://www.eweek.com/article2/0,1759,1584121,00.asp

15th April 2004

Network Trouble

Filed under: — Tracy @ 10:50 pm

I have been seeing network trouble tonight, affecting both the NJ datacenter and, to a lesser extent, the Atlanta datacenter. It is currently difficult to access POP and FTP, and HTTP is moving sluggishly. We are filtering a Denial of Service attack at NAC, but there may also be something more general happening in the network. We are monitoring the situation. Our apologies for any inconvenience.

Update: Things are moving more normally now. We will continue to watch.

30th March 2004

Email Glitch

Filed under: — Tracy @ 12:22 pm

We are experiencing an email glitch on one of the servers… Octopus, which affects domains using nameservers ns3/ns4. It seems to be a filtering issue and we are working on it right now. I will have an update shortly.

Update: The problem is resolved. A file in the SMTP system became corrupt and made a lot of incoming mail bounce with “Unknown User” errors. It would only have affected email between about 11:45 AM and 12:30 PM today. I apologize for the inconvenience. I know how unsuitable it is to have your business email bounce.

29th March 2004

Cpanel Update

Filed under: — Tracy @ 1:27 pm

I will be updating the control panel (Cpanel) software on various servers between Midnight and 2 AM Eastern on Wednesday, March 31. You might see a brief outage in your ability to pick up POP3 email… but nothing will bounce and it will be right back up after I restart Cpanel.

Update: This went smoothly, with very minimal interruption of service.

23rd March 2004

Hardware, again

Filed under: — Tracy @ 5:18 pm

We had another brief network issue today. A network cable went bad, apparently related to yesterday’s switch upgrade. It has been replaced. The server did not actually go down, and there was no data loss. But one server was briefly unreachable. We apologize for the interruption. Things should be back to “rock solid” now.

22nd March 2004

Hardware Trouble Today

Filed under: — Tracy @ 1:06 pm

At about noon today we started experiencing network issues that made connections very slow. We found a problem with a switch, replaced it, and now things are back to normal. The servers themselves did not go down, and there was no data loss. The affected servers were in the NNJ datacenter, which includes most of GM’s shared hosting clients. We apologize for the inconvenience.

12th March 2004

“Emergency” Cpanel Upgrade

Filed under: — Tracy @ 4:33 pm

I was planning to upgrade Cpanel to the next stable release next week. But there is a flaw that needs patching more immediately. So I am doing it right now. Friday nights tend to be slow, support-wise, so I don’t expect to interrupt many of you. There will be little or no downtime. Maybe slow POP connections for a little while.

UPDATE: ALL done! Cpanel and MySQL are all freshened up.

Scheduled maintenance

Filed under: — Tracy @ 1:04 pm

On Sunday, March 14, 2004, at 2 AM EST, we will have a brief outage for scheduled maintenance. This will affect shared hosting clients with nameservers ns3/ns4 and ns5/ns6. The datacenter will be installing new equipment to facilitate an upcoming OC-12 to UUNET/MCI. The downtime is expected to last only 5 minutes, but could result in slow connections for up to one hour.

1st March 2004

Server Loads High Today

Filed under: — Tracy @ 4:25 pm

We had trouble with a runaway script on an account today. It made the load run high on one of the shared servers, and it may have taken a long time to connect for email and for websites to load. We believe it is under control now, but are watching closely.

8th January 2004

Scheduled maintenance

Filed under: — Tracy @ 4:05 pm

Next week, between Monday and Friday, the NAC datacenter will be migrating us to a higher level of connectivity. The work will be done between 2 AM and 4 AM EST… but they cannot predict which day. It is unlikely that anyone will notice the interruption.

The official jargon: “During this network maintenance Net Access Corporation will be migrating all layer 3 customer VLANs from msfc1.oct.nac.net to gbr[1,2].oct.nac.net. The purpose of this maintenance is to provide customers with the ability to have redundant layer 3 connectivity via VRRP (Virtual Router Redundancy Protocol.) We do not anticipate on more than one minute of possible downtime per customer.”

7th January 2004

Linux Upgrade

Filed under: — Tracy @ 3:40 am

All the servers have been updated to the new 2.4.24 kernel to close another recently announced security hole found in the 2.4.23 release. They were rebooted… but it is unlikely any of you noticed the very brief outage.

30th November 2003

Webmail/IMAP issues

Filed under: — Tracy @ 10:44 pm

I am having trouble with IMAP. It may be related to the Apache upgrade I just made. I am working on it diligently with the Cpanel folks. In the meantime, please use POP3 service to download email. If you use webmail extensively, you will find that Horde and SquirrelMail are not working properly. Please switch to using NeoMail until I get this straightened out. I apologize for the inconvenience.

29th November 2003

Apache and PHP upgrades

Filed under: — Tracy @ 2:49 pm

Last night I did upgrades to Apache 1.3.29 and PHP 4.3.4, the latest stable releases. You should see no operational difference.

26th November 2003

Having trouble reaching the UK?

Filed under: — Tracy @ 3:44 pm

You may have trouble calling and emailing clients in the UK and Europe right now. A major failure in the TAT-14 fiber-optic cable system that connects the United States and Europe appears to have caused widespread disruption to Internet services in the United Kingdom. “France Telecom will send a cable ship out to fix and repair the problem,” said a spokesman for the telephone consortium that owns the system. It is not known why it failed or how long it will be down.
http://zdnet.com.com/2100-1103_2-5111964.html
(I’m thinking it’s a giant sea monster.)

24th November 2003

Octopus back up

Filed under: — Tracy @ 1:20 pm

It was down for about 3 minutes. Everything is fine now. Has to be on a Monday, the busiest support day, doesn’t it?

Octopus server being rebooted

Filed under: — Tracy @ 12:26 pm

Apache crashed and I am rebooting Octopus now. I will report back in a few minutes on the problem.

10th October 2003

Control Panel Upgrade

Filed under: — Tracy @ 1:57 pm

I’ll be upgrading Cpanel late Sunday night. Minor bug fixes, so you should see no difference at all and there is no downtime expected. Cpanel is our 3rd party hosting control panel package, one of several popular control panel systems used by shared hosts. If you are interested in documentation, you have too much time on your hands. No, seriously, here is the link: http://www.cpanel.net/docs/cp/

1st September 2003

Brief Outage

Filed under: — Tracy @ 9:36 am

We had a brief outage last night. Octopus crashed twice and it took some poking to figure out why. A client was experimenting with live audio streaming using an open source application. It started to run too many processes, and the server load went very high and then the server froze.

Please, talk to me before installing something like that… I can find out what problems it might cause on various platforms. Some things are not suitable for a shared server environment… or the “experimental” part of the process needs to happen in a development environment. Talk to me about that, too… I can help.

14th August 2003

Eastern Power Outage and Service Notes

Filed under: — Tracy @ 5:01 pm

If you are watching the news, you will see that there is a massive power outage affecting Eastern US cities from Detroit to NYC since about 4:15 PM Eastern. Many of you are hosted on servers in Northern New Jersey, which is in the failing power grid area. But the datacenter has several sets of emergency generators and your websites will NOT go down. (Although YOU may lose power if you are in the power outage area.) There should be no effect at all for those of you in Atlanta and Houston.

I repeat: the servers are NOT down.

You may have trouble seeing your site… there are lots of links in the network between your physical location and the NJ servers… some may be affected. Some of you may have ISPs (access providers) with pieces of network that are affected. I know that Verizon has experienced minor issues today.

Other Service Notes:

I did a control panel upgrade last night… and it included a “fix” I was not aware of, that has caused trouble for some of you.

You must now use SMTP Authentication for outgoing email if you are using a POP3 account to send email through your ISP. That includes MANY of you… it does not affect webmail users. But if you are sending email from your home/office email program and you are getting a “550 Error”, you will need to enable SMTP Authentication.

Enable SMTP Authentication (called “Outgoing Server Authentication” on Outlook) and enter your ISP’s username and password (NOT your hosting user/password). The procedure varies for different email programs. Check the help documents for your email program (Outlook, Eudora, etc)… this is a common procedure. Many of you are already doing this because your ISP requires it. Contact me if you need help.

This procedure does help reduce the overall incidence of spamming on the Internet, so it is a positive move, rather than just an annoyance. Once you have it set up, you should not have to mess with it again.
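
For the developers among you who send mail from a script rather than from Outlook or Eudora, here is a minimal Python sketch of the same idea: log in to the outgoing server before sending. It is an illustration only, not our server configuration. The function names, the host, the port, and the `smtp_class` parameter are all assumptions made up for this example; use the server and credentials your provider gives you.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a simple plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_authenticated(host, port, username, password, msg,
                       smtp_class=smtplib.SMTP):
    """Send msg through an outgoing server that requires SMTP Authentication.

    smtp_class is a hypothetical hook so the function can be tried out
    without a live mail server; by default it is the real smtplib.SMTP.
    """
    with smtp_class(host, port) as smtp:
        smtp.ehlo()
        # Encrypt the session first if the server offers STARTTLS,
        # so the password is not sent in the clear.
        if smtp.has_extn("starttls"):
            smtp.starttls()
            smtp.ehlo()
        # This login step is the "SMTP Authentication" part. Without it,
        # a server that requires authentication will reject the message.
        smtp.login(username, password)
        smtp.send_message(msg)


# Example call (placeholder host, port, and credentials):
# msg = build_message("you@example.com", "client@example.org", "Hello", "Test")
# send_authenticated("smtp.example.net", 587, "your-username", "your-password", msg)
```

Your desktop email program does the equivalent of the `login` step for you once the authentication option is switched on.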

http://gryphynmedia.com/helpdesk/

7th August 2003

Mailserver Upgrade

Filed under: — Tracy @ 3:01 pm

I spoke too soon. The mailserver continued to be cranky. I just made some software upgrades that required me to restart MySQL and Apache… so you may have noticed a brief outage. I’m sorry for the lack of notice, but it needed doing right away to keep the mail flowing. NOW we should be fine.

Incidentally… all of today’s shenanigans only affected the folks on Octopus… that would be nameservers ns3 and ns4.gryphynmedia.com.

Octopus Server Rebooted

Filed under: — Tracy @ 10:41 am

The server was running high loads this morning and eventually had to go for a reboot. A script had been configured badly on someone’s website and was running a bazillion processes and spewing admin email. All fixed… should be smooth sailing now. No email or data was lost.

22nd July 2003

MailServer Crash

Filed under: — Tracy @ 9:29 pm

You may or may not have noticed that there was a brief period this evening when you could not get email. The mailserver (Exim) crashed and had to be rebooted.

16th July 2003

Security Upgrade Today!

Filed under: — Tracy @ 1:54 am

On Thursday, July 17th, our datacenter will be performing emergency maintenance on several Cisco based routers, switches, dial-platforms, and ATM switches. They will be upgrading the IOS devices due to a recently discovered security issue. It’s a serious issue, which is why there is so little warning. Cisco will be publicly announcing the details of the issue tomorrow night, AFTER we upgrade.

28th May 2003

Datacenter Back Up

Filed under: — Tracy @ 12:20 pm

The datacenter has been allowed to power back up, and our servers are back in business, with no data loss. Whew! Every web host’s (and website owner’s) nightmare.

It was apparently a very small fire, and the fire suppression system worked perfectly to put out the fire without damaging anything else (I believe it involved something dramatic, like sucking the air out of the room). But it is standard procedure for the fire department to shut down power to any building with a reported fire. Even if it supports thousands of servers. The building was evacuated, and no one was injured. Power has now been restored and everyone should be back up.

Their emergency procedure worked… my new server emergency network kicked in and I was able to get news and report it to you. All-in-all, a successful emergency, if there is ever such a thing.

If anyone is NOT back up, please let me know immediately.

http://gryphynmedia.com/helpdesk/
(Notice, the helpdesk is in a different datacenter, for just this reason.)

Datacenter Fire

Filed under: — Tracy @ 11:36 am

This notice will NOT affect all Gryphyn Media clients… just those on the nameservers ns3.gryphynmedia.com and ns4.gryphynmedia.com:

There is an electrical fire at the Northern New Jersey datacenter. The firefighters shut down their power (and could not be talked out of it), so that not even the back-up generators are allowed to run. To the best of my limited knowledge, no server itself is in danger, and it is expected that the facility will be back up within the hour. That is ALL that I know right now… I will update you as more information becomes available.

This is not just my servers… this is thousands and thousands of servers affected.

17th March 2003

Server Move Progress

Filed under: — Tracy @ 5:09 am

Clients got notices over the weekend about the server move. The old datacenter had more trouble last week, and we are again accelerating the move, rather than making individual arrangements on a client-by-client basis. It is progressing more slowly than anticipated, but going smoothly. If you have not received your second notice, telling you to make the DNS changes, you should see it within a day or two.

Make sure I have a current email address on file for you… some client email has bounced. I need an off-server address for everyone, especially when we are doing work like this, so that I can be sure to reach you if something is wrong.

Please use the HELPDESK if you are having a problem… helpdesk tickets are easier to track than emails, and you can be sure I got your message.

I appreciate your patience… this move, while annoying, will result in faster service, restoration of our good uptime record, and more features.

7th March 2003

New server available

Filed under: — Tracy @ 12:17 am

As many of you are aware, we’ve been having trouble in our Atlanta datacenters. I have a new server up in a new datacenter in New Jersey. I’ve contacted some of you already, and am working my way through the client list. We will be agreeing on a date to move you, and I will be making every effort to choose days that do not jeopardize your business or site development plans. I will bundle up your files and move them. You will not have to do much beyond changing your DNS settings and checking that everything works properly. I apologize for any disruption… but this will result in better service for everyone. As the servers fill, I will deploy new ones. The support FAQs will be changing to reflect the fact that not everyone has the same IP anymore. You should KEEP the email that will contain your new server info. I will, of course, be available to troubleshoot any email or site problems that occur.

19th January 2003

Datacenter Diversification

Filed under: — Tracy @ 2:45 pm

I sent a note out to everyone last week. Some of you were affected by an outage in the Atlanta datacenter. Unfortunately, that is where the server with the Gryphyn Media site and helpdesk was also located. I have moved forward a plan to move the GM site and helpdesk to a northern NJ datacenter, so you can always reach it. We are all currently located on servers in two related Atlanta datacenters. I apologize again for last week’s bumpy ride. We made some hardware changes to ensure it does not happen again. And I smacked the tech guys around a little.