Field Notes: TLC for Your Server
In the July issue I wrote about the care and feeding of your server. I beseeched you to treat your servers as if they were living beings. Do so, and they’ll reward you with steady service. Don’t, and you may have to start thinking about disaster recovery. It’s far better to build disaster prevention into your daily maintenance schedule.
It doesn’t take the power of a lightning bolt to destroy your server. A simple brownout can do lots of damage. Yes, yes, I know you have a mondo UPS with line conditioning, but when was the last time you looked at the UPS itself? They put those little lights on the front for a reason. Over time, batteries fail. The little lights turn red in warning, often months or even years before the batteries quit entirely. Then one day you actually need them and, BAM, now you have a dead server.
You should also have a look at the load distributed on each UPS. Good ones have a meter to show you whether they are overtaxed. I usually keep each one at no more than 50 percent capacity, just to be on the safe side. Of course, you hooked up the USB cable and loaded the necessary software to shut the server down in the event of a power outage, right?
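Before plugging one more box into a UPS, it helps to add up the load and compare it against the 50 percent rule of thumb above. The sketch below is purely illustrative: the UPS rating and the per-device wattages are hypothetical numbers you would replace with figures from your own equipment nameplates or the UPS's load meter. UPS units are usually rated in both VA and watts; this sketch assumes you compare watts against the watt rating.

```python
# Hypothetical figures: replace with your own UPS rating and measured loads.
UPS_RATING_WATTS = 1000      # assumed real-power (watt) rating of the UPS
MAX_LOAD_FRACTION = 0.50     # the "no more than 50 percent" rule of thumb


def load_fraction(device_watts, rating_watts):
    """Return the total connected load as a fraction of the UPS rating."""
    if rating_watts <= 0:
        raise ValueError("rating_watts must be positive")
    return sum(device_watts.values()) / rating_watts


def within_budget(device_watts, rating_watts, max_fraction=MAX_LOAD_FRACTION):
    """True if the connected load stays at or under the chosen ceiling."""
    return load_fraction(device_watts, rating_watts) <= max_fraction


if __name__ == "__main__":
    devices = {"server": 350, "switch": 40, "monitor": 30}  # hypothetical loads
    pct = load_fraction(devices, UPS_RATING_WATTS) * 100
    print(f"UPS load: {pct:.0f}% of rating")
    if within_budget(devices, UPS_RATING_WATTS):
        print("Within budget")
    else:
        print("Over budget: move something to another UPS")
```

Keeping the ceiling as a parameter lets you tighten or relax the margin per unit without touching the arithmetic.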
Recently I had a UPS battery fail. This isn’t usually an emergency, but I made sure to order a replacement right away. When the new battery arrived and I went to swap out the old one, I found a mess. The battery had swollen from the heat inside the steel case. It hadn’t ruptured, but it was very close. It was so bad that I had to use a hammer and a screwdriver to get the old battery out. I hate to think what might have happened if I hadn’t noticed the problem.
Most of the time your server doesn’t ask for much: it just sits there humming along. But it does try to warn you in advance when things are not as they should be. It speaks its own special language, known as the Event Log. I know many of you ignore the Event Log, but this is exactly where building prevention into your regular schedule will do you the most good. Most problems show up as warnings in the log before they reach a critical state. The Event Log can flag problems with power and heat, alert you that you’re running out of drive space, or note a failed hard drive or other peripheral. Failed backups show up in the Event Log, as do many attempts at unauthorized access. Check the Event Log daily, and look up any entry you don’t understand.
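Event Viewer's own filters are usually enough for a daily pass, but if you export entries for review, a little triage makes repeats stand out. The sketch below assumes you have already exported entries into simple records; the `LogEntry` shape, its field names, and the sample entries are all hypothetical and would need adapting to your own export format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Hypothetical record shape; adapt it to however you export your Event Log.
    level: str       # e.g. "Information", "Warning", "Error"
    source: str      # the component that logged the event
    event_id: int
    message: str


# Levels worth a look during the daily review.
ATTENTION_LEVELS = {"Warning", "Error", "Critical"}


def needs_attention(entries):
    """Keep only entries at Warning level or worse."""
    return [e for e in entries if e.level in ATTENTION_LEVELS]


def summarize(entries):
    """Count flagged entries per (source, event_id) so repeat offenders stand out."""
    return Counter((e.source, e.event_id) for e in needs_attention(entries))


if __name__ == "__main__":
    sample = [  # made-up entries for illustration only
        LogEntry("Information", "ServiceHost", 10, "Service started"),
        LogEntry("Warning", "Disk", 100, "Disk is nearly full"),
        LogEntry("Warning", "Disk", 100, "Disk is nearly full"),
        LogEntry("Error", "Backup", 200, "Nightly backup failed"),
    ]
    for (source, event_id), count in summarize(sample).most_common():
        print(f"{count} x {source} event {event_id}")
```

Grouping by source and event ID is a deliberate choice: a warning that appears once may be noise, while the same warning every day is usually the early sign this column is talking about.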
Of course, we’ve all seen servers get sick with a virus. Just as with people, servers can be cured most of the time, but the longer they’re left untreated the more damage they’ll suffer. I know you have virus protection and scans scheduled to run on a regular basis. Do you actually look at the scan results, clean things up if necessary, and then make sure the server is clean by running an additional scan? If you don’t find and eliminate a virus early enough, it can spread to other servers and PCs.
Human error is certainly one of the most dangerous threats to your server. How many times have you done something you knew was wrong because you were just too impatient to take proper precautions? Or maybe you skipped something you should have done. I was once working with a client who was having trouble with his Exchange Server information store. I was helping him remotely, over the phone. I asked him to make sure his backup was current before running eseutil.exe, and he told me he had it covered. I wanted to run the utility on a copy of the database rather than on the original file, but making a copy would have taken extra time, so we went ahead without one. Of course, Mr. Murphy (of Murphy’s law) reared his ugly head. The database could not be repaired, and we discovered that, in fact, no backup had been run the night before. Two days of e-mail were lost, and lots of users were screaming.
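The two precautions in that story, confirming a current backup and working on a copy, are easy to turn into a habit that refuses to skip either step. The sketch below is a hypothetical helper, not part of any Exchange tooling: it treats a backup file's modification time as a rough proxy for "current" (a backup job's own log is better evidence), assumes a nightly schedule, and assumes the store has already been dismounted so the database file is not in use. The repair tool itself is left for you to run against the returned copy.

```python
import shutil
import time
from pathlib import Path

MAX_BACKUP_AGE_HOURS = 24  # assumption: backups are expected nightly


def backup_is_current(backup_path, max_age_hours=MAX_BACKUP_AGE_HOURS, now=None):
    """True if the backup file exists and was modified within max_age_hours."""
    path = Path(backup_path)
    if not path.is_file():
        return False
    now = time.time() if now is None else now
    age_hours = (now - path.stat().st_mtime) / 3600
    return age_hours <= max_age_hours


def make_working_copy(db_path, work_dir):
    """Copy the database into work_dir so a repair can run against the copy."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    target = work_dir / Path(db_path).name
    shutil.copy2(db_path, target)  # copy2 also preserves file timestamps
    return target


def prepare_for_repair(db_path, backup_path, work_dir):
    """Refuse to proceed without a current backup; otherwise return a working copy."""
    if not backup_is_current(backup_path):
        raise RuntimeError(f"No current backup at {backup_path}; run one before repairing.")
    return make_working_copy(db_path, work_dir)
```

Raising an error instead of printing a warning is intentional: the point is that the impatient path simply does not exist.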
You can also kill a server with kindness. Even if you’re diligent about preventive maintenance, you still need to be on guard. One of my customers has a single server sitting in a shared closet. Every few months the cleaning crew vacuums in and around the closet, and almost every time they do, I get a call the next morning. Usually I find that cables were unplugged and not reconnected correctly, or that the server was accidentally switched off and never restarted. Sure, you want to keep dust away from your server, but anyone cleaning near the machines needs to take a little care. It may even be necessary to restrict physical access to the server area.
I’m sure you see the common thread here. Too often we become unwitting collaborators in our server’s destruction, just for lack of a few logical precautions. If you take care of problems while they’re small, they’ll never grow into disasters. To do this, though, you need to be aware that the problems exist. If your daily maintenance schedule makes checkups routine, your server will live a long, happy life.
Jay Shaw is an independent network consultant. His company, Network Consulting Services, is located on Long Island, New York. He can be reached at firstname.lastname@example.org.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.