Hardware Compensation

I spent some time with a customer recently to help them design a failover for their failover.

Let me explain... they have a 2-node failover cluster running Windows Server 2003 and SQL Server. Works great... except they have experienced several hardware-related outages over the past year. This is unacceptable! So they wanted some help in designing what I'll call a "continuation of business" (COB) server to act as a backup for when/if the whole cluster goes *burp*.

Now my initial reaction was to be the somewhat cynical software consultant and simply respond "sounds like a hardware problem?!" (How many software engineers does it take to change a light bulb?) But I resisted the urge... and we set out to try to find a solution.

The bottom line is that there is no "automated" failover solution between a cluster and anything else. Sure, you could use some type of replication, log shipping, etc. to create the "COB" server - but the challenge of getting back up and running on the primary cluster is daunting. We have decided to go with replication (transactional or merge... future research will identify the best fit) and a manual transition, e.g. a config file change that points the application at the COB server.
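To make the "config file change" idea concrete, here is a minimal sketch of what the manual transition could look like on the application side. Everything in it is an assumption for illustration: the INI layout, the `active` key, the server names `SQLCLUSTER01` and `COBSERVER01`, and the database name `AppDb` are all hypothetical, not the customer's actual setup.

```python
import configparser

# Hypothetical config layout; server and database names are made up.
SAMPLE_CONFIG = """
[database]
active = primary

[primary]
server = SQLCLUSTER01
database = AppDb

[cob]
server = COBSERVER01
database = AppDb
"""

# The only targets an operator is allowed to switch between.
KNOWN_TARGETS = ("primary", "cob")


def load_active_target(config_text):
    """Return (server, database) for the target named by the 'active' key."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    active = parser["database"]["active"]
    if active not in KNOWN_TARGETS:
        raise ValueError(f"unknown target: {active}")
    section = parser[active]
    return section["server"], section["database"]


def build_connection_string(server, database):
    """Build a SQL Server style connection string (Windows authentication assumed)."""
    return f"Server={server};Database={database};Trusted_Connection=yes;"


if __name__ == "__main__":
    server, database = load_active_target(SAMPLE_CONFIG)
    print(build_connection_string(server, database))
```

With something like this, failing over is a one-line edit (`active = cob`) plus an application restart. It does nothing about the hard part, which is getting the data back onto the primary cluster afterwards.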

So I was thinking on the way home... why is the hardware so unreliable? Is it an investment thing? Has hardware been so commoditized that you simply buy it in quantity and expect failures to happen? And how bad an experience has this customer had with their hardware (I think this points at the HW vendor a lot) that they are looking for what amounts to a software solution to a hardware problem? I've had some experience with "SW to solve a HW problem"... and I'd prefer not to return to that if I can help it.