Failover Cluster Snafu – Forcibly removing Failover Cluster Feature after Cluster Failure

My secondary job is maintaining our engineering lab (part of the "Do More with Less" mantra), since we don't have anyone dedicated to the role. The lab is completely virtualized except for our SQL infrastructure, which runs on a 2-node Failover Cluster because so much of our infrastructure relies on SQL. We also have a 7-node Failover Cluster that uses R2's Cluster Shared Volumes (CSV), and I recently took a vacation. Whoa, I bet you didn't see that coming. What do a 7-node cluster and a vacation have to do with each other?

I'm glad you asked… it has everything to do with the fact that the 7-node cluster conveniently "fails" while I'm on vacation, forcing me to stop vacationing and take a look. Most recently, a node simply went south during my time off, and my frustration level was sky-high, since this lab isn't my primary focus, though it seems to occupy me way too much lately!

What I'm sharing today is completely unsupported, I'm certain, but luckily you can take my gossip & rants on this blog as "Well, he doesn't usually do things the supported way anyhow…"

Dang it… I can’t remove the Failover Cluster Feature because it is still a part of a Cluster

What a "cluster" you might have on your hands. No pun intended. This is exactly the scenario I had. A host went down and, unfortunately, no longer had access to the cluster because it had been evicted. However, the node itself was thoroughly convinced that it was still a vital part of the family. My frustration level climbed, so I started digging…

NOTE: DIGGING IN THE REGISTRY FOR LITTLE JEWELS IS NEITHER RECOMMENDED NOR THE RIGHT IT APPROACH. IT'S AN APPROACH FOR THOSE WHO ARE WILLING TO GAMBLE EVERYTHING AND CAN SAFELY CYA IF THE GAMBLE DOESN'T PAY OFF.

<notFortheFaintofHeart>  

How to Force Failover Clustering Feature to be available to Remove

Now you know the warning. Here is how I came across a way to force R2's Server Manager feature wizard to forget about Failover Clustering so I could move forward. To do this, go to your broken node and open the Registry Editor.

Backing up your registry right now is a great idea… do it and return.

  1. Open Regedit
  2. Locate the HKLM\SYSTEM\CurrentControlSet\Services\ClusDisk and HKLM\SYSTEM\CurrentControlSet\Services\ClusSvc keys
  3. Delete both keys
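The backup and delete steps can also be scripted. The following is a minimal, untested sketch in Python, not the method used in this post: it assumes the two keys live under the standard HKLM\SYSTEM\CurrentControlSet\Services path and drives plain reg.exe calls (`reg export <key> <file> /y` and `reg delete <key> /f`). The function names and the C:\RegBackup folder are hypothetical. It exports every key before deleting any of them, so a failed backup stops the run before anything is removed.

```python
import ntpath
import subprocess
import sys

# Assumed full path of the service keys named in the steps above.
SERVICES_ROOT = r"HKLM\SYSTEM\CurrentControlSet\Services"
CLUSTER_SERVICE_KEYS = ("ClusDisk", "ClusSvc")


def build_registry_cleanup_plan(backup_dir):
    """Return reg.exe argument lists: export every key first, then delete them.

    backup_dir is a hypothetical folder of your choosing, e.g. C:\\RegBackup.
    """
    exports, deletes = [], []
    for name in CLUSTER_SERVICE_KEYS:
        key = SERVICES_ROOT + "\\" + name
        # ntpath builds a Windows-style path even when run on another OS.
        backup_file = ntpath.join(backup_dir, name + ".reg")
        # Write a .reg backup of the key, overwriting any older file (/y).
        exports.append(["reg", "export", key, backup_file, "/y"])
        # Delete the key and its subkeys without a confirmation prompt (/f).
        deletes.append(["reg", "delete", key, "/f"])
    # All exports are ordered before any delete.
    return exports + deletes


def run_plan(plan):
    """Run the plan on the broken node; check=True stops at the first failure."""
    if sys.platform != "win32":
        raise RuntimeError("reg.exe is only available on Windows")
    for args in plan:
        subprocess.run(args, check=True)


if __name__ == "__main__":
    # Print the plan for review instead of executing it.
    for args in build_registry_cleanup_plan(r"C:\RegBackup"):
        print(" ".join(args))
```

Printing the plan first gives you one last look at exactly which keys will be touched before you call `run_plan` on the broken node.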


You have now royally ticked off your R2 server though it is only for a brief moment.  Move to the next step…

Uninstalling Failover Cluster when cluster is unavailable

The next step is to open Server Manager and remove the Failover Clustering feature. When you do, Server Manager warns you not to proceed unless all services have been moved off this cluster node. They have, so choose Yes and move on.


After the removal, it will likely ask you to reboot, which is a pleasant idea. After rebooting, you can safely add the feature back, reconnect to the cluster, and start the rebuild process.
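The post does the removal through the Server Manager GUI. A scripted equivalent on R2 would presumably call the `Remove-WindowsFeature` and `Add-WindowsFeature` cmdlets from the ServerManager PowerShell module with the `Failover-Clustering` feature name; the Python sketch below assumes exactly that, and it is not confirmed here whether the cmdlet raises the same warning as the wizard. The function names and the two-phase split are hypothetical, and the sketch stops at re-adding the feature; rejoining the cluster is left to you.

```python
import subprocess
import sys

# Assumed feature name, as listed by Get-WindowsFeature on Windows Server 2008 R2.
FEATURE = "Failover-Clustering"


def feature_command(action):
    """Build a powershell.exe command line that removes or re-adds the feature."""
    cmdlets = {"remove": "Remove-WindowsFeature", "add": "Add-WindowsFeature"}
    if action not in cmdlets:
        raise ValueError("action must be 'remove' or 'add'")
    # On 2008 R2 the *-WindowsFeature cmdlets need the ServerManager module loaded.
    script = "Import-Module ServerManager; {} {}".format(cmdlets[action], FEATURE)
    return ["powershell.exe", "-NoProfile", "-Command", script]


# Phase 1 runs before the reboot; phase 2 runs after the node comes back up.
PHASES = {
    1: [feature_command("remove"), ["shutdown.exe", "/r", "/t", "0"]],
    2: [feature_command("add")],
}


def run_phase(number):
    """Run one phase on the broken node; check=True stops at the first failure."""
    if sys.platform != "win32":
        raise RuntimeError("these commands only exist on Windows")
    for args in PHASES[number]:
        subprocess.run(args, check=True)
```

Splitting the work into phases mirrors the reboot in the middle: run `run_phase(1)`, let the node restart, then run `run_phase(2)` before reconnecting to the cluster.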

</notFortheFaintofHeart>  

Simple. Easy. Not recommended… but if you're like me, saving time is sometimes worth the risk. If you screw up, you can always rebuild your server. <grin>

Thanks,

-Chris