Sorting out some myths and facts concerning Windows Server 2008 Failover Clustering

You may be wondering why, at this point in time, we are publishing a blog such as this. That is a good question, and the answer is rather straightforward: even though Windows Server 2008 has been out for a while now, we are seeing an increase in the number of customers starting to use the Failover Clustering feature. Some of these customers are still trying to apply the concepts from previous versions of Microsoft Clustering. Doing this sometimes causes problems and we end up getting calls. In an effort to head off some of those calls, here are some myths and facts about Windows Server 2008 Failover Clustering. We hope these will help.

1. If the Cluster solution purchased from the hardware vendor does not appear on the Hardware Compatibility list as documented in KB309395, then it is not considered to be a supported solution.

Myth: This is no longer true. In Windows Server 2008, the compatibility requirements have changed. All that is required now is that the Cluster solution consists of hardware that has received a Windows Server 2008 Logo and that the solution passes the built-in Cluster validation process with no failures. You are still considered supported if there are warnings. A Warning means a ‘best practice’ has been violated and probably should be addressed to ensure high availability. To learn more about the validation process, read Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster. Additionally, you can read about the support policy in The Microsoft Support Policy for Windows Server 2008 Failover Clusters article.
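
The support rule described above can be sketched in a few lines of Python. This is only an illustration of the logic (logoed hardware plus a validation run with no failures, warnings allowed). The names ValidationReport and is_supported are hypothetical and are not part of any Microsoft tool or API.

```python
from dataclasses import dataclass


@dataclass
class ValidationReport:
    """Hypothetical summary of a Cluster validation run."""
    failures: int
    warnings: int


def is_supported(all_hardware_has_logo: bool, report: ValidationReport) -> bool:
    """Apply the rule from the text: logoed hardware and zero failures.

    Warnings do not affect supportability; they only flag a violated
    best practice that probably should be addressed.
    """
    return all_hardware_has_logo and report.failures == 0


if __name__ == "__main__":
    # Warnings only: still supported under the rule described above.
    print(is_supported(True, ValidationReport(failures=0, warnings=2)))
    # Any failure: not supported.
    print(is_supported(True, ValidationReport(failures=1, warnings=0)))
```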

2. I am planning on using the same hardware that I am currently using for my Windows Server 2003 cluster. It is on the HCL, so it should work.

Myth: It is incorrect to assume that hardware purchased to run a Windows Server 2003 cluster will work with a Windows Server 2008 Failover Cluster. Please check with your hardware vendor to verify whether this is true. You can go to the Microsoft website and see which vendors are participating in the Failover Cluster Configuration Program. If your vendor is a part of this, then your next step is to look the hardware up in the Windows Server Catalog and see what particulars need to be in place for it to work properly. There may be specific driver, firmware, BIOS, etc. updates that need to be in place before going to Windows Server 2008. If you are planning to use the same hardware, how to move it to Windows Server 2008 is covered in the next topic.

3. I remember when I upgraded my Windows 2000 Server Cluster to Windows 2003, I evicted one node from the Cluster, joined the Windows 2003 node to the Cluster and then ran the process one more time. In the end, I had a brand new Windows Server 2003 Cluster. I am sure that I can do this again in Windows Server 2008.

Myth: This is no longer true. There have been so many design changes made in Windows Server 2008 Failover Clustering that this method of upgrading Cluster nodes is no longer supported. Servers that are running the Cluster Service cannot be upgraded to Windows Server 2008. The Cluster Service must be removed before the upgrade process can be executed. What you can do is evict one of the nodes, rebuild it with Windows Server 2008 and install the Failover Clustering feature. Then, create a single node Cluster and migrate the resources over. Migration information is covered in this blog or the Failover Cluster Step-by-Step Guide: Migrating Cluster Settings from Windows Server 2003 to Windows Server 2008. Please remember that, to be supported, the Cluster must pass Cluster Validation once you have all nodes up and running with Windows Server 2008 Failover Clustering.

4. It takes a ‘rocket’ science degree to be able to configure and maintain a cluster.

Myth: This is simply not true in Windows Server 2008. One of the primary design goals for Windows Server 2008 Failover Clusters was to make setting up and maintaining them easier and less time consuming. Using wizard-based processes, it is simply a matter of providing the requested information as you step through a process that ensures all configuration settings required for proper functioning of the cluster are put in place. In the end, the cluster just works. This is not to say there isn’t a learning curve or that old habits won’t be hard to break, but let’s face it…it’s not your grandfather’s Cluster anymore. To start learning about Windows Server 2008 High Availability Technologies, you can look at the Clustering and HA Resources blog for all sorts of topics (whitepapers, guides, videos, etc.).

5. Clusters are just too sensitive. I mean, look at the quorum disk. If that fails, I am out of business.

Myth: In Windows Server 2008 Failover Clusters, the concept of ‘quorum’ has taken on a whole new meaning. It no longer means a disk which, if it fails, could take the Cluster down. It is more of a concept involving attaining a sufficient number of votes from the Cluster membership for highly available services to continue to be provided to end users. For more information about the new quorum model, read the Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster.
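
To make the ‘votes’ idea concrete, here is a minimal Python sketch. It assumes a simple majority rule (more than half of all configured votes must be present), which is our simplification for illustration; the actual behavior depends on the quorum mode you configure, so check the Step-by-Step Guide for your setup. The function name has_quorum is hypothetical.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Return True when strictly more than half of all votes are present.

    Assumption for illustration: a simple majority rule. Each voter
    (for example a node, or a witness if one is configured) counts once.
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    if not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes")
    return 2 * votes_online > total_votes


if __name__ == "__main__":
    # Five voters: losing two still leaves a majority, losing three does not.
    for online in (5, 4, 3, 2):
        print(online, "of 5 votes online ->", has_quorum(online, 5))
```

The point of the sketch is that no single voter is special: under this assumed rule, a five-vote Cluster keeps running after any two voters are lost.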

6. There are too many restrictions on Clusters. The one that always comes up is that all of the Cluster nodes must reside on the same subnet. In a multi-site scenario, I have to stretch VLANs across for all networks to make it work and have to worry about network latency, not to mention some cost factors.

Myth: This is no longer true. The changes made to the Cluster Networking model now allow Cluster nodes to be located on separate, routable subnets. This provides more flexibility when designing Clusters. It also lifts the critical restriction (500 millisecond round-trip times) for multi-site clusters. As long as there are at least two properly configured and functioning network paths on each node, the new Cluster Network Driver can figure out how to get to each node in the cluster using the best possible route.

7. Cluster networking is too hard to understand. There are just too many concepts of private, public, heartbeat, disabling NetBIOS, no NIC Teaming, Windows Network Priority versus Cluster Communication priority, and the list goes on. It just seems there are too many things to deal with.

Myth: This is no longer true. The old way of doing business involved providing multiple network connections between the nodes of the Cluster and dedicating at least one of them specifically to internal Cluster communications. The Internal Cluster Communications adapter that is on this isolated network could not be supported by ‘teamed’ network cards. In Windows Server 2008 Failover Clusters, all we ask is that there be at least two fully functioning, properly configured networks that the Cluster can use for communications with other nodes. If this configuration is not in place, a Warning will be registered when the validation process is run. Again, a warning means that ‘best practices’ are not in place.

8. When I create a Cluster in Windows Server 2008, I am not asked to provide a domain user account. Can I change it later on to the account I want?

Fact: That is correct, and no, you cannot change it later. The whole security model in Windows Server 2008 Failover Clusters has changed, beginning with the removal of the requirement for a domain user account to run the Cluster Service. Now, the Cluster Service runs under the Local System account. Additionally, all Cluster Network Name (now called Client Access Points or CAP) resources register a Computer Object in Active Directory when they come online. The Computer Objects, by default, are placed in the Computers container. They can be moved or pre-created elsewhere if desired. For more information, review the Failover Cluster Step-by-Step Guide: Configuring Accounts in Active Directory.

9. Connections to Cluster file shares cannot be made using IP Addresses, so now all my mapped network drives that use IP Addresses to connect to user shares will no longer work.

Fact: This is true. Connecting to Cluster shared folders in Windows Server 2008 Failover Clusters requires that the access point be either the NetBIOS name or the Fully Qualified Domain Name (FQDN) corresponding to the Network Name resource. This is because there is a new ‘scoping’ feature which is part of the changes that went into the product to support highly available File Servers. For more information on ‘scoping,’ you can refer to the File Share ‘Scoping’ in Windows Server 2008 Failover Clusters blog.

10. I can no longer share subdirectories like I used to in Windows Server 2003 Clusters. There is no File Share resource type any longer.

Fact: This is true. Quite a few of our customers have been impacted by the loss of this capability and are slowly but surely coming to grips with it. It helps to understand a little bit of the history behind this functionality. To assist with that, the Windows Clustering & High Availability Product Group wrote a short blog about it. It is important to understand how the new functionality works, especially if you are going to migrate from Windows Server 2003 File Server clusters to Windows Server 2008. This blog also provides access to additional TechNet content on migration. Be sure to also review the new File Server Migration Toolkit, as it can be used as well.

11. I’ve been working with Clusters for a long time and I sometimes seem to have issues with the shared storage and disk signature changes.

Myth: Working with shared storage in Windows Server 2008 Failover Clusters is a much better experience. The Cluster storage model, like the networking model, has been redesigned. The Cluster Disk Driver (CLUSDISK.SYS) is no longer in a ‘direct’ line to the storage stack, but sits off to the side of the Windows Disk stack.

Access to storage is now the primary job of Partition Manager. If the Cluster Service needs to interact with storage, the Cluster Disk Driver communicates via the Partition Manager driver (PARTMGR.SYS). Additionally, the way Cluster reserves a disk is different in Windows Server 2008. Instead of using the SCSI-2 protocol features of Reserve\Release\Reset, Cluster Service now uses SCSI-3 Persistent Reservations to reserve a disk. And, for even more flexibility and stability, the Cluster Service uses multiple attributes to identify a piece of storage (Disk Signature and Disk Unique ID). If at least one of these two attributes matches, the Cluster should be able to bring a disk online and update this change in its Cluster registry. This means that if only the disk signature changes, Cluster will bring the drive online and you may never even know it changed. If you are totally replacing a disk, you can add it to the nodes, bring up the properties of the disk resource being replaced and hit the REPAIR button. It will bring up another window for you to select which disk you are replacing it with and you can then bring it online. Done! No more DUMPCFG or Cluster Recovery!!
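
The ‘at least one of two attributes’ rule can be sketched as follows. This is a simplified Python model of the matching logic described above, not the Cluster Service’s actual code; DiskIdentity and match_disk are hypothetical names, and the identifier values are made up for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiskIdentity:
    """Hypothetical pair of attributes used to recognize a disk."""
    signature: str   # Disk Signature
    unique_id: str   # Disk Unique ID


def match_disk(recorded: DiskIdentity,
               presented: DiskIdentity) -> Tuple[bool, DiskIdentity]:
    """Decide whether a presented disk is the one the Cluster has on record.

    Returns (can_come_online, identity_to_store). If either attribute
    matches, the disk is accepted and the stored identity is refreshed
    with what the disk presents now. If neither matches, the record is
    left alone (the text describes the REPAIR button for that case).
    """
    signature_ok = recorded.signature == presented.signature
    unique_id_ok = recorded.unique_id == presented.unique_id
    if signature_ok or unique_id_ok:
        return True, presented
    return False, recorded


if __name__ == "__main__":
    on_record = DiskIdentity(signature="0xA1B2C3D4", unique_id="example-id-1")
    # Only the signature changed: the disk still comes online.
    changed = DiskIdentity(signature="0x11223344", unique_id="example-id-1")
    print(match_disk(on_record, changed))
```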

12. I cannot find the Cluster Log in the Cluster subdirectory (%windir%\cluster). Does this mean we no longer use it?

Myth: The ‘Eventing’ Model is new in Windows Server 2008. Actually, it is carried forward from Windows Vista into the server product. The Cluster Log has ‘evolved’ into a trace log format (.etl) and this logging is configured to start at boot time. Getting the data into a readable text Cluster Log requires using the cluster.exe CLI to dump out the information. The Windows Clustering & High Availability Product Group has written a blog that discusses this and also covers using TRACERPT.EXE to convert the .ETL into a readable HTML format.

13. In Windows 2003 Cluster, if there was a resource that had problems, I had to sit there and let it fail multiple times on one node, watch it go to another node and fail there multiple times and then just bounce around. This takes time that I do not have to get to where I can correct the problem. Being that I am new to Windows 2008 Clustering, I am afraid I am going to spend unnecessary cycles waiting for this same thing to occur if I do something wrong.

Myth: In Windows Server 2008 Failover Clusters, the default recovery behavior has changed. A resource now tries only one restart in a 15-minute period before it is marked as "Failed." Then, the Service group or the Application group to which the resource belongs is failed over to another node in the cluster. This new behavior improves the high availability model by giving resources only one restart attempt on each node in the cluster before a failover occurs. Once one unsuccessful restart attempt has been made on every node in the cluster, the resource is marked as "Failed."
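
As a rough illustration of that default behavior, the following Python sketch walks a failed resource through one restart attempt per node. It is a simplified model under our own assumptions: the 15-minute window is not simulated, the node order is whatever list you pass in, and recover and try_restart are hypothetical names rather than anything exposed by the Cluster Service.

```python
from typing import Callable, List


def recover(nodes: List[str], try_restart: Callable[[str], bool]) -> List[str]:
    """Simulate one restart attempt per node, failing over between nodes.

    nodes       -- nodes in the order they are tried; the first entry is
                   the node where the resource originally failed.
    try_restart -- callback returning True if the restart on that node
                   succeeds (stands in for the real resource coming online).
    Returns a list of events describing what happened.
    """
    events: List[str] = []
    for index, node in enumerate(nodes):
        if index > 0:
            # The previous node used up its single attempt.
            events.append(f"failover to {node}")
        events.append(f"restart on {node}")
        if try_restart(node):
            events.append(f"online on {node}")
            return events
    # Every node has had its one attempt.
    events.append("failed")
    return events


if __name__ == "__main__":
    # The resource cannot start on node1 but can on node2.
    for event in recover(["node1", "node2"], lambda n: n == "node2"):
        print(event)
```

With a misconfigured resource that cannot start anywhere, this model reaches the final "failed" state after a single pass over the nodes, which is the time saving the answer above describes.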

14. Trying to find information about Windows 2003 Clusters was a long tedious process sometimes as there are so many places to go on the Microsoft pages. Am I going to have the same ‘travels’ with Windows 2008?

Myth: The Windows Clustering & High Availability Product Group has created a Failover Clustering Portal which serves as a more centralized repository for high availability information. They have also created this blog that breaks things out a little more.

Hopefully, this blog will help clear up some of the questions or concerns floating around out there about Windows Server 2008 Failover Clusters. As always, the CORE Team strives to provide information that we hope will be useful. Some of this information does originate from the types of issues we deal with on a daily basis in the various technology areas we support. Thanks for your attention.