SQL Server 2008 and R2 Cluster Best Practices
I have been working with several of my customers recently helping them install and/or review a new SQL Server 2008/R2 cluster environment on Windows Server 2008/R2 for best practices and gotchas. The Cluster Validation wizard in Windows Server 2008/R2 checks for many clustering best practices and reports them as warnings in the cluster validation report. Therefore, the first thing I do to look for best practices is to review the cluster validation report. In addition to that, I usually look for the following additional best practices on a SQL Server failover cluster -
Before starting SQL Server installation
1. Configure MSDTC as per best practices
It is required to configure the MSDTC as a clustered resource on a Windows 2003 Failover cluster to be able to install SQL Server 2000/2005. If you do not cluster MS DTC prior to installing SQL Server 2000/2005 on Windows Server 2003, you may notice errors when using SQL Server features that depend on MSDTC (such as some Replication functionality). SQL Server 2008 and R2 do not have any dependency on the MSDTC service anymore and so you don’t need to configure MSDTC as a clustered resource before installing SQL Server 2008 and R2. However, if your applications require MSDTC (which you can confirm with your application vendor or your in-house development team) or if you are using SQL Server features that always require MSDTC such as Linked Servers, then MSDTC service must be configured as a clustered resource on Windows Server 2003. There are many enhancements to the MSDTC service in Windows Server 2008/R2 that you can take advantage of. To list a few of them –
- In Windows Server 2003, you could only have one MSDTC clustered resource, which could cause Isolation and Performance issues. In Windows Server 2008/R2, you can have multiple MSDTC clustered resources on a multi-instance (Active-Active) cluster.
- In Windows Server 2003, we didn’t have the ability to run MSDTC as a standalone service on a Windows Failover cluster. In Windows Server 2008/R2 you can do that.
If your application uses distributed transactions or Linked servers, you might encounter various error messages if the MSDTC is not configured correctly. Sometimes application dependency on the MSDTC is not very clear, and so I usually suggest my customers to configure the MSDTC service on a cluster as per best practices. The recommended configuration parameters for MSDTC service are discussed in these articles –
2. Pre-stage MSDTC and SQL Server account objects in Active Directory prior to installing a MSDTC or SQL server cluster
You might run into issues while configuring a cluster resource or bringing it online, if the computer object of the cluster service does not have permissions to create computer objects in the Active Directory. If you are experiencing this problem, you will also notice the event ID 1194 in the event logs. Here is the explanation -
When you create a new clustered service or application, a computer object (computer account) for that clustered service or application must be created in the Active Directory domain. This computer object is created by the computer object of the cluster service itself. If the computer object of the cluster service does not have the appropriate permissions, it cannot create or update the computer object for other clustered service or application such as SQL Server, MSDTC etc. This can result in an error with event ID 1194, while configuring the cluster resource or bringing it online.
Therefore it’s important to pre-stage a computer object for the new clustered service or application in Active Directory before configuring that application (such as SQL Server and MSTDC).
3. Benchmark Disk Performance
It’s no secret that the I/O system is very important to ensure good SQL Server performance. When configuring a new server for SQL Server or when adding or modifying new disks to an existing system, it is good practice to determine the capacity of the I/O subsystem prior to deploying SQL Server. Tools such as SQLIOSIM and SQLIO can be used for storage benchmark testing and capacity planning. Here are some additional references on these tools –
- How to use the SQLIOSim utility to simulate SQL Server activity on a disk subsystem
- SQL Server Storage Engine team blog on SQLIOSIM
- SQLIOSim - Outstanding I/O Warnings
- How It Works: SQLIOSim - Running Average, Target Duration, Discarded Buffers
- SQLIOSim is "NOT" an I/O Performance Tuning Tool
- Tools used to test I/O subsystems
4. Use Slip Streaming or install the latest support files prior to starting the SQL Server installation
Slipstreaming is a process that incorporates all necessary Service Packs and hotfixes into a folder that contains the complete, original installation files. It is recommended to use the slipstreaming to install SQL Server to ensure that the new SQL Server instance is fully updated and ready to go, saving significant time. Slip streaming also installs the latest setup support files on the box, which helps avoid any bugs/known issues with the SQL Server installation engine. For more information on how to slip stream SQL Server binaries, please refer to these article -
- How to update or slipstream an installation of SQL Server 2008
- Creating a merged (slipstreamed) drop containing SQL Server 2008 RTM + Service Pack 1
If using Slip streaming is not an immediate option (such as due to time constraints to create the merged folder), it is recommended to install the latest setup support files in order to fix the SQL Server 2008 Setup even before you run it. For example, use the SQL Server 2008 SP2 to install the SP2 setup support files and then start the SQL Server 2008 installation wizard. This will help you avoid bugs and known issues with the SQL Server installation engine. More information on this can be found in this article - How to fix your SQL Server 2008 Setup before you run setup
5. Ensure that the account used to install SQL Server cluster has the appropriate permissions
I was working with a customer recently, helping them in installing a SQL Server 2008 R2 cluster on Windows Server 2008 R2 and we were getting a lot of permission errors during the installation. The setup account was a member of Local Administrators group on all nodes in the cluster. After troubleshooting the errors, we discovered that in addition to local administrative permissions, the setup account must have these additional permissions on each node in the cluster, to successfully install SQL Server -
- Backup files and directories (SeBackupPrivilege)
- Debug Programs (SeDebugPrivilege)
- Manage auditing and security log (SeSecurityPrivilege)
This issue is described in detail in this KB article - SQL Server 2008 installation will fail if the setup account does not have certain user rights. We granted the above permissions to the logged on user account manually through the Local Security Policies administrative tool and were able to complete the installation successfully. I suggest that you verify the setup account permissions as described in the KB article prior to starting the SQL Server cluster installation, to avoid any setup failures.
6. Ensure that the SQL Server account should not be a member of Local Administrators group
It is not recommended to assign the local Administrative permissions to the SQL Server and/or SQL Server Agent service account. SQL Server and Agent services needs specific permissions on the box, which are granted to the service accounts automatically during installation or when the service account is changed through SQL Server Configuration Manager tool. For more information on specific permissions required the by SQL Server services, please refer to the following article - Setting Up Windows Service Accounts
During SQL Server Installation
1. Use Service Identifiers (SIDs) instead of Domain groups
SQL Server enables per-service SID for each of its services on Windows Server 2008 and R2 systems in SQL Server 2008 and SQL Server 2008 R2 to provide service isolation and defense in depth. The per-service SID is derived from the service name and is unique to that service. Service SID name for a default instance of SQL Server is NT Service\MSSQLSERVER and SQL Server Agent Service is NT Service\SQLSERVERAGENT . Service SID name for a named instance of SQL Server is NT Service\MSSQL$<InstanceName>. Service isolation enables access to specific objects without the need to run a high-privilege account or weaken the security protection of the object. By using an access control entry that contains a service SID, a SQL Server service can restrict access to its resources. For more information, please refer to this article - Setting Up Windows Service Accounts
After SQL Server Installation
1. Implement antivirus exclusion best practices:
Besides contending for precious system resources such as CPU and memory, antivirus programs can cause issues during SQL Server startup and in some cases may also cause database corruption. This could happen during system startup, if an antivirus program gets a lock on a file (such as MDF, LDF etc.) before SQL Server does. It’s true that antivirus programs have evolved quite a bit in the last few years and this is less of an issue these days. Nonetheless it’s a best practice to exclude SQL Server related directories and files from Antivirus scanning, wherever possible. In addition to excluding the MDF, LDF and NDF database files, antivirus exclusion list should be configured to exclude these additional directories on a Windows Failover cluster –
· Quorum drive
· <system root>:\Windows\Cluster
Please refer to this KB article for more information on how to configure an antivirus program that is running on a SQL Server box - Guidelines for choosing antivirus software to run on the computers that are running SQL Server
2. Run SQL Server 2008 BPA tool to check for other best practices
SQL Server Best Practices Analyzer (BPA) is a free downloadable tool that can be used to scan a SQL Server instance for common best practices and other stability issues that could result in server-down scenarios. This tool encapsulates a variety of checks to prevent most common SQL server issues, which are received by Microsoft support. The best part is that new test cases are added to the tool on a frequent basis to help our customers prevent most common SQL Server issues, which other customers are experiencing. The BPA tools is available for all versions of SQL Server. Here are the download links for each version -
3. Configure SQL Server memory settings appropriately on a multi instance cluster
By default, every SQL Server installation has “Max Server Memory” parameter set to use all available physical memory. This could raise problems when multiple instances are present on the cluster. It is important to cap SQL server memory by configuring the “max server memory” configuration option for each instance on a multi instance cluster. This would ensure that each SQL Server instance will get sufficient amount of memory if both instances failover to the same node in the cluster during the planned or unplanned failover. Note that “max server memory” setting only specifies the Buffer Pool memory. SQL Server as a process acquires more memory than specified by “max server memory” option. Both internal and external components can allocate memory outside of the Buffer Pool and this additional memory consumed by SQL Server is known as Multi Page Allocator (MPA) or Mem to Leave (MTL). The Performance Monitor counter SQL Server:Memory Manager\Total Server Memory (MB) can be used to find the Buffer Pool memory consumption of a SQL Server process. Another Performance Monitor counter, Process(sqlservr)\Private Bytes can be used to know the total memory consumption by a SQL Server process. The difference of these two counters can be used to calculate the MPA/MTL memory consumption by a SQL Server instance. MPA/MTL memory is used by components as such threads (usually the biggest consumer on a busy production server), linked servers, COM components, extended stored procedures etc. Following formula can be used to calculate the “max server memory” for each instance in a multi instance cluster –
SUM (max server memory for “resident” instance/s) + MPA for all instances <= RAM –(OS memory + Applications)
where “resident” instance/s refers to the total number of instances that could be running on a single node of the cluster keeping node failure into consideration.
For example on a 2 node Active-Active cluster, “resident” instances would be 2 because if a single node fails, then both instances would failover to the same node. Let’s assume that each node has 24 GB of RAM, OS needs 2 GB of dedicated memory, other applications needs 2 GB , MPA for each SQL Server is 1 GB and both instances need equal amount of memory. So how much “max server memory‟ should you give to each instance in case of a node failure? Let’s say max server memory for each instance is X GB. Using the formula above
SUM (max server memory for “resident” instance/s) + MPA for all instances <= Total Available Memory –(OS memory + Applications)
(X+ X) + 1GB * 2 <= 24 – (2GB + 2GB)
X<= 9 GB
i.e. “max server memory” for each SQL Server instance should be configured as 9GB in this scenario. For more information on SQL Server memory configuration, please refer to this article – Server Memory Options
4. Consider adding a third passive node to the 2-node cluster
As discussed above, when using the Active-Active configuration on a 2 node cluster, each instance is configured to use far lesser amount of memory than what is available on each node. For example on a 2 node cluster with 24GB of physical memory on each node, assuming that the OS and other application need 4 GB and the MPA for each SQL instance is 1 GB, only 9 GB of memory can be allocated to each instance. However, if we add a third passive node to this cluster, the number of “resident” instances becomes 1 instead of 2. In that case, we can allocate as much as 19 GB of memory to each instance by using the above formulae. Adding a third node would also provide an added advantage of having additional resources such as CPU, network and I/O on the third passive node to address planned/unplanned failover without causing performance degradation.
If you are using Mount Points
1. Mount Point Best Practices
SQL Server 2005 and later versions failover clustered instances fully support mounted drives if the mounted drive is hosted by a cluster drive with a drive letter assigned. When mount points are being used together with the physical disks, each mount point must appear as a cluster resource. Additionally, each mount point must also be added as a SQL Server dependency. If only the root physical disks dependency is added and the mount points are not added, database corruption will occur on failover. Database corruption may also occur when SQL Server is restarted without failing over. For more information please refer to these articles -
- SQL Server support for mounted volumes
- How to configure volume mount points on a Microsoft Cluster Server
To avoid any issues during SQL Server installation, it is recommended to verify that Mount Points are configured and working properly. To verify that mount points are working correctly, try copying/creating files on mounted volumes. Also make sure that the root volumes and the mounted volumes appear as “Available Storage” in the Failover Cluster Manager and rename the mounted volumes to display their logical function.
2. While configuring the data directories in the SQL Server setup, do not use the root mount drive (Master drive) to store any data/log/backup files
When using mount points, it is not recommended to use the root (host) volume to store files and folders. The root volume is the volume that is hosting the mount points. Try to use the root (host) volume exclusively for mount points. This greatly reduces the time that it takes to restore access to the mounted volumes if you have to run a chkdsk. This also reduces the time that it takes to restore from backup on the host volume. If you use the root (host) volume exclusively for mount points, the size of the host volume only has to be several MB. This reduces the probability that the volume is used for anything other than the mount points