SQL Q&A: Backup and Setup
SQL Server is a powerful platform, but it does require finesse when it comes to transaction log settings and other configuration questions.
Paul S. Randal
XXXL Transaction Logs
Q: Our product uses SQL Server to store data. Every so often, we release a new version of our product that includes an upgrade script to run against the database. As we were testing our latest upgrade script on a representative test database, the transaction log file grew to more than 40GB. We’d like to prevent the log file from growing so large. What are our options? We have to stick with the full recovery model for disaster recovery purposes.
A: To start with, it’s great that you’re testing against representative customer data. So many times, I see layered application vendors testing these kinds of scripts on small data sets and then releasing them to their customers, who then run into all kinds of production problems. I’ll answer your question as if you’re the user. Then you can translate into the context of your customers.
You say you need to stay with the full recovery model. This implies you’re already taking transaction log backups and you don’t have general problems with the transaction log growing out of control. This is good, as taking transaction log backups is the only operation that lets the transaction log clear once transactions have committed. (For background on this, see technet.microsoft.com/magazine/2009.02.logging, which explains how the transaction log works and how different recovery models affect its behavior.)
Having said that, the frequency with which you perform transaction log backups is one thing that dictates how quickly the transaction log can clear and so avoid growing. For example, if your regular backup job performs a transaction log backup every 30 minutes, the transaction log file must be large enough to hold the largest amount of transaction log data that can be generated within a 30-minute period. Otherwise, it will grow.
If your upgrade script runs for 60 minutes and generates 40GB of log, that works out to 20GB of transaction log per 30-minute period, so the transaction log file would still have to be 20GB. That’s probably still too large, so you’ll have to perform transaction log backups more frequently while the upgrade script is running. This lets the transaction log clear more often and prevents it from growing so large. We had a similar situation at a client site, and ended up performing a transaction log backup once per minute for several hours while a similar script ran on a large database.
One thing to keep in mind is that these “extra” transaction log backups form part of the log backup chain and are necessary for disaster recovery. Make sure they all have meaningful names and are not deleted.
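As a minimal sketch of the approach above, the following could run from a SQL Agent job scheduled every minute while the upgrade executes. The database name and backup share are hypothetical; substitute your own, and note that the timestamped file name is one way to satisfy the "meaningful names" advice:

```sql
-- Hypothetical database name and backup share; adjust to your environment.
DECLARE @file NVARCHAR(260);

-- Build a timestamped file name so every log backup in the chain
-- is uniquely and meaningfully named (e.g. ProductDB_log_20240101_120000.trn).
SET @file = N'\\backupserver\logs\ProductDB_log_'
          + REPLACE(REPLACE(REPLACE(CONVERT(NVARCHAR(19), GETDATE(), 120),
              '-', ''), ':', ''), ' ', '_')
          + N'.trn';

-- Back up the log; under the full recovery model this is what allows
-- committed log records to clear and keeps the log file from growing.
BACKUP LOG [ProductDB] TO DISK = @file;
```

All of the resulting .trn files form part of the log backup chain, so they must be retained with the rest of your backups.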
There’s another factor to consider: What is the largest single transaction that occurs as part of the upgrade process you’ve engineered? The transaction log can only clear if the log records are from committed transactions (which may be oversimplifying a bit—see the aforementioned article for more details). This means a long-running transaction won’t let the log clear, even though the transaction log backup does back up the generated transaction log.
If your upgrade script contains a transaction requiring 15GB of log space, the transaction log file will have to be at least 15GB to hold the entire transaction until it commits. If this is the case, no matter how frequently you perform a transaction log backup, the transaction log won’t clear. The only recourse in this case is to split the large transaction into smaller ones, if possible.
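To illustrate splitting a large transaction into smaller ones, here's a hypothetical batched-update sketch. The table and column names are illustrative, not from any real upgrade script; the point is that each iteration is its own short transaction, so its log records can clear at the next log backup:

```sql
-- Instead of one huge UPDATE (a single multi-gigabyte transaction),
-- process rows in batches. Each UPDATE autocommits, so the log records
-- from completed batches can be cleared by the next log backup.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    -- Illustrative table/column names; batch size is tunable.
    UPDATE TOP (10000) dbo.Orders
    SET    SchemaVersion = 2
    WHERE  SchemaVersion = 1;

    -- Stop when no more rows qualify.
    SET @rows = @@ROWCOUNT;
END;
```

The batch size trades off log usage per transaction against the total number of iterations; whether a given operation can be batched this way depends on the logic of your particular script.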
Keep this in mind: the size of the transaction log required to run your upgrade script will be determined by the frequency of the transaction log backups and the size of the single largest transaction you create.
Q: We’re provisioning some new direct-attached storage for one of our database servers, and we want to make sure we understand all of our options and get the configuration correct. Can you explain the various configuration settings we should be aware of as far as SQL Server is concerned?
A: There are myriad settings and configuration options when provisioning storage, so I prefer to involve a dedicated storage administrator. There are definitely some options a SQL Server administrator should be concerned with, though, and you should be sure to set them appropriately.
The first of these is the underlying RAID level. The various RAID levels have different trade-offs as far as performance and redundancy are concerned. For example, the cheapest RAID configuration that still offers some redundancy is RAID-5, but it can only cope with a single drive failure (unlike RAID-6 or a configuration with hot-spare drives), and it can impair performance for write-heavy workloads, depending on how many drives are in the array.
RAID-10 provides the best redundancy, but it’s more expensive. The overall array capacity is at most half the total capacity of the constituent drives. A good discussion of the various RAID levels is presented in Appendix A of the TechNet white paper “Physical Database Storage Design.”
The other main factors to consider are the RAID stripe size, the NTFS allocation unit size (the cluster size) and the disk partition alignment. Any of these can drastically reduce performance if set incorrectly. The most important is disk partition alignment for volumes created on Windows Server 2003, which uses a 31.5KB alignment by default. That doesn’t match common RAID stripe sizes of 64KB (or a multiple thereof), so each I/O essentially has to read or write two RAID stripes. This obviously causes a huge degradation in performance.
Windows Server 2008 uses a 1MB alignment by default, but volumes created on Windows Server 2003 keep their original alignment after an upgrade to Windows Server 2008, so they may still be affected. Fixing the problem means reformatting the volumes, but the performance gains often make that a worthwhile step.
A thorough discussion of these is really beyond the scope of this column, but there are more details (and links for further reading) in my blog post, “Are your disk partition offsets, RAID stripe sizes and NTFS allocation units set correctly?”
When provisioning any new storage, it’s a good idea to stress test and performance test before starting a production workload. Stress testing lets you flush out any configuration problems that could lead to downtime or data loss. Performance testing helps you verify that the new storage provides the I/O capacity your workload requires. Microsoft has free tools to help with this, which you can learn more about in the white paper “Pre-Deployment I/O Best Practices.”
Q: I’m a little confused about the nature of the witness server when setting up database mirroring. How powerful does the witness server have to be? Does it depend on the number of databases for which it does failover? Does it matter in which datacenter you place the witness server? I want to make sure to get the highest availability for the mirrored databases.
A: The role of the witness server is one of the most misunderstood aspects of any database mirroring system. The witness server in a synchronous database-mirroring configuration exists solely to help facilitate an automatic failover in the event that the principal server becomes unavailable.
The principal server continuously sends transaction log records to the mirror server, never the witness server. The principal, mirror and witness servers ping each other every second as part of the automatic failure-detection mechanism. If the mirror server determines that it can’t communicate with the principal server for whatever reason, it can’t initiate an automatic failover unless the witness server agrees it also can’t communicate with the principal server. If the two servers agree, that forms a quorum and the mirror server initiates an automatic failover. If a witness server isn’t present, there can’t be a quorum and an automatic failover isn’t possible.
Thus, the witness server exists solely to help form a quorum. It doesn’t initiate failover or play any part in hosting the mirrored database. Usually the quorum exists between the principal and mirror servers.
Because the witness server does essentially no processing, it doesn’t need to be a powerful server. It can host any edition of SQL Server, including the free SQL Server Express Edition. There’s also no limit on the number of database mirroring sessions for which a particular SQL Server instance can act as the witness.
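For reference, once the principal/mirror session is established, the witness is added from the principal server by pointing at the witness instance's mirroring endpoint. This is a minimal sketch; the database name, server name and port are illustrative:

```sql
-- Run on the principal server. The witness instance must already have
-- a database mirroring endpoint listening on the given port.
ALTER DATABASE [ProductDB]
    SET WITNESS = 'TCP://witnessserver.corp.local:5022';

-- To remove the witness (giving up automatic failover):
-- ALTER DATABASE [ProductDB] SET WITNESS OFF;
```

Note that the witness only enables automatic failover when the session is running in high-safety (synchronous) mode.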
The witness server is best placed in a separate datacenter from the principal or mirror servers. However, most companies don’t have three datacenters, so the question is whether the witness server should be placed with the mirror server or with the principal server.
When only two datacenters are available, the witness server should always be placed with the principal server. The reason has to do with forming a quorum. If the witness and mirror servers are co-located and the network link to the principal server drops, a quorum forms between the witness and mirror, and the mirror server initiates a failover.
The principal server, which may not have any problem at all, will take the principal database offline when it loses the quorum. It assumes the mirror will perform a failover in this case. To prevent this, co-locating the principal and witness servers lets the principal maintain the quorum with the witness in the event of a network failure. The principal database remains available.
The witness server is entirely optional, yet without one there’s no possibility of an automatic failover, and hence no way to achieve the highest availability for the mirrored database. Database mirroring operates the same in every other way. If a witness server is configured but becomes unavailable for some reason, there’s no loss in mirroring functionality except the ability to perform an automatic failover.
You can’t have two witnesses for a database mirroring session. The only way to add redundancy to the witness role is to host the witness SQL Server instance on a failover cluster. You can get more information on database mirroring configurations in the TechNet white paper “Database Mirroring in SQL Server 2005.”
*Paul S. Randal is the managing director of SQLskills.com, a Microsoft regional director and a SQL Server MVP. He worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. He wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Randal is an expert on disaster recovery, high availability and database maintenance, and is a regular presenter at conferences worldwide. He blogs at SQLskills.com/blogs/paul, and you can find him on Twitter at Twitter.com/PaulRandal.*