SQL Q&A: Fine Tuning for Optimal Performance
Duplicate indexes, cancelled rollback operations and I/O spikes can cause performance issues, but you can work around them.
Paul S. Randal
Q. SQL Server seems to let me create indexes that are exactly the same on the same table. How does this help my workload performance? Will different queries use different copies of the same index?
A. It’s unfortunate that SQL Server allows duplicate indexes, as they provide no benefit whatsoever. In fact, duplicate indexes can be detrimental in many ways.
A duplicate index occurs when the index keys are exactly the same as another index, specified in the same order and with the same ASC or DESC specification. The included columns (if any) are also the same (although the included columns can be specified in any order).
SQL Server will only use one of the duplicate indexes to help with queries, but it must maintain all indexes on a table during insert, update and delete operations. This means every time there’s an insert or delete on the table, it must be reflected in all indexes. The same is true for updates, if the columns being updated are part of the index.
This index maintenance uses extra resources and generates extra transaction log records—all for indexes that are essentially useless. These duplicate indexes take extra space on disk and extra space in backups—and the pages necessary for index maintenance take extra space in memory too.
Duplicate indexes are likely to become fragmented. They also require extra resources during regular index fragmentation removal. The extra transaction log records from index maintenance and fragmentation removal can also lead to lower performance of high-availability (HA) features like database mirroring and transactional replication.
SQL Server gives you no warning that you’ve just created a duplicate index, so the onus is on you to avoid doing so. Checking whether you already have duplicate indexes is no small matter. It involves scripting out all the index definitions and manually comparing them, or extensive programmatic parsing of the system catalogs. Last year, Kimberly Tripp posted a complete solution to this problem.
Beware the Rollback
Q. I recently had to cancel a long-running update. After the operation rolled back, the next daily transaction log backup was huge. I expected it to be very small, as nothing had changed in the database. Can you explain this anomoly?
A. This is a fairly common misconception. If you roll back a large operation, the next differential backup should be small, right? Wrong.
Any time SQL Server makes a change to the database, two things happen. First, it generates transaction log records that describe the change. Second, for any data file pages modified by the change, the corresponding bit is set in a differential bitmap. This signifies those pages should be backed up by the next differential backup.
When you roll back an operation, SQL Server has to undo the changes the operation made. This means it examines all transaction log records generated by the forward part of the operation. It has to undo those changes in reverse order. Each transaction log record describes a single change to the database as part of the operation. To roll back that change, you have to make another change to the database that negates the effect of the original change. For instance, you would roll back a record insertion by deleting the record. The net effect is that the record doesn’t exist.
Here’s the most confusing part: each change performed during rollback is really just another database change (albeit a special one). For every change to the database, there must be a transaction log record. So even changes made during a rollback must be properly logged. This means a large operation rollback will generate not only transaction log records for the forward part of the operation, but also for the rollback. Transaction log backups will back up all of these transaction log records, accounting for the large transaction log backup.
When the forward part of the operation causes the differential bitmap to have bits set because portions of the database have changed, you can’t clear the bits in the differential bitmap again because the database has changed. It doesn’t matter whether the change was eventually rolled back. The data file pages have still been changed (twice, in fact), and so must be backed up by the differential backup.
The crux of the matter is that even when an operation is rolled back, the database is still changed. All backups need to reflect those changes.
In Search of Spikes
Q. I’m troubleshooting an issue where we see periodic I/O spikes from one of our SQL Servers. I’ve narrowed it down to checkpoints using PerfMon, but I can’t tell which database is the major culprit. How can I drill in further?
A. Checkpoints exist for two reasons. First, they update data file pages with what has been written to the transaction log. SQL Server uses a mechanism called write-ahead logging, where database changes are described in the transaction log before being reflected in the data files. This guarantees the durability of changes in the event of a crash. Second, they reduce the amount of constant I/O load by only writing out changed data file pages periodically, rather than after every change to each data file page.
Checkpoints occur for each database separately. They’re triggered based on a number of factors, including the recovery interval—this is the SQL Server estimate that enough transaction log has been generated since the last checkpoint so that crash recovery will take approximately one minute (by default).
This number equates to the generation of many tens of thousands of individual transaction log records. The more data file pages changed by these transaction log records, the larger the amount of I/O that must be performed by database checkpoints.
You can track checkpoints using the “Checkpoint pages/sec” counter in the SQL Server: Buffer Manager performance object. That only gives an aggregate count across all databases on the SQL Server instance. To determine which database is being “checkpointed” at any time, you’ll need to use trace flags.
If you turn on trace flags 3502 (trace print when a checkpoint occurs), 3504 (trace print details about the checkpoint) and 3605 (allow trace prints to go to the error log), you’ll be able to determine which database is accounting for the I/O spikes due to checkpoints.
You can enable these trace flags using the command:
DBCC TRACEON (3502, 3504, 3605, -1)
Disable them again using the command:
DBCC TRACEOFF (3502, 3504, 3605, -1)
Subsequent checkpoints will produce output similar to the following in the error log:
2011-12-30 05:07:14.390 spid17s Ckpt dbid 21 started (8) 2011-12-30 05:07:14.390 spid17s About to log Checkpoint begin. 2011-12-30 05:07:14.390 spid17s Ckpt dbid 21 phase 1 ended (8) 2011-12-30 05:07:14.830 spid17s FlushCache: cleaned up 4307 bufs with 201 writes in 441 ms (avoided 23 new dirty bufs) 2011-12-30 05:07:14.830 spid17s average throughput: 76.30 MB/sec, I/O saturation: 198, context switches 392 2011-12-30 05:07:14.830 spid17s last target outstanding: 15, avgWriteLatency 2 2011-12-30 05:07:14.830 spid17s About to log Checkpoint end. 2011-12-30 05:07:14.830 spid17s Ckpt dbid 21 complete
This lets you see which database is being checkpointed and match that to the information from PerfMon. You can then investigate why there’s so much data being changed between checkpoints, perform more frequent checkpoints to reduce the I/O spike, or increase the capacity of the I/O subsystem.
Q. My company has instituted a new policy requiring that we consolidate as much as possible to reduce hardware costs. I’m also being pushed to reduce the number of SQL Server instances to save licensing costs. Are there any guidelines as to how many databases per SQL Server instance make sense?
A. The answer to this question is a big “it depends.” The list of factors includes the size of the databases, the types of workloads they’re running, the volatility of the data, the type of required regular maintenance, and the disaster recovery and HA requirements.
Each SQL Server instance has a finite amount of space in memory to store data file pages being processed at any time (this is known as the buffer pool). The more databases you have on an instance with disparate workloads all requiring processing, the more competition there will be among the workloads for buffer pool space.
This can lead to thrashing the buffer pool memory. There will be constant churn to make space for new data file pages being read from disk. There will also be large amounts of read I/O with higher-than-acceptable read latencies. All those factors will degrade workload performance.
If the various workloads entail database changes, there will also be write I/O from periodic checkpoints. With numerous databases consolidated on a single instance, there might be multiple checkpoints occurring simultaneously. This could cause write I/O latencies—slowing down the checkpoint operations and further contributing to workload performance degradation.
Regular database maintenance also becomes a problem with large numbers of databases. If each database requires index and statistics maintenance, consistency checking and backups, it can be a challenge to schedule all these operations for all databases so that they don’t conflict with each other and put even more I/O load on the server.
The more databases there are on an instance, the more difficult it becomes to use the native HA technologies of SQL Server to protect them all. It’s more likely you’ll need some kind of I/O subsystem-level replication technology—even just from an ease-of-management perspective. This means additional capital outlay that could offset the cost savings from server consolidation.
Consolidation is a huge topic. Fully doing it justice is beyond the scope of a single column. This is enough food for thought to make you cautious of over-consolidating. On the flip side, you might have many small databases with minimal workloads you could host on a single instance with no problems. As I stated before, it depends.
Paul S. Randal is the managing director of SQLskills.com, a Microsoft regional director and a SQL Server MVP. He worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. He wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Randal is an expert on disaster recovery, high availability and database maintenance, and is a regular presenter at conferences worldwide. He blogs at SQLskills.com/blogs/paul, and you can find him on Twitter at Twitter.com/PaulRandal.