SQL Server: Databases and Indexes
Managing your indexes, and having the proper indexes in place, is an essential part of managing your overall SQL Server workload.
Excerpted from “SQL Server DMV Starter Pack,” published by Red Gate Books (2010).
Glenn Berry, Louis Davidson and Tim Ford
Microsoft logically divides database management objects (DMOs) at the database/file level into two categories:
- Database-related. These contain database management views (DMVs) that let us investigate table and index page and row counts for a given database, as well as page allocation at the file level. A couple of DMVs are also dedicated to investigating TempDB database usage.
- Index-related. These contain DMVs specifically related to indexes, their characteristics, how they’re used and to help identify potentially useful indexes for your workload.
All the views in these two categories have “sys.dm_db_” at the beginning of their labels. These types of DMVs can help you define an effective indexing strategy, as this is one of the best ways to ensure the most significant and frequent queries are able to read the data they require in a logical, ordered fashion, and so avoid unnecessary I/O. Finding the correct balance between too many indexes and too few indexes—and having the “proper” set of indexes in place—is extremely important for getting the best performance out of SQL Server.
You’ll also need to monitor the TempDB database. TempDB is a global resource that stores temporary data for user and internal objects for all users connected to a given SQL Server instance. This includes, for example, internal worktables used to store results from cursors, and user objects such as temporary tables and table variables.
Find Missing Indexes
In order to find out which indexes are potentially missing from a given database, you need to use three closely related DMVs. The first one is sys.dm_db_missing_index_group_stats, which is described as follows:
“Returns summary information about groups of missing indexes, excluding spatial indexes. Information returned by sys.dm_db_missing_index_group_stats is updated by every query execution, not by every query compilation or recompilation. Usage statistics are not persisted and are kept only until SQL Server is restarted. Database administrators should periodically make backup copies of the missing index information if they want to keep the usage statistics after server recycling.”
The second DMV is sys.dm_db_missing_index_groups, described as follows:
“Returns information about what missing indexes are contained in a specific missing index group, excluding spatial indexes.”
This is basically just a join table between sys.dm_db_missing_index_group_stats and our third DMV, which is sys.dm_db_missing_index_details, described like this:
“Returns detailed information about missing indexes, excluding spatial indexes.”
By joining these three DMVs, you get a useful missing index query (see Figure 1).
Figure 1 Identifying potentially useful indexes.
-- Missing Indexes in current database by Index Advantage SELECT user_seeks * avg_total_user_cost * ( avg_user_impact * 0.01 ) AS [index_advantage] , migs.last_user_seek , mid.[statement] AS [Database.Schema.Table] , mid.equality_columns , mid.inequality_columns , mid.included_columns , migs.unique_compiles , migs.user_seeks , migs.avg_total_user_cost , migs.avg_user_impact FROM sys.dm_db_missing_index_group_stats AS migs WITH ( NOLOCK ) INNER JOIN sys.dm_db_missing_index_groups AS mig WITH ( NOLOCK ) ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details AS mid WITH ( NOLOCK ) ON mig.index_handle = mid.index_handle WHERE mid.database_id = DB_ID() ORDER BY index_advantage DESC ;
This query uses the data-access pattern statistics for a particular table to calculate the possible advantage of adding a certain index. Indexes with a higher index_advantage are those that SQL Server considers will have the biggest positive impact on reducing workload, based on reducing query cost and the projected number of times they will use the index.
Bear in mind that if you make a change to an index for a given table, then all the missing index statistics for that table are cleared out and repopulated again over time. If you run this query shortly after an index change, it will probably—and inaccurately—inform you that there are no missing indexes for this table.
It does have some limitations you’ll need to consider. First, it doesn’t always specify the best column order for an index. If there are multiple columns listed under equality_columns or inequality_columns, you’ll want to look at the selectivity of each of those columns within the equality and inequality results to determine the best column order for the prospective new index. Second, it doesn’t consider filtered indexes, which are new for SQL Server 2008. Finally, it’s eager to suggest included columns and new indexes in general.
You should never just blindly add every index that this query suggests, especially if you have an online transaction processing (OLTP) workload. Instead, you need to examine the results of the query carefully and manually filter out results that are not part of your regular workload.
Start by examining the last_user_seek column. If the last_user_seek time is a few days or even weeks ago, then the queries that caused SQL Server to want that index are probably from a random, ad hoc query or part of an infrequently run report query. On the other hand, if the last_user_seek time was a few seconds or a few minutes ago, then it’s probably part of your regular workload and you should consider that possible index more carefully.
Regardless of what this query recommends, always look at the existing indexes on a table, including their usage statistics, before making any changes. Remember, a more volatile table should generally have fewer indexes than a more static table. You should be very hesitant to add a new index to a table (for an OLTP workload) if the table already has more than about five or six effective indexes.
Don’t forget that the system stored procedure, sp_helpindex, does not show included column information. This means you should either use a replacement or simply script out the CREATE INDEX statement for your existing indexes.
Interrogate Index Usage
One of the most useful DMVs in the Indexing category is sys.dm_db_index_usage_stats, which is described as follows:
“Returns counts of different types of index operations and the time each type of operation was last performed. Every individual seek, scan, lookup or update on the specified index by one query execution is counted as a use of that index and increments the corresponding counter in this view. Information is reported both for operations caused by user-submitted queries and for operations caused by internally generated queries, such as scans for gathering statistics.”
This DMV provides invaluable information regarding if and how often your indexes are being used, for both reads and writes. You can also interrogate this DMV to provide information on:
- The distribution of your workload across your defined indexes
- Indexes that are not accessed by your workload, and so are prime candidates for deletion
- Indexes with a large number of writes and zero, or few reads (these are also candidates for removal, after further investigation).
The first of these three scripts (see Figure 2) will list all of your heap tables, clustered indexes and non-clustered indexes, along with the number of reads, writes and the fill factor for each index.
Figure 2 You can determine how your indexes are being used.
--- Index Read/Write stats (all tables in current DB) SELECT OBJECT_NAME(s.[object_id]) AS [ObjectName] , i.name AS [IndexName] , i.index_id , user_seeks + user_scans + user_lookups AS [Reads] , user_updates AS [Writes] , i.type_desc AS [IndexType] , i.fill_factor AS [FillFactor] FROM sys.dm_db_index_usage_stats AS s INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id] WHERE OBJECTPROPERTY(s.[object_id], ‘IsUserTable’) = 1 AND i.index_id = s.index_id AND s.database_id = DB_ID() ORDER BY OBJECT_NAME(s.[object_id]) , writes DESC , reads DESC ;
This is a useful query for better understanding your workload. It can help you determine the volatility of a particular index, and the ratio of reads to writes. This can help you refine and tune your indexing strategy. For example, if you had a table that was fairly static (very few writes on any of the indexes), you could feel more confident about adding more indexes listed in your missing index queries.
If you have SQL Server 2008 Enterprise Edition, this query could help you decide whether it would be a good idea to enable data compression (either Page or Row). An index with very little write activity is likely to be a better candidate for data compression than a more volatile index.
The next script (see Figure 3) uses sys.indexes and sys.objects to find tables and indexes in the current database that do not show up in sys.dm_db_index_usage_stats. This means these indexes have had no reads or writes since SQL Server was last started or since the current database was closed or detached (whichever is shorter).
Figure 3 Find unused indexes.
-- List unused indexes SELECT OBJECT_NAME(i.[object_id]) AS [Table Name] , i.name FROM sys.indexes AS i INNER JOIN sys.objects AS o ON i.[object_id] = o.[object_id] WHERE i.index_id NOT IN ( SELECT s.index_id FROM sys.dm_db_index_usage_stats AS s WHERE s.[object_id] = i.[object_id] AND i.index_id = s.index_id AND database_id = DB_ID() ) AND o.[type] = ‘U’ ORDER BY OBJECT_NAME(i.[object_id]) ASC ;
If SQL Server has been running long enough that you have a complete, representative workload, there’s a good chance these unused indexes (and perhaps tables) are “dead.” This means your database no longer uses them and you can potentially drop them, after doing some further investigation.
Our final sys.dm_db_index_usage_stats query filters by the current database (see Figure 4). This only includes non-clustered indexes. It can help you decide whether the cost of maintaining a particular index outweighs the benefit of having it in place.
Figure 4 Finding rarely used indexes.
-- Possible Bad NC Indexes (writes > reads) SELECT OBJECT_NAME(s.[object_id]) AS [Table Name] , i.name AS [Index Name] , i.index_id , user_updates AS [Total Writes] , user_seeks + user_scans + user_lookups AS [Total Reads] , user_updates - ( user_seeks + user_scans + user_lookups ) AS [Difference] FROM sys.dm_db_index_usage_stats AS s WITH ( NOLOCK ) INNER JOIN sys.indexes AS i WITH ( NOLOCK ) ON s.[object_id] = i.[object_id] AND i.index_id = s.index_id WHERE OBJECTPROPERTY(s.[object_id], ‘IsUserTable’) = 1 AND s.database_id = DB_ID() AND user_updates > ( user_seeks + user_scans + user_lookups ) AND i.index_id > 1 ORDER BY [Difference] DESC , [Total Writes] DESC , [Total Reads] ASC ;
This query looks for any indexes that have large numbers of writes with zero reads. Any index that falls into this category is a good candidate for deletion (upon full investigation). You want to make sure that your SQL Server instance has been running long enough that you have your complete, typical workload included.
Don’t forget about periodic reporting workloads that might not show up in your day-to-day workload. Even though the indexes that facilitate such workloads won’t be that frequently used, their presence will be critical.
You should also look at rows where there are large numbers of writes and a small number of reads. Dropping these indexes will be more of a judgment call, depending on the table and how familiar you are with your workload.
Glenn Berry works as a database architect at NewsGator Technologies Inc. in Denver. He’s a SQL Server MVP and has a whole collection of Microsoft certifications, including MCITP, MCDBA, MCSE, MCSD, MCAD and MCTS, which proves that he likes to take tests.
Louis Davidson has been in the IT industry for 16 years as a corporate database developer and architect. He has been a SQL Server Microsoft MVP for six years and has written four books on database design. Currently he’s the data architect and sometimes DBA for the Christian Broadcasting Network, supporting offices in Virginia Beach, Va., and Nashville, Tenn.
Timothy Ford is a SQL Server MVP and has been working with SQL Server for more than 10 years. He’s the primary DBA and subject-matter expert for the SQL Server platform for Spectrum Health. He’s been writing about technology since 2007 for a variety of Web sites and maintains his own blog at thesqlagentman.com, covering SQL as well as telecommuting and professional development topics.**
Learn more about “SQL Server DMV Starter Pack” at red-gate.com/our-company/about/book-store.