SQL Server 2008 Indexing

Article
03/17/2009

Here are some notes on "SQL Server 2008 Indexing" I took while attending an advanced class taught by Kimberly Tripp (https://sqlskills.com/AboutKimberlyLTripp.asp).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

General

Good database design is necessary
SQL Server does not "automatically take care of everything"
Just throwing hardware at the problem is not a good strategy
Good design will give you orders of magnitude improvements
Clients are commonly doing things servers should be doing

Maintenance plans

In maintenance plans, if you rebuild an index, you can skip reorg or update stats.
If you just go through the wizard, it seems like you need to do all three...
Details at https://www.sqlskills.com/BLOGS/KIMBERLY/post/Database-Maintenance-Best-Practices-Part-I-e28093-clarifying-ambiguous-recommendations-for-Sharepoint.aspx

Indexing

Primary keys are sometimes chosen for wrong reasons
No indexes are automatically created for foreign keys
GUIDs can be evil (as clustering keys, for instance)
Page fragmentation also wastes space in the buffer pool (in-memory cache)
Vertical partitioning can have a huge impact in performance
You should know your average row size and rows per page for a table

Leaf level

Leaf level of indexes contains something for every row of the table
The slot array of the page gives you the order on the page
The order of the pages is maintained by the links between pages
Using SELECT COUNT(*) is OK, you don't need to use SELECT COUNT(1)

Non-leaf level

Non-leaf level of indexes is all about navigation and scalability
Index pages with non-null, non-variable columns = 1 byte of overhead
Index pointers are 6 bytes - 2 for the file number, 4 for the page number
With integer key (4 bytes), you get 622 entries in a non-leaf level page
You can get millions of rows in a table with just a few non-leaf levels
The cost of a lookup is equivalent to the number of levels
SQL needs a query plan you can get to quickly, not the ideal query plan

Non-clustered indexes

In non-clustered indexes for heaps, SQL uses a RID (file+page+slot=8 bytes)
This is twice as the integer at 4 bytes, which could add up to a lot
A well designed clustered index usually is better than a heap.

GUIDs as keys

A GUID key is 16 bytes. You use four times the space of an 4-byte integer
This extra space is wasted in all indexes, foreign keys, etc
GUID keys not only waste space, but also CPU and time to perform lookups
Joins for GUID-based keys are also a lot more expensive than with integers
This has to be in the design. It's hard to change after the fact
It least use a unique integer as clustered key and GUID as non-clustered
GUIDs used everywhere as keys is a growing performance problem
Sequential GUIDs plus page compression helps (somewhat) because of common prefix.
Details at https://sqlkpi.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx

Non-clustered keys

Leaf level of a non-clustered index has non-clustered key and primary key (or RID)
Assuming a PK with an integer and FK with SSN, you can fit 449 entries per page
If the non-clustered index is not unique, non-leaf levels have additional ID
So, if possible, have unique non-clustered index and define them as unique
This will possibly reduce the number of non-leaf pages and index levels.

Looking at the pages

Start with a "SELECT * FROM sys.dm_db_index_physical_stats"
DBCC PAGE with non-leaf level pages shows the details, including key info
If you know the root page, you can navigate down to the leaf level using DBCC PAGE

Clustering key selection

Clustering key selection is very important
If you choose a wide clustering key, it has big consequences
If we can get away with a 4-byte key, it makes a big difference
There are other things to consider there, but width is always an issue

You can read other posts on indexing on Kimberly's blog at https://sqlskills.com/BLOGS/KIMBERLY/category/Indexes.aspx

SQL Server 2008 Indexing

Additional resources