SQL Server 2008 Indexing

Here are some notes on "SQL Server 2008 Indexing" I took while attending an advanced class taught by Kimberly Tripp (https://sqlskills.com/AboutKimberlyLTripp.asp).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.


General 

  • Good database design is necessary
  • SQL Server does not "automatically take care of everything"
  • Just throwing hardware at the problem is not a good strategy
  • Good design will give you orders of magnitude improvements
  • Clients are commonly doing things servers should be doing

Maintenance plans 

Indexing

  • Primary keys are sometimes chosen for wrong reasons
  • No indexes are automatically created for foreign keys
  • GUIDs can be evil (as clustering keys, for instance)
  • Page fragmentation also wastes space in the buffer pool (in-memory cache)
  • Vertical partitioning can have a huge impact in performance
  • You should know your average row size and rows per page for a table

Leaf level 

  • Leaf level of indexes contains something for every row of the table
  • The slot array of the page gives you the order on the page
  • The order of the pages is maintained by the links between pages
  • Using SELECT COUNT(*) is OK, you don't need to use SELECT COUNT(1)

Non-leaf level 

  • Non-leaf level of indexes is all about navigation and scalability
  • Index pages with non-null, non-variable columns = 1 byte of overhead
  • Index pointers are 6 bytes - 2 for the file number, 4 for the page number
  • With integer key (4 bytes), you get 622 entries in a non-leaf level page
  • You can get millions of rows in a table with just a few non-leaf levels
  • The cost of a lookup is equivalent to the number of levels
  • SQL needs a query plan you can get to quickly, not the ideal query plan

Non-clustered indexes 

  • In non-clustered indexes for heaps, SQL uses a RID (file+page+slot=8 bytes)
  • This is twice as the integer at 4 bytes, which could add up to a lot
  • A well designed clustered index usually is better than a heap.

GUIDs as keys 

  • A GUID key is 16 bytes. You use four times the space of an 4-byte integer
  • This extra space is wasted in all indexes, foreign keys, etc
  • GUID keys not only waste space, but also CPU and time to perform lookups
  • Joins for GUID-based keys are also a lot more expensive than with integers
  • This has to be in the design. It's hard to change after the fact
  • It least use a unique integer as clustered key and  GUID as non-clustered
  • GUIDs used everywhere as keys is a growing performance problem
  • Sequential GUIDs plus page compression helps (somewhat) because of common prefix.
  • Details at https://sqlkpi.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx

Non-clustered keys

  • Leaf level of a non-clustered index has non-clustered key and primary key (or RID)
  • Assuming a PK with an integer and FK with SSN, you can fit 449 entries per page
  • If the non-clustered index is not unique, non-leaf levels have additional ID
  • So, if possible, have unique non-clustered index and define them as unique
  • This will possibly reduce the number of non-leaf pages and index levels.

Looking at the pages 

  • Start with a "SELECT * FROM sys.dm_db_index_physical_stats"
  • DBCC PAGE with non-leaf level pages shows the details, including key info
  • If you know the root page, you can navigate down to the leaf level using DBCC PAGE

Clustering key selection 

  • Clustering key selection is very important
  • If you choose a wide clustering key, it has big consequences
  • If we can get away with a 4-byte key, it makes a big difference
  • There are other things to consider there, but width is always an issue

You can read other posts on indexing on Kimberly's blog at https://sqlskills.com/BLOGS/KIMBERLY/category/Indexes.aspx