SQL Server 2008 Indexing Best Practices

Here are some additional notes on "SQL Server 2008 Indexing Best Practices" I took while attending an advanced class on SQL Server taught by Kimberly Tripp  (https://sqlskills.com/AboutKimberlyLTripp.asp).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.


The best index 

  • There's a best index for every query
  • You don't want to do that for every query, though
  • You would end up with too many indexes
  • You need to select a good set of base indexes

Find a balance

  • You need to find a balance between no indexes and one ideal index on every query
  • Do not put a non-clustered index on every columns :-)
  • You can find (mathematically) if a non-clustered index will help or not
  • You can sometimes replace a narrow non-clustered index with a wider one that is more useful

Strategies

  • Create constraints for primary keys and alternate/candidate keys
  • Add indexes to foreign key constraints. It can help some joins
  • Capture workloads and use the database tuning advisor (DTA)
  • Add indexes to help with specific queries using joins, aggregations, etc.
  • Missing index DMVs query https://blogs.msdn.com/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx
  • These are only strategies you should consider, not absolutes.
  • Also, you need to maintain the indexes over time. Things change over time.

OLTP x OLAP 

  • OLTP tables typically uses fewer indexes than OLAP tables
  • OLTP tables can make good use of indexes
  • However, there are certain indexes on OLTP that will be bad

Clustered index keys

  • Clustered Index keys: Unique, narrow, static, ever increasing
  • Examples: Identity / Date,Identity / GUID (with NEWSEQUENTIALID function)
  • Add something like identity to avoid ending up with uniquefiers
  • This means that you will have a lot less management over time
  • If you can have a really good clustering key, avoid heaps.

Identity 

  • Identity is naturally unique, static, narrow, hot spot for inserts
  • Being a hot spot in this case is actually a good thing
  • It minimizes cache requirements, naturally needs less maintenance
  • Could have contention in GAM/SGAM pages if you have lots of tables with identity
  • Can overcome with multiple files, see https://support.microsoft.com/kb/328551

Lookups vs. Table Scans 

  • Lookups. Physical vs. Logical I/Os
  • Potential for physical I/Os is in the leaf level (non-leaf are usually cached)
  • SQL knows how many pages are on the leaf level, how many rows per page
  • SQL needs to make a decision on Lookup x Table Scan
  • If it's in the WHERE clause, it doesn't mean an index on that will be used
  • Estimating query cost: tipping point for a query to go TS over a lookup
  • Won't spoil the results, but it is smaller than most would think...
  • Details at https://sqlskills.com/BLOGS/KIMBERLY/post/Tipping-Point-Queries-More-Questions-To-Really-Test-You!.aspx

Hints

  • You can force SQL to do Lookup x TS, but SQL usually does a good estimate
  • There are cases when the statistics for the table are not up to date
  • People end up using more hints (like INDEX, FORCESEEK) or plan guides than they should
  • If you use a hint and you're wrong, SQL will follow your hint anyway

Coverage

  • The power of covering. Leaf level of the non-clustered index has all the rows
  • People use indexes with "LN,FN,MI including SSN" and "LN, FN, MI including PHONE" and "LN with SSN", etc..
  • You can cover the same with a single index on "LN, FN, MI including SSN, PHONE"
  • Order of included columns do not matter. Just the order of the keys
  • Could this be happening because of the missing index message in query results?
  • Multiple developers not in sync can be a common source if this kind of thing.

Included columns

  • INCLUDE option lets you separate what goes in the leaf and non-leaf of index
  • There is no limit to what you can INCLUDE, but be careful with the duplication
  • You can create a non-clustered, covering, seekable index just for a query
  • For instance, you could create a non-clustered index on the SSN key and a few INCLUDED columns just to help a specific set of queries without hitting data pages at all
  • Full scan on a small non-clustered index could even beat seeks in certain cases.

 Filtered Indexes

  • SQL Server 2008 includes the ability to do Filtered Indexes
  • For instance, filter for "status=1" to index only active items
  • For instance, filter for orderdate in last year for certain reports
  • Now that the index is much smaller, you can now maybe consider including more columns
  • This will give you more coverage without using a lot of space
  • Filtered indexes statistics are usually more accurate, if they cover fewer rows
  • This is not a substitute for partitions. Don't create filtered indexes if all sets are useful
  • Kimberly's SP_HELPINDEX2 shows included columns and filtered indexes
    https://www.sqlskills.com/blogs/Kimberly/post/Updates-(fixes)-to-sp_helpindex2.aspx

Sparse Columns 

  • Combine with sparse columns for indexes on subsets with only certain columns INCLUDED
  • Beware: Create indexes on tables with sparse columns might materialize sparse columns.

Conclusion

  • The best case: index covers all you need (key+included), nothing you don't and is seekable
  • Indexes do use more space and come at a cost for inserts, updates, maintenance
  • You don't have to do this for every query or every table
  • Put effort in the top few queries and you'll fix most of your issues
  • There are usually a few situations that cause you most of the grief
  • Look at that stored procedure that is executed thousand times per hour
  • You can overdo it. Be careful

Kimberly has published a related presentation at https://sqlskills.com/resources/conferences/Tripp_SPR302_PreconferenceSession_UPDATED_VERSION.pdf