SQL Server 2008 Indexing Best Practices
Here are some additional notes on "SQL Server 2008 Indexing Best Practices" I took while attending an advanced class on SQL Server taught by Kimberly Tripp (http://sqlskills.com/AboutKimberlyLTripp.asp).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
The best index
- There's a best index for every query
- You don't want to do that for every query, though
- You would end up with too many indexes
- You need to select a good set of base indexes
Find a balance
- You need to find a balance between no indexes and one ideal index on every query
- Do not put a non-clustered index on every columns :-)
- You can find (mathematically) if a non-clustered index will help or not
- You can sometimes replace a narrow non-clustered index with a wider one that is more useful
- Create constraints for primary keys and alternate/candidate keys
- Add indexes to foreign key constraints. It can help some joins
- Capture workloads and use the database tuning advisor (DTA)
- Add indexes to help with specific queries using joins, aggregations, etc.
- Missing index DMVs query http://blogs.msdn.com/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx
- These are only strategies you should consider, not absolutes.
- Also, you need to maintain the indexes over time. Things change over time.
OLTP x OLAP
- OLTP tables typically uses fewer indexes than OLAP tables
- OLTP tables can make good use of indexes
- However, there are certain indexes on OLTP that will be bad
Clustered index keys
- Clustered Index keys: Unique, narrow, static, ever increasing
- Examples: Identity / Date,Identity / GUID (with NEWSEQUENTIALID function)
- Add something like identity to avoid ending up with uniquefiers
- This means that you will have a lot less management over time
- If you can have a really good clustering key, avoid heaps.
- Identity is naturally unique, static, narrow, hot spot for inserts
- Being a hot spot in this case is actually a good thing
- It minimizes cache requirements, naturally needs less maintenance
- Could have contention in GAM/SGAM pages if you have lots of tables with identity
- Can overcome with multiple files, see http://support.microsoft.com/kb/328551
Lookups vs. Table Scans
- Lookups. Physical vs. Logical I/Os
- Potential for physical I/Os is in the leaf level (non-leaf are usually cached)
- SQL knows how many pages are on the leaf level, how many rows per page
- SQL needs to make a decision on Lookup x Table Scan
- If it's in the WHERE clause, it doesn't mean an index on that will be used
- Estimating query cost: tipping point for a query to go TS over a lookup
- Won't spoil the results, but it is smaller than most would think...
- Details at http://sqlskills.com/BLOGS/KIMBERLY/post/Tipping-Point-Queries-More-Questions-To-Really-Test-You!.aspx
- You can force SQL to do Lookup x TS, but SQL usually does a good estimate
- There are cases when the statistics for the table are not up to date
- People end up using more hints (like INDEX, FORCESEEK) or plan guides than they should
- If you use a hint and you're wrong, SQL will follow your hint anyway
- The power of covering. Leaf level of the non-clustered index has all the rows
- People use indexes with "LN,FN,MI including SSN" and "LN, FN, MI including PHONE" and "LN with SSN", etc..
- You can cover the same with a single index on "LN, FN, MI including SSN, PHONE"
- Order of included columns do not matter. Just the order of the keys
- Could this be happening because of the missing index message in query results?
- Multiple developers not in sync can be a common source if this kind of thing.
- INCLUDE option lets you separate what goes in the leaf and non-leaf of index
- There is no limit to what you can INCLUDE, but be careful with the duplication
- You can create a non-clustered, covering, seekable index just for a query
- For instance, you could create a non-clustered index on the SSN key and a few INCLUDED columns just to help a specific set of queries without hitting data pages at all
- Full scan on a small non-clustered index could even beat seeks in certain cases.
- SQL Server 2008 includes the ability to do Filtered Indexes
- For instance, filter for "status=1" to index only active items
- For instance, filter for orderdate in last year for certain reports
- Now that the index is much smaller, you can now maybe consider including more columns
- This will give you more coverage without using a lot of space
- Filtered indexes statistics are usually more accurate, if they cover fewer rows
- This is not a substitute for partitions. Don't create filtered indexes if all sets are useful
- Kimberly's SP_HELPINDEX2 shows included columns and filtered indexes
- Combine with sparse columns for indexes on subsets with only certain columns INCLUDED
- Beware: Create indexes on tables with sparse columns might materialize sparse columns.
- The best case: index covers all you need (key+included), nothing you don't and is seekable
- Indexes do use more space and come at a cost for inserts, updates, maintenance
- You don't have to do this for every query or every table
- Put effort in the top few queries and you'll fix most of your issues
- There are usually a few situations that cause you most of the grief
- Look at that stored procedure that is executed thousand times per hour
- You can overdo it. Be careful
Kimberly has published a related presentation at http://sqlskills.com/resources/conferences/Tripp_SPR302_PreconferenceSession_UPDATED_VERSION.pdf