Statistics Sample Rates
I had a question about sample rates for statistics yesterday for SQL Server 200x statistics. Here is my basic rule-of-thumb.
1. If you have no issues with statistics or query plan quality, then you can leave it alone and not think about it.
2. If you have data with very "spikey" distributions that is not caught by the default sample rate, then you should consider a higher sample rate.
The core reason for this is that the sampling algorithm for SQL Server is not entirely "random". First, it samples pages and then uses all rows on the page. Second, it will actually sample the same pages each time, mostly to retain sanity within the test team here at Microsoft. So, it is possible that the default sample rate will sample pages that do not contain all of the interesting rows that define the "spikes" in your data distribution. A higher sample rate can capture more of these rows, although it will be at a higher cost. the async statistics update feature in SQL 2005 can help reduce the perception of that cost for some applications by moving autostats updates to a background job queue.