statistics question

Question

hi folks,
A question related to statistics, in one of the post online i read that sample_pct plays big role in how accurately statistics are utilized. When i run following query in my environment, i get almost all the statistics sample_pct close to 1.0 , i dont see any sample_pct beyond 1.0. Is this true that this number should be close to 100? I'm using ola hellengren's script with the parameters mentioned below.

SELECT OBJECT_SCHEMA_NAME(st.object_id) + '.' + OBJECT_NAME(st.object_id) AS TableName
, col.name AS ColumnName
, st.name AS StatsName
, sp.last_updated
, sp.rows_sampled
, sp.rows
, (1.0*sp.rows_sampled)/(1.0*sp.rows) AS sample_pct
FROM sys.stats st
INNER JOIN sys.stats_columns st_col
ON st.object_id = st_col.object_id
AND st.stats_id = st_col.stats_id
INNER JOIN sys.columns col
ON st_col.object_id = col.object_id
AND st_col.column_id = col.column_id
CROSS APPLY sys.dm_db_stats_properties (st.object_id, st.stats_id) sp
ORDER BY 7

--
Ola hellengren sp.

EXECUTE dbo.IndexOptimize
@Databases = 'DB',
@FragmentationLow = NULL,
@FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE', -- keep this offline
@FragmentationLevel1 = 5,
@FragmentationLevel2 = 30,
@UpdateStatistics = 'ALL', -- remove this
@OnlyModifiedStatistics = 'N',
@StatisticsSample=100,
@LogToTable ='Y'

Answer

To get a number close to 100, change

, (1.0*sp.rows_sampled)/(1.0*sp.rows) AS sample_pct

to

, (100.0*sp.rows_sampled)/(1.0*sp.rows) AS sample_pct

That is, you query computes fractions but calls the column pct which is misleading.

As for how important it is to have fullscan statistics (as a sample rate of 100% is referred to), there is no discussion that sampled statistics can sometimes result in less optimal plans. This is particularly true if you query for a condition at the edges. To take a simple example, consider a query like SELECT * FROM tbl WHERE col IS NULL. The table has a sizable number of rows where col indeed is NULL, but when statistics were sampled - and for a big table that may actually be around 1% of the rows - no NULL were found, and the optimizer says "I guess 1 row!" when in fact there were 98234 rows. This can lead to a plan which is not good at all if you are joining to other table. (Since NULL values sort first, they are the edge, so to speak.)

The problem with fullscan statistics is that they are costly to produce for big tables. If you are rebuilding the index, you get the fullscan for free. It can also be fairly cheap to get fullscan statistics for non-clustered indexes, since they are smaller than the base table. But if you say

UPDATE STATISTICS WideTable WITH FULLSCAN

And that table has hundreds of millions of columns, and lots non of non-indexed columns, this can take many hours as each non-indexed column requires a full scan of the table.

Answer

Hi，@Heisenberg

Welcome to Microsoft T-SQL Q&A Forum!

When SQL Server creates or updates statistics and the sample rate is not manually specified , SQL Server calculates the default sample rate.
But sometimes the default sampling rate may not accurately represent the data distribution , in this case you need to manually update the statistics, which is also the content of Ola hellengren's sp implementation.

You just need to do this:

 (rows_sampled * 100)/(1.0*rows) AS sample_percent  
FROM sys.stats ss  
INNER JOIN sys.stats_columns sc   
ON ss.stats_id = sc.stats_id AND ss.object_id = sc.object_id  
INNER JOIN sys.all_columns ac   
ON ac.column_id = sc.column_id AND ac.object_id = sc.object_id  
CROSS APPLY sys.dm_db_stats_properties(ss.object_id, ss.stats_id) shr

For details, please refer to the official documentation , which has a detailed case description for this issue.

Best regards,
Bert Zhou

If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

statistics question

2 answers