Returns the statistics histogram for the specified database object (table or indexed view) in the current SQL Server database. Similar to DBCC SHOW_STATISTICS WITH HISTOGRAM.
This dynamic management function (DMF) is available starting with SQL Server 2016 SP1 CU2.
sys.dm_db_stats_histogram (object_id, stats_id)
object_id
Is the ID of the object in the current database for which properties of one of its statistics are requested. object_id is int.
stats_id
Is the ID of statistics for the specified object_id. The statistics ID can be obtained from the sys.stats catalog view. stats_id is int.
|Column name||Data type||Description|
|object_id||int||ID of the object (table or indexed view) for which to return the properties of the statistics object.|
|stats_id||int||ID of the statistics object. Is unique within the table or indexed view. For more information, see sys.stats (Transact-SQL).|
|step_number||int||Number of the step in the histogram.|
|range_high_key||sql_variant||Upper bound column value for a histogram step. The column value is also called a key value.|
|range_rows||real||Estimated number of rows whose column value falls within a histogram step, excluding the upper bound.|
|equal_rows||real||Estimated number of rows whose column value equals the upper bound of the histogram step.|
|distinct_range_rows||bigint||Estimated number of rows with a distinct column value within a histogram step, excluding the upper bound.|
|average_range_rows||real||Average number of rows with duplicate column values within a histogram step, excluding the upper bound (range_rows / distinct_range_rows, for distinct_range_rows > 0).|
The result set for sys.dm_db_stats_histogram returns information similar to DBCC SHOW_STATISTICS WITH HISTOGRAM and also includes object_id, stats_id, and step_number.
Because the column range_high_key is a sql_variant data type, you may need to use CAST or CONVERT if a predicate compares it with a non-string constant.
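As a minimal sketch of such a conversion, the query below filters histogram steps on an integer key. It assumes the Country table from example A exists, that stats_id 1 is the primary key statistics on the int column Country_ID, and it uses an arbitrary threshold of 2. The UPDATE STATISTICS call is an added precaution, since index statistics created on an empty table may have no histogram until they are updated.

```sql
-- Assumes the Country table from example A exists in the current database.
-- Refresh statistics so that the primary key histogram is populated.
UPDATE STATISTICS Country WITH FULLSCAN;

-- stats_id 1 is assumed to be the primary key statistics on Country_ID (int).
SELECT h.step_number, h.range_high_key, h.equal_rows
FROM sys.dm_db_stats_histogram(OBJECT_ID('Country'), 1) AS h
WHERE CONVERT(int, h.range_high_key) >= 2;  -- explicit conversion from sql_variant
```

The conversion target must match the data type of the statistics key column. Applying CONVERT(int, ...) to a histogram built on a character column would raise a conversion error.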
A histogram measures the frequency of occurrence for each distinct value in a data set. The query optimizer computes a histogram on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view. If the histogram is created from a sampled set of rows, the stored totals for number of rows and number of distinct values are estimates and do not need to be whole integers.
To create the histogram, the query optimizer sorts the column values, computes the number of values that match each distinct column value and then aggregates the column values into a maximum of 200 contiguous histogram steps. Each step includes a range of column values followed by an upper bound column value. The range includes all possible column values between boundary values, excluding the boundary values themselves. The lowest of the sorted column values is the upper boundary value for the first histogram step.
The following diagram shows a histogram with six steps. The area to the left of the first upper boundary value is the first step.
For each histogram step:
Bold line represents the upper boundary value (RANGE_HI_KEY) and the number of times it occurs (EQ_ROWS).
Solid area left of RANGE_HI_KEY represents the range of column values and the average number of times each column value occurs (AVG_RANGE_ROWS). The AVG_RANGE_ROWS for the first histogram step is always 0.
Dotted lines represent the sampled values used to estimate total number of distinct values in the range (DISTINCT_RANGE_ROWS) and total number of values in the range (RANGE_ROWS). The query optimizer uses RANGE_ROWS and DISTINCT_RANGE_ROWS to compute AVG_RANGE_ROWS and does not store the sampled values.
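The relationship between these three values can be checked against a stored histogram. The following sketch assumes that the average is the simple ratio of range_rows to distinct_range_rows whenever distinct_range_rows is greater than 0. The column alias computed_average_range_rows is illustrative, and <statistic_name> is a placeholder to replace with a name from sys.stats.

```sql
-- <statistic_name> is a placeholder; replace it with a statistics name from sys.stats.
SELECT h.step_number,
       h.range_rows,
       h.distinct_range_rows,
       h.average_range_rows,
       -- NULLIF avoids division by zero for steps with an empty range.
       h.range_rows / NULLIF(h.distinct_range_rows, 0) AS computed_average_range_rows
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS h
WHERE s.[name] = N'<statistic_name>'
ORDER BY h.step_number;
```

For steps where distinct_range_rows is 0, the computed column is NULL, so the stored average_range_rows for those steps is not expected to match it.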
The query optimizer defines the histogram steps according to their statistical significance. It uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values. The maximum number of steps is 200. The number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points. For example, a column with 100 distinct values can have a histogram with fewer than 100 boundary points.
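One way to see how many steps the optimizer actually kept is to count the histogram rows for each statistics object. The sketch below assumes the Country table from example A; the alias step_count is illustrative.

```sql
-- Counts histogram steps for every statistics object on the Country table.
-- Statistics with no histogram yet return no rows from the function
-- and therefore do not appear in the result.
SELECT s.[name] AS statistics_name,
       s.stats_id,
       COUNT(*) AS step_count
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS h
WHERE s.[object_id] = OBJECT_ID('Country')
GROUP BY s.[name], s.stats_id
ORDER BY s.stats_id;
```

Comparing step_count with the number of distinct values in the key column shows whether the optimizer merged adjacent values into ranges.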
Requires that the user has select permissions on statistics columns, or the user owns the table, or the user is a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role.
A. Simple example
The following example creates and populates a simple table, then creates statistics on the Country_Name column.
CREATE TABLE Country
(
    Country_ID int IDENTITY PRIMARY KEY,
    Country_Name varchar(120) NOT NULL
);
INSERT Country (Country_Name) VALUES ('Canada'), ('Denmark'), ('Iceland'), ('Peru');
CREATE STATISTICS Country_Stats ON Country (Country_Name);
The primary key occupies stats_id number 1, so call sys.dm_db_stats_histogram for stats_id number 2 to return the statistics histogram for the Country table.
SELECT * FROM sys.dm_db_stats_histogram(OBJECT_ID('Country'), 2);
B. Useful query:
SELECT hist.step_number, hist.range_high_key, hist.range_rows,
    hist.equal_rows, hist.distinct_range_rows, hist.average_range_rows
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS hist
WHERE s.[name] = N'<statistic_name>';
C. Useful query:
The following example selects from table Country with a predicate on column Country_Name.
SELECT * FROM Country WHERE Country_Name = 'Canada';
The following example looks at the previously created statistic on table Country and column Country_Name for the histogram step matching the predicate in the query above.
SELECT ss.name, ss.stats_id, shr.steps, shr.rows, shr.rows_sampled,
    shr.modification_counter, shr.last_updated, sh.range_rows, sh.equal_rows
FROM sys.stats ss
INNER JOIN sys.stats_columns sc
    ON ss.stats_id = sc.stats_id AND ss.object_id = sc.object_id
INNER JOIN sys.all_columns ac
    ON ac.column_id = sc.column_id AND ac.object_id = sc.object_id
CROSS APPLY sys.dm_db_stats_properties(ss.object_id, ss.stats_id) shr
CROSS APPLY sys.dm_db_stats_histogram(ss.object_id, ss.stats_id) sh
WHERE ss.[object_id] = OBJECT_ID('Country')
    AND ac.name = 'Country_Name'
    AND sh.range_high_key = CAST('Canada' AS CHAR(8));