Finding and Tuning Similar Queries by Using Query and Query Plan Hashes

When searching for resource-intensive queries, you should consider how to find and tune similar queries that collectively consume significant system resources. The sys.dm_exec_query_stats and sys.dm_exec_requests dynamic management views provide query hash and query plan hash values that you can use to help determine the aggregate resource usage for similar queries and similar query execution plans.

This topic defines the query hash and query plan hash, gives examples of using the hash values to find the cumulative cost of similar queries and execution plans, and offers ways to enhance performance for similar queries and execution plans.

Understanding the Query Hash and Query Plan Hash

The query hash is a binary hash value calculated on the query and used to identify queries with similar logic. The query optimizer computes the query hash during query compilation. Queries that differ only by literal values have the same query hash. For example, the following two queries have the same query hash because they differ only by the literal values assigned to FirstName and LastName.

USE AdventureWorks;
GO
SELECT I.CustomerID, C.FirstName, C.LastName, A.AddressLine1, A.City 
FROM Person.Contact AS C
    JOIN Sales.Individual AS I ON C.ContactID = I.ContactID
    JOIN Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID
    JOIN Person.Address AS A ON A.AddressID = CA.AddressID
WHERE C.FirstName = 'Amanda' AND C.LastName = 'Allen';
GO
SELECT I.CustomerID, C.FirstName, C.LastName, A.AddressLine1, A.City 
FROM Person.Contact AS C
    JOIN Sales.Individual AS I ON C.ContactID = I.ContactID
    JOIN Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID
    JOIN Person.Address AS A ON A.AddressID = CA.AddressID
WHERE C.FirstName = 'Logan' AND C.LastName = 'Jones';
GO
--Show the query hash for both queries.
SELECT st.text AS "Query Text", qs.query_hash AS "Query Hash"
FROM sys.dm_exec_query_stats qs 
CROSS APPLY sys.dm_exec_sql_text (qs.sql_handle) st
    WHERE st.text = 'SELECT I.CustomerID, C.FirstName, C.LastName, A.AddressLine1, A.City 
FROM Person.Contact AS C
    JOIN Sales.Individual AS I ON C.ContactID = I.ContactID
    JOIN Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID
    JOIN Person.Address AS A ON A.AddressID = CA.AddressID
WHERE C.FirstName = ''Amanda'' AND C.LastName = ''Allen'';
' OR st.text = 'SELECT I.CustomerID, C.FirstName, C.LastName, A.AddressLine1, A.City 
FROM Person.Contact AS C
    JOIN Sales.Individual AS I ON C.ContactID = I.ContactID
    JOIN Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID
    JOIN Person.Address AS A ON A.AddressID = CA.AddressID
WHERE C.FirstName = ''Logan'' AND C.LastName = ''Jones'';
';
GO

The following two queries have different query hashes because their differences are logical (AND versus OR) and not limited to only literals.

USE AdventureWorks;
GO
SELECT I.CustomerID, C.FirstName, C.LastName, A.AddressLine1, A.City 
FROM Person.Contact AS C
    JOIN Sales.Individual AS I ON C.ContactID = I.ContactID
    JOIN Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID
    JOIN Person.Address AS A ON A.AddressID = CA.AddressID
WHERE C.FirstName = 'Amanda' AND C.LastName = 'Allen';
GO
SELECT I.CustomerID, C.FirstName, C.LastName, A.AddressLine1, A.City 
FROM Person.Contact AS C
    JOIN Sales.Individual AS I ON C.ContactID = I.ContactID
    JOIN Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID
    JOIN Person.Address AS A ON A.AddressID = CA.AddressID
WHERE C.FirstName = 'Logan' OR C.LastName = 'Jones';
GO
--Show the query hash for both queries.
SELECT st.text AS "Query Text", qs.query_hash AS "Query Hash"
FROM sys.dm_exec_query_stats qs 
CROSS APPLY sys.dm_exec_sql_text (qs.sql_handle) st
    WHERE st.text = 'SELECT I.CustomerID, C.FirstName, C.LastName, A.AddressLine1, A.City 
FROM Person.Contact AS C
    JOIN Sales.Individual AS I ON C.ContactID = I.ContactID
    JOIN Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID
    JOIN Person.Address AS A ON A.AddressID = CA.AddressID
WHERE C.FirstName = ''Amanda'' AND C.LastName = ''Allen'';
' OR st.text = 'SELECT I.CustomerID, C.FirstName, C.LastName, A.AddressLine1, A.City 
FROM Person.Contact AS C
    JOIN Sales.Individual AS I ON C.ContactID = I.ContactID
    JOIN Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID
    JOIN Person.Address AS A ON A.AddressID = CA.AddressID
WHERE C.FirstName = ''Logan'' OR C.LastName = ''Jones'';
';
GO

The query plan hash is a binary hash value calculated on the query execution plan and used to identify similar query execution plans. The query optimizer computes the query plan hash during query compilation, using execution plan values such as the logical and physical operators, and a subset of important operator attributes. Query execution plans that have the same physical and logical operator tree structure, as well as identical attribute values for the subset of important operator attributes, will have the same query plan hash.

When queries with identical query hashes are executed against different data, the differences in the cardinality of query results can cause the query optimizer to choose different query execution plans, resulting in different query plan hashes.

The following example shows how two similar queries can have the same query hash, but might not have the same query execution plan. There are two ways to view the hash values: the final SELECT statement and the Showplan XML, where they are listed in the StmtSimple element as attribute values for QueryHash and QueryPlanHash.

USE AdventureWorks;
GO
SET STATISTICS XML ON;
GO
SELECT T.TransactionID, T.TransactionDate, P.ProductID, P.Name FROM Production.TransactionHistory T
    JOIN Production.Product P
    ON T.ProductID = P.ProductID
WHERE P.ProductID = 1;
GO
SELECT T.TransactionID, T.TransactionDate, P.ProductID, P.Name FROM Production.TransactionHistory T
    JOIN Production.Product P
    ON T.ProductID = P.ProductID
WHERE P.ProductID = 3;
GO
SET STATISTICS XML OFF;
GO
--Show the query_hash and query plan hash
SELECT ST.text AS "Query Text", QS.query_hash AS "Query Hash", 
    QS.query_plan_hash AS "Query Plan Hash"
FROM sys.dm_exec_query_stats QS
    CROSS APPLY sys.dm_exec_sql_text (QS.sql_handle) ST
WHERE ST.text = 'SELECT T.TransactionID, T.TransactionDate, P.ProductID, P.Name FROM Production.TransactionHistory T
    JOIN Production.Product P
    ON T.ProductID = P.ProductID
WHERE P.ProductID = 1;
' OR ST.text = 'SELECT T.TransactionID, T.TransactionDate, P.ProductID, P.Name FROM Production.TransactionHistory T
    JOIN Production.Product P
    ON T.ProductID = P.ProductID
WHERE P.ProductID = 3;
';
GO

If the cardinality estimates for ProductID = 3 are high, the query optimizer might use the index scan operator in the query plan. If the cardinality estimates for ProductID = 1 are low, the query optimizer might use the index seek operator.

Non-Uniqueness for Hash Values

It is possible to have hash collisions in which dissimilar queries and query plans have the same hash value. Although the probability of hash collisions is very small and unlikely to occur, applications that depend on the uniqueness of the query hash and query plan hash can have errors because of duplicate hash values. For example, the query hash and query plan hash should not be used as a primary key or in a unique column.

Finding the Cumulative Cost of Queries

The following example returns information about the top five queries according to average CPU time. This example aggregates the queries according to their query hash so that logically equivalent queries are grouped by their cumulative resource consumption.

USE AdventureWorks;
GO
SELECT TOP 5 query_stats.query_hash AS "Query Hash", 
    SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS "Avg CPU Time",
    MIN(query_stats.statement_text) AS "Statement Text"
FROM 
    (SELECT QS.*, 
    SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
    ((CASE statement_end_offset 
        WHEN -1 THEN DATALENGTH(st.text)
        ELSE QS.statement_end_offset END 
            - QS.statement_start_offset)/2) + 1) AS statement_text
     FROM sys.dm_exec_query_stats AS QS
     CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) as ST) as query_stats
GROUP BY query_stats.query_hash
ORDER BY 2 DESC;
GO

The following example returns information about the top five query plans according to average CPU time. This example aggregates the queries according to their query plan hash so that queries with the same query plan hash are grouped by their cumulative resource consumption.

USE AdventureWorks;
GO
SELECT TOP 5 query_plan_hash AS "Query Plan Hash",
    SUM(total_worker_time)/SUM(execution_count) AS "Avg CPU Time",
    MIN(CAST(query_plan as varchar(max))) AS "ShowPlan XML"
FROM sys.dm_exec_query_stats AS QS CROSS APPLY 
    sys.dm_exec_query_plan(QS.plan_handle)
GROUP BY query_plan_hash
ORDER BY 2 DESC;
GO

Using Query Hash and Query Plan Hash to Improve Query Performance

Track and Investigate Execution Plan Changes

Performance degradations or improvements can occur when a query is recompiled and the query optimizer produces a different query execution plan. Using the query plan hash, you can capture, store, and compare query execution plans over time. Knowing which execution plans have changed can help you diagnose the performance impact of data and configuration changes.

For example, after changing the system configuration, you can compare query plan hash values for mission-critical queries to their original query plan hash values. Differences in query plan hash values can tell you if the system configuration change resulted in updated query execution plans for important queries. You might also decide to stop execution for a current long-running query if its query plan hash in sys.dm_exec_requests differs from its baseline query plan hash, which is known to have good performance.

Parameterize Similar Queries to Improve Cached Plan Reuse

If a set of queries has the same query hash and query plan hash, you might improve performance by creating one parameterized query. Calling one query with parameters instead of multiple queries with literal values allows reuse of the cached query execution plan. For more information on the benefits of reusing cached query plans, see Execution Plan Caching and Reuse.

If you can not modify the application, you can use template plan guides with forced parameterization to achieve a similar result. For more information, see Specifying Query Parameterization Behavior by Using Plan Guides.