Troubleshooting Poor Query Performance: Cardinality Estimation

The query optimizer in SQL Server is cost-based. This means that it selects query plans that have the lowest estimated processing cost to execute. The query optimizer determines the cost of executing a query plan based on two main factors:

  • The total number of rows processed at each level of a query plan, referred to as the cardinality of the plan.

  • The cost model of the algorithm dictated by the operators used in the query.

The first factor, cardinality, is used as an input parameter of the second factor, the cost model. Therefore, improved cardinality leads to better estimated costs and, in turn, faster execution plans.

SQL Server estimates cardinalities primarily from histograms that are created when indexes or statistics are created, either manually or automatically. Sometimes, SQL Server also uses constraint information and logical rewrites of queries to determine cardinality.

In the following cases, SQL Server cannot accurately calculate cardinalities. This causes inaccurate cost calculations that may cause suboptimal query plans. Avoiding these constructs in queries may improve query performance. Sometimes, alternative query formulations or other measures are possible and these are pointed out.

  • Queries with predicates that use comparison operators between different columns of the same table.

  • Queries with predicates that use operators, and any one of the following are true:

    • There are no statistics on the columns involved on either side of the operators.

    • The distribution of values in the statistics is not uniform, but the query seeks a highly selective value set. This situation can be especially true if the operator is anything other than the equality (=) operator.

    • The predicate uses the not equal to (!=) comparison operator or the NOT logical operator.

  • Queries that use any of the SQL Server built-in functions or a scalar-valued, user-defined function whose argument is not a constant value.

  • Queries that involve joining columns through arithmetic or string concatenation operators.

  • Queries that compare variables whose values are not known when the query is compiled and optimized.

The following measures can be used to try to improve performance on these types of queries:

  • Build useful indexes or statistics on the columns that are involved in the query. For more information, see Designing Indexes and Using Statistics to Improve Query Performance.

  • Consider using computed columns and rewriting the query if the query uses comparison or arithmetic operators to compare or combine two or more columns. For example, the following query compares the values in two columns:

    SELECT * FROM MyTable
    WHERE MyTable.Col1 > MyTable.Col2
    

    Performance may be improved if you add a computed column Col3 to MyTable that calculates the difference between Col1 and Col2 (Col1 minus Col2). Then, rewrite the query:

    SELECT * FROM MyTable
    WHERE Col3 > 0
    

    Performance will probably improve more if you build an index on MyTable.Col3.