列存储索引 - 查询性能Columnstore indexes - Query performance

适用对象: yesSQL ServeryesAzure SQL 数据库yesAzure SQL 数据仓库yes并行数据仓库APPLIES TO: yesSQL Server yesAzure SQL Database yesAzure SQL Data Warehouse yesParallel Data Warehouse

有助于实现列存储索引旨在提供的快速查询性能的建议。Recommendations for achieving the very fast query performance that columnstore indexes are designed to provide.

列存储索引对于分析和数据仓库工作负荷最多可提高 100 倍性能,并且比传统行存储索引最多可提高 10 倍的数据压缩率。Columnstore indexes can achieve up to 100x better performance on analytics and data warehousing workloads and up to 10x better data compression than traditional rowstore indexes. 这些建议可帮助查询实现列存储索引旨在提供的快速查询性能。These recommendations help your queries achieve the very fast query performance that columnstore indexes are designed to provide. 结尾处提供了有关列存储性能的进一步说明。Further explanations about columnstore performance are at the end.

提高查询性能的建议Recommendations for improving query performance

以下是有助于实现列存储索引旨在提供的高性能的一些建议。Here are some recommendations for achieving the high-performance columnstore indexes are designed to provide.

1.组织数据使更多行组不用进行全表扫描1. Organize data to eliminate more rowgroups from a full table scan

  • 利用插入顺序。Leverage insert order. 通常情况下,在传统数据仓库中,数据实际上是按时间顺序插入的,而分析是在时间维度中完成的。In common case in traditional data warehouse, the data is indeed inserted in time order and analytics is done in time dimension. 例如,按季度分析销售额。For example, analyzing sales by quarter. 对于此类型的工作负荷,行组消除自动发生。For this kind of workload, the rowgroup elimination happens automatically. SQL Server 2016 (13.x)SQL Server 2016 (13.x) 中,可以找出在查询处理过程中跳过的数字行组。In SQL Server 2016 (13.x)SQL Server 2016 (13.x), you can find out number rowgroups skipped as part of query processing.

  • 利用行存储聚集索引。Leverage the rowstore clustered index. 如果常见的查询谓词在某一列(如 C1)上,而该列与行的插入顺序无关,则可在列 C1 上创建一个行存储聚集索引,再通过删除行存储聚集索引来创建聚集列存储索引。If the common query predicate is on a column (e.g. C1) that is unrelated to insert order of the row, you can create a rowstore clustered index on columns C1 and then create clustered columnstore index by dropping the rowstore clustered index. 如果使用 MAXDOP = 1 显式创建聚集列存储索引,则得到的聚集列存储索引会在 C1 列上完美排序。if you create the clustered columnstore index explicitly using MAXDOP = 1, the resulting clustered columnstore index is perfectly ordered on column C1. 如果指定了 MAXDOP = 8,则会出现值在 8 个行组中重叠。If you specify MAXDOP = 8, then you will see overlap of values across 8 rowgroups. 此策略的一个常见示例是当你最初使用大型数据集创建列存储索引时。A common case of this strategy when you initially create columnstore index with large set of data. 请注意,对于非聚集列存储索引 (NCCI),如果基本行存储表具有聚集索引,则行已经排序。Note, for nonclustered columnstore index (NCCI), if the base rowstore table has a clustered index, the rows are already ordered. 在这种情况下,生成的非聚集列存储索引将自动进行排序。In this case, the resultant nonclustered columnstore index will automatically be ordered. 要注意的重要一点是,列存储索引本身不维护行的顺序。One important point to note is that columnstore index does not inherently maintain the order of rows. 当插入新行或更新旧行时,你可能需要重复该过程,因为分析查询性能可能会降低As new rows are inserted or older rows are updated, you may need to repeat the process as the analytics query performance may deteriorate

  • 利用表分区。Leverage table partitioning. 可以对列存储索引进行分区,然后使用分区排除来减少要扫描的行组数。You can partition the columnstore index and then use partition elimination to reduce number of rowgroups to scan. 例如,事实数据表存储客户的购买情况,常见的查询模式是查找特定客户的季度购买情况,可以将插入顺序与对客户列进行分区结合使用。For example, a fact table stores purchases made by customers and a common query pattern is to find quarterly purchases done by a specific customer, you can combine the insert order with partitioning on customer column. 每个分区都会包含特定客户的按时间顺序排序的行。Each partition will contain rows in time order for specific customer.

2.计划足够的内存以便并行创建列存储索引2. Plan for enough memory to create columnstore indexes in parallel

创建列存储索引默认情况下是一种并行操作,除非内存受到约束。Creating a columnstore index is by default a parallel operation unless memory is constrained. 并行创建索引要求比按顺序创建索引更多的内存。Creating the index in parallel requires more memory than creating the index serially. 在内存充足的情况下,创建列存储索引相当于在同一列上生成 B 树所用时间的 1.5 倍。When there is ample memory, creating a columnstore index takes on the order of 1.5 times as long as building a B-tree on the same columns.

创建列存储索引所需的内存取决于列数、字符串列的数目、并行度 (DOP) 和数据特性。The memory required for creating a columnstore index depends on the number of columns, the number of string columns, the degree of parallelism (DOP), and the characteristics of the data. 例如,如果表包含的行不到 100 万行,SQL ServerSQL Server 仅使用一个线程创建列存储索引。For example, if your table has fewer than one million rows, SQL ServerSQL Server will use only one thread to create the columnstore index.

如果表中包含的行超过 100 万行,但 SQL ServerSQL Server 无法获得足够大的内存授予来使用 MAXDOP 创建索引,SQL ServerSQL Server 会根据需要自动减少 MAXDOP,以便适合可用内存授予。If your table has more than one million rows, but SQL ServerSQL Server cannot get a large enough memory grant to create the index using MAXDOP, SQL ServerSQL Server will automatically decrease MAXDOP as needed to fit into the available memory grant. 在某些情况下,DOP 必须减小到一个以便在受到约束的内存下生成索引。In some cases, DOP must be decreased to one in order to build the index under constrained memory.

SQL Server 2016 (13.x)SQL Server 2016 (13.x) 开始,查询始终以批处理模式运行。Beginning with SQL Server 2016 (13.x)SQL Server 2016 (13.x), the query will always operate in batch mode. 在以前版本中,仅当 DOP 大于 1 时,才使用批处理执行。In previous releases, batch execution is only used when DOP is greater than one.

说明的列存储性能Columnstore Performance Explained

列存储索引通过将高速内存中批处理模式处理与可极大减少 I/O 要求的技术组合使用来实现高查询性能。Columnstore indexes achieve high query performance by combining high-speed in-memory batch mode processing with techniques that greatly reduce I/O requirements. 由于分析查询扫描大量行,它们通常进行 IO 绑定,因此在查询执行过程中减少 I/O 对于列存储索引的设计至关重要。Since analytics queries scan large numbers of rows, they are typically IO-bound, and therefore reducing I/O during query execution is critical to the design of columnstore indexes. 数据读取到内存中后,减少内存中操作的数目很重要。Once data has been read into memory, it is critical to reduce the number of in-memory operations.

列存储索引通过高数据压缩率、列存储消除、行组消除和批处理来减少 I/O 和优化内存中操作。Columnstore indexes reduce I/O and optimize in-memory operations through high data compression, columnstore elimination, rowgroup elimination, and batch processing.

数据压缩Data compression

列存储索引可实现比行存储索引最多高 10 倍的数据压缩率。Columnstore indexes achieve up to 10x greater data compression than rowstore indexes. 这极大地减少了执行分析查询所需的 I/O,并因此可以提高查询性能。This greatly reduces the I/O required to execute analytics queries and therefore improves query performance.

  • 列存储索引从磁盘读取压缩的数据,这意味着需要将更少字节的数据读取到内存。Columnstore indexes read compressed data from disk, which means fewer bytes of data need to be read into memory.

  • 列存储索引以压缩形式将数据存储在内存中,这通过减少将相同数据读取到内存的次数来减少 I/O。Columnstore indexes store data in compressed form in memory which reduces I/O by reducing the number of times the same data is read into memory. 例如,与以未压缩形式存储数据相比,列存储索引使用 10 倍压缩率可以在内存中保留 10 倍更多数据。For example, with 10x compression, columnstore indexes can keep 10x more data in memory compared to storing the data in uncompressed form. 内存中的数据更多,列存储索引则更有可能无需从磁盘进行更多读取,而是在内存中找到所需的数据。With more data in memory, it is more likely that the columnstore index will find the data it needs in memory without incurring additional reads from disk.

  • 列存储索引按列(而不是按行)压缩数据,这可实现高压缩率并减少磁盘上存储的数据的大小。Columnstore indexes compress data by columns instead of by rows which achieves high compression rates and reduces the size of the data stored on disk. 每个列独自压缩和存储。Each column is compressed and stored independently. 列中的数据将始终具有相同的数据类型,并往往具有相似的值。Data within a column always has the same data type and tends to have similar values. 当值相似时,数据压缩技术可以很好地实现更高的压缩率。Data compression techniques are very good at achieving higher compression rates when values are similar.

  • 例如,如果事实数据表存储客户地址,并且有一列用于国家/地区,则可能值的总数少于 200。For example, if a fact table stores customer addresses and has a column for country, the total number of possible values is fewer than 200. 其中的某些值将重复多次。Some of those values will be repeated many times. 如果事实数据表具有 1 亿行,则“国家/地区”列将轻松压缩,并需要很少的存储空间。If the fact table has 100 million rows, the country column will compress easily and require very little storage. 按行压缩就不能以这种方式利用列值的相似性,并在压缩“国家/地区”列中的值时将使用更多字节。Row-by-row compression is not able to capitalize on the similarity of column values in this way and will use more bytes to compress the values in the country column.

列消除Column elimination

列存储索引会跳过读取查询结果不需要的列。Columnstore indexes skip reading in columns that are not required for the query result. 这种功能(称为“列消除”)可进一步减少执行查询的 I/O,因此可以提高查询性能。This ability, called column elimination, further reduces I/O for query execution and therefore improves query performance.

  • 列消除之所以可能是因为数据是按列组织和压缩的。Column elimination is possible because the data is organized and compressed column by column. 与此相反,当数据按行存储时,每行中的列值以物理方式存储在一起,并且不能轻松分离。In contrast, when data is stored row-by-row, the column values in each row are physically stored together and cannot be easily separated. 查询处理器要检索特定列值需要读取整个行,这会增加 I/O,因为不必要地将额外的数据读取到内存。The query processor needs to read in an entire row to retrieve specific column values, which increases I/O because extra data is unnecessarily read into memory.

  • 例如,如果表有 50 列,而查询仅使用其中 5 列,列存储索引仅从磁盘中提取这 5 列。For example, if a table has 50 columns and the query only uses 5 of those columns, the columnstore index only fetches the 5 columns from disk. 它会跳过读取其他 45 列。It skips reading in the other 45 columns. 假定所有列都具有相似的大小,这可另外减少 I/O 90%。This reduces I/O by another 90% assuming all columns are of similar size. 如果相同的数据以行存储形式存储,查询处理器需要读取其他 45 列。If the same data are stored in a rowstore, the query processor needs to read the additional 45 columns.

行组消除Rowgroup elimination

在全表扫描中,大部分数据通常不匹配查询谓词条件。For full table scans, a large percentage of the data usually does not match the query predicate criteria. 列存储索引通过使用元数据能够跳过读取不包含查询结果所需数据的行组,所有这些都不需要实际 I/O。By using metadata, the columnstore index is able to skip reading in the rowgroups that do not contain data required for the query result, all without actual I/O. 这种功能(称为“行组消除”)可减少全表扫描的 I/O,因此可以提高查询性能。This ability, called rowgroup elimination, reduces I/O for full table scans and therefore improves query performance.

列存储索引何时需要执行全表扫描?When does a columnstore index need to perform a full table scan?

SQL Server 2016 (13.x)SQL Server 2016 (13.x) 开始,可以在聚集列存储索引上创建一个或多个常规非聚集 B 树索引,就像可以在行存储堆上创建一样。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x), you can create one or more regular nonclustered B-tree indexes on a clustered columnstore index just like you can on a rowstore heap. 非聚集 B 树索引可以加快具有相等谓词或包含小范围值的谓词的查询速度。The nonclustered B-tree indexes can speed up a query that has an equality predicate or a predicate with a small range of values. 对于更复杂的谓词,查询优化器可以选择全表扫描。For more complicated predicates, the query optimizer might choose a full table scan. 如果没有跳过行组的功能,全表扫描会非常耗时,特别是对于大型表更是如此。Without the ability to skip rowgroups, a full table scan would be very time-consuming, especially for large tables.

分析查询何时从全表扫描的行组消除受益?When does an analytics query benefit from rowgroup elimination for a full-table scan?

例如,一个零售企业已使用具有聚集列存储索引的事实数据表来为其销售数据建模。For example, a retail business has modeled their sales data using a fact table with clustered columnstore index. 每条新的销售数据均存储交易的各个属性,包括产品销售日期。Each new sale stores various attributes of the transaction including the date a product was sold. 有趣的是,尽管列存储索引不保证排序顺序,但此表中的行会按排序日期顺序加载。Interestingly, even though columnstore indexes do not guarantee a sorted order, the rows in this table will be loaded in a date-sorted order. 随着时间的推移,此表将会增长。Over time this table will grow. 虽然零售企业可能会保留过去 10 年的销售数据,但分析查询可能只需要计算上一季度的聚合。Although the retail business might keep sales data for the last 10 years, an analytics query might only need to compute an aggregate for last quarter. 列存储索引只需查看日期列的元数据就可避免访问前 39 个季度的数据。Columnstore indexes can eliminate accessing the data for the previous 39 quarters by just looking at the metadata for the date column. 这额外地减少了 97% 读入内存并进行处理的数据量。This is an additional 97% reduction in the amount of data that is read into memory and processed.

在全表扫描中跳过哪些行组?Which rowgroups are skipped in a full table scan?

为了确定要消除哪些行组,列存储索引在每个行组中使用元数据来存储每个列段的最小值和最大值。To determine which rows groups to eliminate, the columnstore index uses metadata to store the minimum and maximum values of each column segment for each rowgroup. 当列段范围不满足查询谓词条件时,就会跳过整个行组而不执行任何实际 IO。When none of the column segment ranges meet the query predicate criteria, the entire rowgroup is skipped without doing any actual IO. 这样做之所以有效是因为数据通常是按排序顺序加载的,虽然行不一定进行排序,但相似的数据值通常位于同一行组或相邻行组中。This works because the data is usually loaded in a sorted order and although rows are not guaranteed to be sorted, similar data values are often located within the same rowgroup or a neighboring rowgroup.

有关行组的详细信息,请参阅“列存储索引指南”For more details about rowgroups, see Columnstore Indexes Guide

批处理模式执行Batch Mode Execution

批处理模式执行是指为提高执行效率将一组行(通常最多 900 行)一起处理。Batch mode execution refers to processing a set of rows, typically up to 900 rows, together for execution efficiency. 例如,查询 SELECT SUM (Sales) FROM SalesData 从表 SalesData 聚合了总销售额。For example, the query SELECT SUM (Sales) FROM SalesData aggregates the total sales from the table SalesData. 以批处理模式执行时,查询执行引擎以 900 个值为一组计算聚合。In batch mode execution, the query execution engine computes the aggregate in group of 900 values. 这样会将元数据(访问成本和其他类型的开销)分布到批处理的所有行中,而不是支付每行的成本,从而大大减少了代码路径。This spreads metadata the access costs and other types of overhead over all the rows in a batch, rather than paying the cost for each row thereby significantly reducing the code path. 批处理模式处理在可能的情况下会对压缩数据运行,并消除了行模式处理所用的一些交换运算符。Batch mode processing operates on compressed data when possible and eliminates some of the exchange operators used by row mode processing. 这可以将分析查询的执行速度提高好几个数量级。This speeds up execution of analytics queries by orders of magnitude.

并非所有查询执行运算符都可以在批处理模式下执行。Not all query execution operators can be executed in batch mode. 例如,DML 操作(如 Insert、Delete 或 Update)一次执行一行。For example, DML operations such as Insert, Delete or Update are executed row at a time. 批处理模式运算符面向可提高查询性能的运算符(如 Scan、Join、Aggregate、sort 等)。Batch mode operators target operators for speeding up query performance such as Scan, Join, Aggregate, sort and so on. 由于列存储索引是在 SQL Server 2012 (11.x)SQL Server 2012 (11.x) 中引入的,因此需要持续扩展可以在批处理模式下执行的运算符。Since the columnstore index was introduced in SQL Server 2012 (11.x)SQL Server 2012 (11.x), there is a sustained effort to expand the operators that can be executed in the batch mode. 下表按照产品版本显示了以批处理模式运行的运算符。The table below shows the operators that run in batch mode according to the product version.

批处理模式运算符Batch Mode Operators 何时使用此项?When is this used? SQL Server 2012 (11.x)SQL Server 2012 (11.x) SQL Server 2014 (12.x)SQL Server 2014 (12.x) SQL Server 2016 (13.x)SQL Server 2016 (13.x) SQL 数据库SQL Database1and SQL 数据库SQL Database1 注释Comments
DML 操作(insert、delete、update、merge)DML operations (insert, delete, update, merge) no no no DML 不是批处理模式操作,因为它不是并行的。DML is not a batch mode operation because it is not parallel. 即使我们启用串行模式批处理操作,允许 DML 以批处理模式处理,我们也看不到明显的收益。Even when we enable serial mode batch processing, we don't see significant gains by allowing DML to be processed in batch mode.
columnstore index scancolumnstore index scan 扫描SCAN 不适用NA yes yes 对于列存储索引,我们可以将谓词推送到 SCAN 节点。For columnstore indexes, we can push the predicate to the SCAN node.
columnstore Index Scan(非聚集)columnstore Index Scan (nonclustered) 扫描SCAN yes yes yes yes
index seekindex seek 不适用NA 不适用NA no 我们以行模式通过非聚集 B 树索引执行查找操作。We perform a seek operation through a nonclustered B-tree index in rowmode.
compute scalarcompute scalar 计算结果为标量值的表达式。Expression that evaluates to a scalar value. yes yes yes 数据类型存在一些限制。There are some restrictions on data type. 这适用于所有批处理模式运算符。This is true for all batch mode operators.
串联 (concatenation)concatenation UNION 和 UNION ALLUNION and UNION ALL no yes yes
filterfilter 应用谓词Applying predicates yes yes yes
hash matchhash match 基于哈希的聚合函数、外部哈希联接、右哈希联接、左哈希联接、右内部联接、左内部联接Hash-based aggregate functions, outer hash join, right hash join, left hash join, right inner join, left inner join yes yes yes 聚合的限制:不能对字符串执行 min/max。Restrictions for aggregation: no min/max for strings. 可用的聚合函数是 sum/count/avg/min/max。Aggregation functions available are sum/count/avg/min/max.
联接的限制:不能对非整数类型执行任何不匹配的类型联接。Restrictions for join: no mismatched type joins on non-integer types.
merge joinmerge join no no no
多线程查询multi-threaded queries yes yes yes
嵌套循环nested loops no no no
单线程查询,在 MAXDOP 1 下运行single-threaded queries running under MAXDOP 1 no no yes
带有串行查询计划的单线程查询single-threaded queries with a serial query plan no no yes
sortsort 使用列存储索引的 SCAN 中的 Order by 子句。Order by clause on SCAN with columnstore index. no no yes
top sorttop sort no no yes
window aggregateswindow aggregates 不适用NA 不适用NA yes SQL Server 2016 (13.x)SQL Server 2016 (13.x) 中的新运算符。New operator in SQL Server 2016 (13.x)SQL Server 2016 (13.x).

1适用于 SQL Server 2016 (13.x)SQL Server 2016 (13.x)SQL 数据库SQL Database 高级层、标准层(S3 及更高)和所有 vCore 层,以及 并行数据仓库Parallel Data Warehouse1Applies to SQL Server 2016 (13.x)SQL Server 2016 (13.x), SQL 数据库SQL Database Premium tiers, Standard tiers - S3 and above, and all vCore tiers, and 并行数据仓库Parallel Data Warehouse

聚合下推Aggregate Pushdown

聚合计算的常规执行路径是从 SCAN 节点提取符合条件的行,然后以批处理模式聚合值。A normal execution path for aggregate computation to fetch the qualifying rows from the SCAN node and aggregate the values in Batch Mode. 尽管这提供了良好的性能,但使用 SQL Server 2016 (13.x)SQL Server 2016 (13.x),可以将聚合运算推送到 SCAN 节点,以便在批处理模式执行的基础上将聚合计算的性能提高好几个数量级,前提是要满足以下条件:While this delivers good performance, but with SQL Server 2016 (13.x)SQL Server 2016 (13.x), the aggregate operation can be pushed to the SCAN node to improve the performance of aggregate computation by orders of magnitude on top of Batch Mode execution provided the following conditions are met:

  • 聚合为 MINMAXSUMCOUNTCOUNT(*)The aggregates are MIN, MAX, SUM, COUNT and COUNT(*).
  • 聚合运算符必须基于 SCAN 节点或包含 GROUP BY 的 SCAN 节点。Aggregate operator must be on top of SCAN node or SCAN node with GROUP BY.
  • 此聚合不是非重复聚合。This aggregate is not a distinct aggregate.
  • 聚合列不是字符串列。The aggregate column is not a string column.
  • 聚合列不是虚拟列。The aggregate column is not a virtual column.
  • 输入和输出数据类型必须是以下类型之一,并且必须适合 64 位。The input and output datatype must be one of the following and must fit within 64 bits.
    • tinyintintbigintsmallintbittinyint, int, bigint, smallint, bit
    • 精度 <= 18 的 smallmoneymoneydecimalnumericsmallmoney, money, decimal and numeric with precision <= 18
    • smalldatedatedatetimedatetime2timesmalldate, date, datetime, datetime2, time

通过在对缓存友好的执行中对压缩/编码数据进行高效聚合并利用 SIMD 可进一步加快聚合下推速度Aggregate push down is further accelerated by efficient Aggregation on compressed/encoded data in cache-friendly execution and by leveraging SIMD

aggregate pushdownaggregate pushdown

例如,在以下两个查询中完成了聚合下推:For example, aggregate pushdown is done in both of the queries below:

SELECT  productkey, SUM(TotalProductCost)    
FROM FactResellerSalesXL_CCI    
GROUP BY productkey    
    
SELECT  SUM(TotalProductCost)    
FROM FactResellerSalesXL_CCI    

字符串谓词下推String predicate pushdown

在设计数据仓库架构时,建议的架构建模是使用星型架构或雪花型架构,其中包括一个或多个事实数据表和多个维度表。When designing a data warehouse schema, the recommended schema modeling is to use star-schema or snowflake schema consisting of one or more fact tables and many dimension tables. 事实数据表 存储业务度量值或事务,而 维度表 存储分析事实数据需要跨越的维度。The fact table stores the business measurements or transactions and dimension table store the dimensions across which facts need to be analyzed.

例如,事实可以是一条表示某一特定区域中某一特定产品的销售额的记录,而维度则表示一组区域、产品等。For example, a fact can be a record representing a sale of a particular product in a specific region while the dimension represents a set of regions, products and so on. 事实数据表和维度表通过主键/外键关系进行连接。The fact and dimension tables are connected through a primary/foreign key relationship. 最常用的分析查询将一个或多个维度表与事实数据表进行联接。Most commonly used analytics queries join one or more dimension tables with the fact table.

让我们设想一个维度表 ProductsLet us consider a dimension table Products. 典型的主键是通常用字符串数据类型表示的 ProductCodeA typical primary key will be ProductCode which is commonly represented as string data type. 为提高查询性能,最佳做法是创建代理键(通常为整数列),从事实数据表引用维度表中的行。For performance of queries, it is a best practice to create surrogate key, typically an integer column, to refer to the row in the dimension table from the fact table.

列存储索引非常高效地运行具有联接/涉及数值的谓词或基于整数的键的分析查询。The columnstore index runs analytics queries with joins/predicates involving numeric or integer based keys very efficiently. 但是,在很多客户工作负荷中,我们发现使用基于字符串的列链接事实/维度表,结果是使用列存储索引的查询性能并不如预期。However, in many customer workloads, we find the use to string based columns linking fact/dimension tables and with the result the query performance with columnstore index was not as performing. SQL Server 2016 (13.x)SQL Server 2016 (13.x) 通过将包含字符串列的谓词下推到 SCAN 节点,使用基于字符串的列显著提高了分析的性能。improves the performance of analytics queries with string based columns significantly by pushing down the predicates with string columns to the SCAN node.

字符串谓词下推利用为列创建的主/辅助字典来提高查询性能。String predicate pushdown leverages the primary/secondary dictionary created for column(s) to improve the query performance. 例如,让我们考虑在行组中创建一个包含 100 个不同字符串值的字符串列段。For example, let us consider string column segment within a rowgroup consisting of 100 distinct string values. 假定有 100 万行,这意味着平均每个不同的字符串值被引用了 10,000 次。This means each distinct string value is referenced 10,000 times on average assuming 1 million rows.

使用字符串谓词下推,执行查询时针对字典中的值计算谓词,如果它符合条件,引用字典值的所有行都将自动符合条件。With string predicate pushdown, the query execution computes the predicate against the values in the dictionary and if it qualifies, all rows referring to the dictionary value are automatically qualified. 这在两个方面提高了性能:This improves the performance in two ways:

  1. 仅返回符合条件的行,从而减少了需要传递出 SCAN 节点的行数。Only the qualified row is returned reducing number of the rows that need to flow out of SCAN node.
  2. 显著减少了字符串比较次数。The number of string comparisons are significantly reduced. 在此示例中,只需要 100 次字符串比较,而不用比较 100 万次。In this example, only 100 string comparisons are required as against 1 million comparisons. 如下所述,有一些限制:There are some limitations as described below:

期待您的反馈 :如果在本文中发现过时或不正确的内容(如步骤或代码示例),请告诉我们。We are listening: If you find something outdated or incorrect in this article, such as a step or a code example, please tell us. 可以单击此页底部的“反馈” 部分中的“本页” 按钮。You can click the This page button in the Feedback section at the bottom of this page. 我们通常在第二天阅读有关 SQL 的每项反馈。We read every item of feedback about SQL, typically the next day. 谢谢。Thanks.

-   <span data-ttu-id="b8101-325">不能对增量行组执行字符串谓词下推。</span><span class="sxs-lookup"><span data-stu-id="b8101-325">No string predicate pushdown for delta rowgroups.</span></span> <span data-ttu-id="b8101-326">增量行组中的列没有字典。</span><span class="sxs-lookup"><span data-stu-id="b8101-326">There is no dictionary for columns in delta rowgroups.</span></span>    
-   <span data-ttu-id="b8101-327">如果字典大小超过 64 KB,则不能执行字符串谓词下推。</span><span class="sxs-lookup"><span data-stu-id="b8101-327">No string predicate pushdown if dictionary exceeds 64 KB entries.</span></span>    
-   <span data-ttu-id="b8101-328">不支持计算结果为 NULL 的表达式。</span><span class="sxs-lookup"><span data-stu-id="b8101-328">Expression evaluating NULLs are not supported.</span></span>    

另请参阅See Also

列存储索引设计指南 Columnstore Indexes Design Guidance
列存储索引数据加载指南 Columnstore Indexes Data Loading Guidance
开始使用列存储进行实时运行分析 Get started with Columnstore for real time operational analytics
针对数据仓库的列存储索引 Columnstore Indexes for Data Warehousing
列存储索引碎片整理 Columnstore Indexes Defragmentation
列存储索引体系结构 Columnstore Index Architecture
CREATE INDEX (Transact-SQL) CREATE INDEX (Transact-SQL)
ALTER INDEX (Transact-SQL)ALTER INDEX (Transact-SQL)