列存储索引:概述Columnstore indexes: Overview

适用对象:yesSQL ServeryesAzure SQL 数据库yesAzure SQL 数据仓库yes并行数据仓库APPLIES TO: yesSQL Server yesAzure SQL Database yesAzure SQL Data Warehouse yesParallel Data Warehouse

列存储索引是存储和查询大型数据仓库事实数据表的标准。Columnstore indexes are the standard for storing and querying large data warehousing fact tables. 此索引使用基于列的数据存储和查询处理,与面向行的传统存储相比,最多可实现 10 倍的数据仓库查询性能提升。This index uses column-based data storage and query processing to achieve gains up to 10 times the query performance in your data warehouse over traditional row-oriented storage. 与未压缩数据大小相比,还可最多实现 10 倍数据压缩。You can also achieve gains up to 10 times the data compression over the uncompressed data size. SQL Server 2016 (13.x)SQL Server 2016 (13.x) 起,列存储索引支持操作分析,即能够对事务工作负载运行高性能实时分析。Beginning with SQL Server 2016 (13.x)SQL Server 2016 (13.x), columnstore indexes enable operational analytics: the ability to run performant real-time analytics on a transactional workload.

了解相关方案:Learn about a related scenario:

什么是列存储索引?What is a columnstore index?

列存储索引是一种使用列式数据格式(称为“列存储”)存储、检索和管理数据的技术。A columnstore index is a technology for storing, retrieving, and managing data by using a columnar data format, called a columnstore.

重要术语和概念Key terms and concepts

以下关键概念和术语与列存储索引相关联。The following key terms and concepts are associated with columnstore indexes.

columnstoreColumnstore

列存储是在逻辑上整理为包含行和列的表,实际上以列式数据格式存储的数据。A columnstore is data that's logically organized as a table with rows and columns, and physically stored in a column-wise data format.

行存储Rowstore

行存储是在逻辑上整理为包含行和列的表,实际上以行式数据格式存储的数据。A rowstore is data that's logically organized as a table with rows and columns, and physically stored in a row-wise data format. 此格式是存储关系表数据的传统方法。This format is the traditional way to store relational table data. SQL ServerSQL Server 中,行存储是指基础数据存储格式为堆、聚集索引或内存优化表的表。In SQL ServerSQL Server, rowstore refers to a table where the underlying data storage format is a heap, a clustered index, or a memory-optimized table.

备注

提到列存储索引时,行存储和列存储这两个词用于强调数据存储格式。In discussions about columnstore indexes, the terms rowstore and columnstore are used to emphasize the format for the data storage.

行组Rowgroup

行组是同时压缩为列存储格式的一组行。A rowgroup is a group of rows that are compressed into columnstore format at the same time. 每个行组通常可包含的最大行数是 1,048,576 行。A rowgroup usually contains the maximum number of rows per rowgroup, which is 1,048,576 rows.

为获得高性能和高压缩率,列存储索引先将表划分为行组,再按列式格式压缩每个行组。For high performance and high compression rates, the columnstore index slices the table into rowgroups, and then compresses each rowgroup in a column-wise manner. 行组中的行数必须足够大,以便提高压缩率,并且还要足够小,以便从内存中操作中受益。The number of rows in the rowgroup must be large enough to improve compression rates, and small enough to benefit from in-memory operations.

列段Column segment

列段是行组内的数据列。A column segment is a column of data from within the rowgroup.

  • 每个行组包含表中每个列的一个列段。Each rowgroup contains one column segment for every column in the table.
  • 每个列段一起压缩并且存储于物理介质上。Each column segment is compressed together and stored on physical media.

Column segmentColumn segment

聚集列存储索引Clustered columnstore index

聚集列存储索引是整个表的物理存储。A clustered columnstore index is the physical storage for the entire table.

聚集列存储索引Clustered columnstore index

为了减少列段碎片和提升性能,列存储索引可能会将一些数据暂时存储到称为“增量存储”的聚集索引中,同时还存储已删除行的 ID 的 btree 列表。To reduce fragmentation of the column segments and improve performance, the columnstore index might store some data temporarily into a clustered index called a deltastore and a btree list of IDs for deleted rows. 增量存储操作在后台处理。The deltastore operations are handled behind the scenes. 若要返回正确的查询结果,聚集列存储索引将合并来自列存储和增量存储的查询结果。To return the correct query results, the clustered columnstore index combines query results from both the columnstore and the deltastore.

增量行组Delta rowgroup

增量行组是仅用于列存储索引的聚集索引。A delta rowgroup is a clustered index that's used only with columnstore indexes. 它提升了列存储压缩率和性能,具体是通过存储行,并在行数达到阈值后将行移入列存储。It improves columnstore compression and performance by storing rows until the number of rows reaches a threshold and are then moved into the columnstore.

在增量行组达到最大行数后,它会关闭。When a delta rowgroup reaches the maximum number of rows, it becomes closed. 元组移动进程会检查是否有已关闭行组。A tuple-mover process checks for closed row groups. 如果进程找到已关闭的行组,就会压缩行组,并将它存储到列存储中。If the process finds a closed rowgroup, it compresses the rowgroup and stores it into the columnstore.

增量存储Deltastore

列存储索引可以有多个增量行组。A columnstore index can have more than one delta rowgroup. 所有增量行组统称为“增量存储”。All of the delta rowgroups are collectively called the deltastore.

在大容量加载期间,大多数行直接转到列存储,而不通过增量存储中转。During a large bulk load, most of the rows go directly to the columnstore without passing through the deltastore. 在大容量加载结束时,某些行的数量太少了,可能无法满足行组的大小下限要求(即 102,400 行)。Some rows at the end of the bulk load might be too few in number to meet the minimum size of a rowgroup, which is 102,400 rows. 因此,最后行转到增量存储(而不是列存储)中。As a result, the final rows go to the deltastore instead of the columnstore. 对于少于 102,400 行的较小的大容量加载,所有行都直接转到增量存储中。For small bulk loads with less than 102,400 rows, all of the rows go directly to the deltastore.

非聚集列存储索引Nonclustered columnstore index

非聚集列存储索引和聚集列存储索引的功能相同。A nonclustered columnstore index and a clustered columnstore index function the same. 不同之处在于,非聚集索引是对行存储表创建的辅助索引,而聚集列存储索引是整个表的主存储。The difference is that a nonclustered index is a secondary index that's created on a rowstore table, but a clustered columnstore index is the primary storage for the entire table.

非聚集索引包含基础表中部分或全部行与列的副本。The nonclustered index contains a copy of part or all of the rows and columns in the underlying table. 索引定义为表的一个或多个列,并包含可用于筛选行的可选条件。The index is defined as one or more columns of the table and has an optional condition that filters the rows.

非聚集列存储索引支持实时运营分析,其中 OLTP 工作负载使用基础聚集索引,同时对列存储索引并发运行分析。A nonclustered columnstore index enables real-time operational analytics where the OLTP workload uses the underlying clustered index while analytics run concurrently on the columnstore index. 有关详细信息,请参阅开始使用列存储进行实时运营分析For more information, see Get started with columnstore for real-time operational analytics.

批模式执行Batch mode execution

批处理模式执行是一种查询处理方法,用于同时处理多个行。Batch mode execution is a query processing method that's used to process multiple rows together. 批模式执行与列存储存储格式紧密集成,并且围绕列存储存储格式进行了优化。Batch mode execution is closely integrated with, and optimized around, the columnstore storage format. 批处理模式执行有时亦称为基于矢量或矢量化执行。Batch mode execution is sometimes known as vector-based or vectorized execution. 对列存储索引执行的查询使用批处理模式执行,这通常可将查询性能提升 2 到 4 倍。Queries on columnstore indexes use batch mode execution, which improves query performance typically by two to four times. 有关详细信息,请参阅查询处理体系结构指南For more information, see the Query processing architecture guide.

为何要使用列存储索引?Why should I use a columnstore index?

列存储索引可实现极高的数据压缩级别(通常是传统方法的 10 倍),从而明显降低数据仓库存储成本。A columnstore index can provide a very high level of data compression, typically by 10 times, to significantly reduce your data warehouse storage cost. 对于分析,列存储索引实现的性能比 btree 索引高出一个量级。For analytics, a columnstore index offers an order of magnitude better performance than a btree index. 列存储索引是数据仓库和分析工作负载的首选数据存储格式。Columnstore indexes are the preferred data storage format for data warehousing and analytics workloads. SQL Server 2016 (13.x)SQL Server 2016 (13.x)开始,可以使用列存储索引对操作工作负荷执行实时分析。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x), you can use columnstore indexes for real-time analytics on your operational workload.

列存储索引速度较快的原因:Reasons why columnstore indexes are so fast:

  • 列存储来自同一个域且通常相似的值,从而提高了压缩率。Columns store values from the same domain and commonly have similar values, which result in high compression rates. 最大限度地减少或消除系统中的 I/O 瓶颈,并显著降低内存占用量。I/O bottlenecks in your system are minimized or eliminated, and memory footprint is reduced significantly.

  • 较高的压缩率通过使用更小的内存中空间提高查询性能。High compression rates improve query performance by using a smaller in-memory footprint. 反过来,由于 SQL ServerSQL Server 可以在内存中执行更多查询和数据操作,因此可以提升查询性能。In turn, query performance can improve because SQL ServerSQL Server can perform more query and data operations in memory.

  • 批处理执行可同时处理多个行,通常可将查询性能提高 2 到 4 倍。Batch execution improves query performance, typically by two to four times, by processing multiple rows together.

  • 查询通常仅从表中选择几列,这减少了从物理介质的总 I/O。Queries often select only a few columns from a table, which reduces total I/O from the physical media.

何时应使用列存储索引?When should I use a columnstore index?

建议的用例:Recommended use cases:

如何在行存储索引与列存储索引之间做出选择?How do I choose between a rowstore index and a columnstore index?

行存储索引最适合用于查找数据、搜索特定值的查询,或者针对较小范围的值执行的查询。Rowstore indexes perform best on queries that seek into the data, when searching for a particular value, or for queries on a small range of values. 对事务工作负载使用行存储索引,因为它们往往大多需要进行表查找,而不是表扫描。Use rowstore indexes with transactional workloads because they tend to require mostly table seeks instead of table scans.

对于扫描大量数据(尤其是大型表中)的分析查询,列存储索引可提高性能。Columnstore indexes give high performance gains for analytic queries that scan large amounts of data, especially on large tables. 对数据仓库和分析工作负载(尤其是事实数据表)使用列存储索引,因为它们往往需要进行全表扫描,而不是表查找。Use columnstore indexes on data warehousing and analytics workloads, especially on fact tables, because they tend to require full table scans rather than table seeks.

是否可以在同一个表中组合行存储与列存储?Can I combine rowstore and columnstore on the same table?

是。Yes. SQL Server 2016 (13.x)SQL Server 2016 (13.x) 起,可以对行存储表创建可更新的非聚集列存储索引。Beginning with SQL Server 2016 (13.x)SQL Server 2016 (13.x), you can create an updatable nonclustered columnstore index on a rowstore table. 列存储索引存储选定列的副本,所以需要为此数据留出额外空间,但选定数据可实现平均 10 倍的压缩率。The columnstore index stores a copy of the selected columns, so you need extra space for this data, but the selected data is compressed on average 10 times. 可以同时对列存储索引和行存储索引上的事务运行分析。You can run analytics on the columnstore index and transactions on the rowstore index at the same time. 列存储随行存储表中的数据更改一起更新,因此这两个索引处理的是相同数据。The columnstore is updated when data changes in the rowstore table, so both indexes work against the same data.

SQL Server 2016 (13.x)SQL Server 2016 (13.x) 起,可以对列存储索引创建一个或多个非聚集行存储索引,并对基础列存储执行高效表查找。Beginning with SQL Server 2016 (13.x)SQL Server 2016 (13.x), you can have one or more nonclustered rowstore indexes on a columnstore index and perform efficient table seeks on the underlying columnstore. 其他选项也可供使用。Other options become available too. 例如,可以通过在行存储表中使用 UNIQUE 约束来强制主键约束。For example, you can enforce a primary key constraint by using a UNIQUE constraint on the rowstore table. 由于非唯一值无法插入到行存储表中,因此 SQL ServerSQL Server 无法将值插入列存储。Because a non-unique value fails to insert into the rowstore table, SQL ServerSQL Server can't insert the value into the columnstore.

元数据Metadata

列存储索引中的所有列在元数据中作为包含性列存储。All of the columns in a columnstore index are stored in the metadata as included columns. 列存储索引中没有任何键列。The columnstore index doesn't have key columns.

sys.indexes (Transact-SQL)sys.indexes (Transact-SQL) sys.index_columns (Transact-SQL)sys.index_columns (Transact-SQL)
sys.partitions (Transact-SQL)sys.partitions (Transact-SQL) sys.internal_partitions (Transact-SQL)sys.internal_partitions (Transact-SQL)
sys.column_store_segments (Transact-SQL)sys.column_store_segments (Transact-SQL) sys.column_store_dictionaries (Transact-SQL)sys.column_store_dictionaries (Transact-SQL)
sys.column_store_row_groups (Transact-SQL)sys.column_store_row_groups (Transact-SQL) sys.dm_db_column_store_row_group_operational_stats (Transact-SQL)sys.dm_db_column_store_row_group_operational_stats (Transact-SQL)
sys.dm_db_column_store_row_group_physical_stats (Transact-SQL)sys.dm_db_column_store_row_group_physical_stats (Transact-SQL) sys.dm_column_store_object_pool (Transact-SQL)sys.dm_column_store_object_pool (Transact-SQL)
sys.dm_db_column_store_row_group_operational_stats (Transact-SQL)sys.dm_db_column_store_row_group_operational_stats (Transact-SQL) sys.dm_db_index_operational_stats (Transact-SQL)sys.dm_db_index_operational_stats (Transact-SQL)
sys.dm_db_index_physical_stats (Transact-SQL)sys.dm_db_index_physical_stats (Transact-SQL)

所有关系表(除非指定为非聚集列存储索引)使用行存储作为基础数据格式。All relational tables, unless you specify them as a clustered columnstore index, use rowstore as the underlying data format. 如果不指定 WITH CLUSTERED COLUMNSTORE INDEX 选项,则 CREATE TABLE 将创建行存储表。CREATE TABLE creates a rowstore table unless you specify the WITH CLUSTERED COLUMNSTORE INDEX option.

使用 CREATE TABLE 语句创建表时,可通过指定 WITH CLUSTERED COLUMNSTORE INDEX 选项,将表创建为列存储。When you create a table with the CREATE TABLE statement, you can create the table as a columnstore by specifying the WITH CLUSTERED COLUMNSTORE INDEX option. 如果你已有一个行存储表并想要将其转换为列存储,可以使用 CREATE COLUMNSTORE INDEX 语句。If you already have a rowstore table and want to convert it to a columnstore, you can use the CREATE COLUMNSTORE INDEX statement.

任务Task 参考主题Reference topics 说明Notes
将表创建为列存储。Create a table as a columnstore. CREATE TABLE (Transact-SQL)CREATE TABLE (Transact-SQL) SQL Server 2016 (13.x)SQL Server 2016 (13.x)开始,你可以将表创建为聚集列存储索引。Beginning with SQL Server 2016 (13.x)SQL Server 2016 (13.x), you can create the table as a clustered columnstore index. 无需先创建行存储表,再将它转换为列存储。You don't have to first create a rowstore table and then convert it to columnstore.
创建具有列存储索引的内存表。Create a memory table with a columnstore index. CREATE TABLE (Transact-SQL)CREATE TABLE (Transact-SQL) SQL Server 2016 (13.x)SQL Server 2016 (13.x)开始,你可以创建具有列存储索引的内存优化表。Beginning with SQL Server 2016 (13.x)SQL Server 2016 (13.x), you can create a memory-optimized table with a columnstore index. 也可以在创建表后使用 ALTER TABLE ADD INDEX 语法添加列存储索引。The columnstore index can also be added after the table is created by using the ALTER TABLE ADD INDEX syntax.
将行存储表转换为列存储。Convert a rowstore table to a columnstore. CREATE COLUMNSTORE INDEX (Transact-SQL)CREATE COLUMNSTORE INDEX (Transact-SQL) 将现有堆集或二进制树转换为列存储。Convert an existing heap or binary tree to a columnstore. 示例演示了如何在执行此转换时处理现有的索引以及索引的名称。Examples show how to handle existing indexes and also the name of the index when performing this conversion.
将列存储表转换为行存储。Convert a columnstore table to a rowstore. 创建群集索引 (Transact-SQL)将列存储表转换回行存储堆CREATE CLUSTERED INDEX (Transact-SQL) or Convert a columnstore table back to a rowstore heap 通常不需要这样转换,但有时需要。Usually this conversion isn't necessary, but there can be times when you need to convert. 示例演示如何将列存储转换为堆或聚集索引。Examples show how to convert a columnstore to a heap or clustered index.
在行存储表中创建列存储索引。Create a columnstore index on a rowstore table. CREATE COLUMNSTORE INDEX (Transact-SQL)CREATE COLUMNSTORE INDEX (Transact-SQL) 一个行存储表可以有一个列存储索引。A rowstore table can have one columnstore index. SQL Server 2016 (13.x)SQL Server 2016 (13.x)开始,列存储索引可以包含筛选条件。Beginning with SQL Server 2016 (13.x)SQL Server 2016 (13.x), the columnstore index can have a filtered condition. 示例演示了基本语法。Examples show the basic syntax.
为操作分析创建高性能索引。Create performant indexes for operational analytics. 开始使用列存储进行实时运营分析Get started with columnstore for real-time operational analytics 描述了如何创建互补性列存储索引和 btree 索引,以便 OLTP 查询能够使用 btree 索引,分析查询能够使用列存储索引。Describes how to create complementary columnstore and btree indexes, so that OLTP queries use btree indexes and analytics queries use columnstore indexes.
为数据仓库创建高性能列存储索引。Create performant columnstore indexes for data warehousing. 用于数据仓库的列存储索引Columnstore indexes for data warehousing 介绍如何使用列存储表上的 btree 索引来创建高性能数据仓库查询。Describes how to use btree indexes on columnstore tables to create performant data warehousing queries.
使用 btree 索引对列存储索引强制主键约束。Use a btree index to enforce a primary key constraint on a columnstore index. 用于数据仓库的列存储索引Columnstore indexes for data warehousing 演示如何合并 btree 和列存储索引,以便对列存储索引强制主键约束。Shows how to combine btree and columnstore indexes to enforce primary key constraints on the columnstore index.
删除列存储索引。Drop a columnstore index. DROP INDEX (Transact-SQL)DROP INDEX (Transact-SQL) 使用 btree 索引所用的标准 DROP INDEX 语法删除列存储索引。Dropping a columnstore index uses the standard DROP INDEX syntax that btree indexes use. 删除聚集列存储索引会将列存储表转换为堆。Dropping a clustered columnstore index converts the columnstore table to a heap.
从列存储索引中删除行。Delete a row from a columnstore index. DELETE (Transact-SQL)DELETE (Transact-SQL) 使用 DELETE (Transact-SQL) 删除行。Use DELETE (Transact-SQL) to delete a row.

列存储行SQL ServerSQL Server 将行标记为已在逻辑上删除,但在重新生成索引前未回收行的物理存储。columnstore row: SQL ServerSQL Server marks the row as logically deleted, but doesn't reclaim the physical storage for the row until the index is rebuilt.

增量存储行SQL ServerSQL Server 在逻辑上和实际上都删除了行。deltastore row: SQL ServerSQL Server logically and physically deletes the row.
更新列存储索引中的行。Update a row in the columnstore index. UPDATE (Transact-SQL)UPDATE (Transact-SQL) 使用 UPDATE (Transact-SQL) 更新行。Use UPDATE (Transact-SQL) to update a row.

列存储行SQL ServerSQL Server 将行标记为已在逻辑上删除,然后将更新后的行插入增量存储中。columnstore row: SQL ServerSQL Server marks the row as logically deleted and then inserts the updated row into the deltastore.

增量存储行SQL ServerSQL Server 在增量存储中更新行。deltastore row: SQL ServerSQL Server updates the row in the deltastore.
将数据加载到列存储索引中。Load data into a columnstore index. 列存储索引数据加载Columnstore indexes data loading
强制增量存储中的所有行进入列存储。Force all rows in the deltastore to go into the columnstore. ALTER INDEX (Transact-SQL) ... REBUILDALTER INDEX (Transact-SQL) ... REBUILD

列存储索引碎片整理Columnstore indexes defragmentation
结合使用 ALTER INDEXREBUILD 选项,以强制所有行都转入列存储。ALTER INDEX with the REBUILD option forces all rows to go into the columnstore.
对列存储索引进行碎片整理。Defragment a columnstore index. ALTER INDEX (Transact-SQL)ALTER INDEX (Transact-SQL) ALTER INDEX ... REORGANIZE 联机对列存储索引进行碎片整理。ALTER INDEX ... REORGANIZE defragments columnstore indexes online.
合并具有列存储索引的表。Merge tables with columnstore indexes. MERGE (Transact-SQL)MERGE (Transact-SQL)

另请参阅See also

列存储索引数据加载 Columnstore indexes data loading
列存储索引各版本的功能摘要 Columnstore indexes versioned feature summary
列存储索引查询性能 Columnstore indexes query performance
开始使用列存储进行实时运营分析 Get started with columnstore for real-time operational analytics
用于数据仓库的列存储索引 Columnstore indexes for data warehousing
列存储索引碎片整理 Columnstore indexes defragmentation
SQL Server 索引设计指南 SQL Server index design guide
列存储索引体系结构Columnstore index architecture