列存储索引 - 碎片整理Columnstore indexes - defragmentation

适用对象: yesSQL ServeryesAzure SQL 数据库yesAzure SQL 数据仓库yes并行数据仓库APPLIES TO: yesSQL Server yesAzure SQL Database yesAzure SQL Data Warehouse yesParallel Data Warehouse

对列存储索引进行碎片整理的任务。Tasks for defragmenting columnstore indexes.

使用 ALTER INDEX REORGANIZE 对列存储索引进行在线碎片整理Use ALTER INDEX REORGANIZE to defragment a columnstore index online

适用对象: SQL ServerSQL ServerSQL Server 2016 (13.x)SQL Server 2016 (13.x) 及更高版本)、SQL 数据库SQL DatabaseApplies to: SQL ServerSQL Server (starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x)), SQL 数据库SQL Database

加载任何类型的数据之后,增量存储中会有多个较小的行组。After performing loads of any type, you can have multiple small rowgroups in the deltastore. 可以使用 ALTER INDEX REORGANIZE 将所有行组强制载入列存储,然后将行组合并成具有更多行的较少行组。You can use ALTER INDEX REORGANIZE to force all of the rowgroups into the columnstore, and then to combine the rowgroups into fewer rowgroups with more rows. 重新组织操作还将删除已从列存储中删除的行。The reorganize operation will also remove rows that have been deleted from the columnstore.

有关详细信息,请参阅 SQL 数据库引擎团队博客中的以下博客文章。For more information, refer to the following blog posts on the SQL Database Engine Team Blog.

进行重新组织的建议Recommendations for reorganizing

在执行一次或多次数据加载后,为了尽快优化查询性能,需要重新组织列存储索引。Reorganize a columnstore index after one or more data loads to achieve query performance benefits as quickly as possible. 重新组织最初需要额外的 CPU 资源来压缩数据,这可能降低整体系统性能。Reorganizing will initially require additional CPU resources to compress the data, which could slow overall system performance. 但是,压缩数据后,可以提高查询性能。However, as soon as the data is compressed, query performance can improve.

使用 sys.dm_db_column_store_row_group_physical_stats (Transact-SQL) 中的示例来计算碎片。Use the example in sys.dm_db_column_store_row_group_physical_stats (Transact-SQL) to compute the fragmentation. 这可帮助你确定它是否值得执行 REORGANIZE 操作。This helps you to determine whether it is worthwhile to perform a REORGANIZE operation.

例如:重新组织的工作原理Example: How reorganizing works

此示例演示 ALTER INDEX REORGANIZE 如何将所有增量存储行组强制转到列存储中,然后合并行组。This example shows how ALTER INDEX REORGANIZE can force all deltastore rowgroups into the columnstore and then combine the rowgroups.

  1. 运行此 Transact SQL 以创建包含 300,000 行的临时表。Run this Transact-SQL to create a staging table that contains 300,000 rows. 我们将使用它来将行批量加载到列存储索引中。We will use this to bulk load rows into a columnstore index.

    USE master;  
    GO  
    
    IF EXISTS (SELECT name FROM sys.databases  
        WHERE name = N'[columnstore]')  
        DROP DATABASE [columnstore];  
    GO  
    
    CREATE DATABASE [columnstore];  
    GO  
    
    USE columnstore;
    GO
    
    IF EXISTS (SELECT name FROM sys.tables  
        WHERE name = N'staging'  
        AND object_id = OBJECT_ID (N'staging'))  
    DROP TABLE dbo.staging;  
    GO  
    
    CREATE TABLE [staging] (  
         AccountKey int NOT NULL,  
         AccountDescription nvarchar (50),  
         AccountType nvarchar(50),  
         AccountCodeAlternateKey int  
    );  
    GO  
    
    -- Load data  
    DECLARE @loop int;  
    DECLARE @AccountDescription varchar(50);  
    DECLARE @AccountKey int;  
    DECLARE @AccountType varchar(50);  
    DECLARE @AccountCode int;  
    
    SELECT @loop = 0;  
    BEGIN TRAN  
        WHILE (@loop < 300000)   
          BEGIN  
            SELECT @AccountKey = CAST (RAND()*10000000 AS int);  
            SELECT @AccountDescription = 'accountdesc ' + CONVERT(varchar(20), @AccountKey);  
            SELECT @AccountType = 'AccountType ' + CONVERT(varchar(20), @AccountKey);  
            SELECT @AccountCode =  CAST (RAND()*10000000 AS int);  
    
            INSERT INTO staging VALUES (  
               @AccountKey,   
               @AccountDescription,   
               @AccountType,   
               @AccountCode  
            );  
    
            SELECT @loop = @loop + 1;  
          END  
    COMMIT  
    
  2. 创建存储为列存储索引的表。Create a table stored as a columnstore index.

    IF EXISTS (SELECT name FROM sys.tables  
        WHERE name = N'cci_target'  
        AND object_id = OBJECT_ID (N'cci_target'))  
    DROP TABLE dbo.cci_target;  
    GO  
    
    -- Create a table with a clustered columnstore index  
    -- and the same columns as the rowstore staging table.  
    CREATE TABLE cci_target (  
         AccountKey int NOT NULL,  
         AccountDescription nvarchar (50),  
         AccountType nvarchar(50),  
         AccountCodeAlternateKey int,  
         INDEX idx_cci_target CLUSTERED COLUMNSTORE  
    )  
    GO  
    
  3. 将临时表行批量插入到列存储表中。Bulk insert the staging table rows into the columnstore table. INSERT INTO ... SELECT 用于执行批量插入。INSERT INTO ... SELECT performs a bulk insert. TABLOCK 允许以并行方式执行 INSERTThe TABLOCK allows the INSERT to execute with parallelism.

    -- Insert rows in parallel  
    INSERT INTO cci_target WITH (TABLOCK)  
    SELECT TOP (300000) * FROM staging;  
    GO  
    
  4. 通过使用 sys.dm_db_column_store_row_group_physical_stats 动态管理视图 (DMV) 来查看行组 。View the rowgroups by using the sys.dm_db_column_store_row_group_physical_stats dynamic management view (DMV).

    -- Run this dynamic management view (DMV) to see the OPEN rowgroups.   
    -- The number of rowgroups depends on the degree of parallelism.   
    -- You will see multiple OPEN rowgroups depending on the degree of parallelism.   
    -- This is because insert operation can run in parallel in [!INCLUDE[ssSQL15](../../includes/sssql15-md.md)].  
    
    SELECT *   
    FROM sys.dm_db_column_store_row_group_physical_stats   
    WHERE object_id  = object_id('cci_target')  
    ORDER BY row_group_id;  
    

    在此示例中,结果显示 8 个 OPEN 行组,每个包含 37,500 行。In this example, the results show 8 OPEN rowgroups that each have 37,500 rows. OPEN 行组的数目取决于 max_degree_of_parallelism 设置 。The number of OPEN rowgroups depends on the max_degree_of_parallelism setting.

    OPEN 行组OPEN rowgroups

  5. 使用带有 COMPRESS_ALL_ROW_GROUPS 选项的 ALTER INDEX REORGANIZE 强制所有行组压缩到列存储。Use ALTER INDEX REORGANIZE with the COMPRESS_ALL_ROW_GROUPS option to force all rowgroups to be compressed into the columnstore.

    -- This command will force all CLOSED and OPEN rowgroups into the columnstore.  
    ALTER INDEX idx_cci_target ON cci_target   
    REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);  
    
    SELECT *   
    FROM sys.dm_db_column_store_row_group_physical_stats   
    WHERE object_id  = object_id('cci_target')  
    ORDER BY row_group_id;  
    

    结果显示 8 个 COMPRESSED 行组和 8 个 TOMBSTONE 行组。The results show 8 COMPRESSED rowgroups and 8 TOMBSTONE rowgroups. 无论行组的大小如何,都压缩到列存储中。Each rowgroup got compressed into the columnstore regardless of its size. TOMBSTONE 行组将由系统删除。The TOMBSTONE rowgroups will be removed by the system.

    TOMBSTONE 和 COMPRESSED 行组TOMBSTONE and COMPRESSED rowgroups

  6. 就查询性能而言,将小行组合并在一起会更好。For query performance, its much better to combine small rowgroups together. ALTER INDEX REORGANIZE 用于将 COMPRESSED 行组合并在一起。ALTER INDEX REORGANIZE will combine COMPRESSED rowgroups together. 将增量行组压缩到列存储中后,再次运行 ALTER INDEX REORGANIZE 以合并小的 COMPRESSED 行组。Now that the delta rowgroups are compressed into the columnstore, run ALTER INDEX REORGANIZE again to combine the small COMPRESSED rowgroups. 此时不需要 COMPRESS_ALL_ROW_GROUPS 选项。This time you don't need the COMPRESS_ALL_ROW_GROUPS option.

    -- Run this again and you will see that smaller rowgroups   
    -- combined into one compressed rowgroup with 300,000 rows  
    ALTER INDEX idx_cci_target ON cci_target REORGANIZE;  
    
    SELECT *   
    FROM sys.dm_db_column_store_row_group_physical_stats   
    WHERE object_id  = object_id('cci_target')  
    ORDER BY row_group_id;  
    

    结果显示 8 个 COMPRESSED 行组合并成一个 COMPRESSED 行组。The results show the 8 COMPRESSED rowgroups are now combined into one COMPRESSED rowgroup.

    合并行组Combined rowgroups

使用 ALTER INDEX REBUILD 对列存储索引进行离线碎片整理Use ALTER INDEX REBUILD to defragment the columnstore index offline

SQL Server 2016 (13.x)SQL Server 2016 (13.x) 及更高版本中,通常不需要重新生成列存储索引,因为 REORGANIZE 以在线操作形式在后台执行重要的重新生成。For SQL Server 2016 (13.x)SQL Server 2016 (13.x) and later, rebuilding the columnstore index is usually not needed since REORGANIZE performs the essentials of a rebuild in the background as an online operation.

重新生成列存储索引会删除碎片,并将所有行移到列存储中。Rebuilding a columnstore index removes fragmentation, and moves all rows into the columnstore. 使用 CREATE COLUMNSTORE INDEX (Transact-SQL)ALTER INDEX (Transact-SQL) 执行完全重新生成现有聚集列存储索引。Use CREATE COLUMNSTORE INDEX (Transact-SQL) or ALTER INDEX (Transact-SQL) to perform a full rebuild of an existing clustered columnstore index. 此外,也可以使用 ALTER INDEX ...REBUILD,用于重新生成特定分区。Additionally, you can use ALTER INDEX ... REBUILD to rebuild a specific partition.

重新生成过程Rebuild Process

要重新生成列存储索引, SQL ServerSQL ServerTo rebuild a columnstore index, SQL ServerSQL Server:

  1. 在重新生成进行时获取表或分区上的排他锁。Acquires an exclusive lock on the table or partition while the rebuild occurs. 数据是“离线”的,在重新生成期间不可用,即使使用 NOLOCK、RCSI 或 SI 也是如此。The data is "offline" and unavailable during the rebuild, even when using NOLOCK, RCSI, or SI.

  2. 将所有数据重新压缩到列存储中。Re-compresses all data into the columnstore. 在进行重新生成时存在列存储索引的两个副本。Two copies of the columnstore index exist while the rebuild is taking place. 在重新生成完成后, SQL ServerSQL Server 将删除原始列存储索引。When the rebuild is finished, SQL ServerSQL Server deletes the original columnstore index.

有关重新生成列存储索引的建议Recommendations for Rebuilding a Columnstore Index

重新生成列存储索引适用于删除碎片,并且适用于将所有行移到列存储中。Rebuilding a columnstore index is useful for removing fragmentation, and for moving all rows into the columnstore. 请考虑以下建议:We have the following recommendations:

  1. 重新生成分区而不是整个表。Rebuild a partition instead of the entire table.

    • 如果索引很大,并且在重新生成期间需要足够的磁盘空间来存储索引的额外副本,则重新生成整个表将很费时间。Rebuilding the entire table takes a long time if the index is large, and it requires enough disk space to store an additional copy of the index during the rebuild. 通常仅需要重新生成最近使用的分区。Usually it is only necessary to rebuild the most recently used partition.
    • 对于已分区的表,您不需要重新生成整个列存储索引,因为碎片仅可能在最近修改的分区中出现。For partitioned tables, you do not need to rebuild the entire columnstore index because fragmentation is likely to occur in only the partitions that have been modified recently. 事实表和大型的维度表通常已分区,以便对表的特定块执行备份和管理操作。Fact tables and large dimension tables are usually partitioned in order to perform backup and management operations on chunks of the table.
  2. 在执行了大量 DML 操作后重新生成分区Rebuild a partition after heavy DML operations.

    • 重新生成某一分区将会对该分区进行碎片整理,并且缩小磁盘存储空间。Rebuilding a partition will defragment the partition and reduce disk storage. 重新生成将会从列存储中删除标记为要删除的所有行,并且会将所有行组从增量存储移到列存储中。Rebuilding will delete all rows from the columnstore that are marked for deletion, and it will move all rowgroups from the deltastore into the columnstore. 请注意,增量存储中可以有多个行组,每个包含少于一百万行。Note, there can be multiple rowgroups in the deltastore that each have less than one million rows.
  3. 在加载数据后重新生成分区。Rebuild a partition after loading data.

    • 这可确保所有数据都存储于列存储中。This ensures all data is stored in the columnstore. 当并发进程在同一时间将少于 100K 行的行组加载到同一分区时,该分区可以得到多个增量存储。When concurrent processes each load less than 100K rows into the same partition at the same time, the partition can end up with multiple deltastores. 重新生成会将所有增量存储行都移到列存储中。Rebuilding will move all deltastore rows into the columnstore.

自动索引和统计信息管理Automatic index and statistics management

利用自适应索引碎片整理等解决方案,自动管理一个或多个数据库的索引碎片整理和统计信息更新。Leverage solutions such as Adaptive Index Defrag to automatically manage index defragmentation and statistics updates for one or more databases. 此过程根据碎片级别以及其他参数,自动选择是重新生成索引还是重新组织索引,并使用线性阈值更新统计信息。This procedure automatically chooses whether to rebuild or reorganize an index according to its fragmentation level, amongst other parameters, and update statistics with a linear threshold.

另请参阅See Also

列存储索引 - 新增功能 Columnstore indexes - what's new
Columnstore Indexes Query Performance Columnstore Indexes Query Performance
开始使用列存储进行实时运营分析 Get started with Columnstore for real-time operational analytics
针对数据仓库的列存储索引Columnstore Indexes for Data Warehousing
列存储索引体系结构 Columnstore Index Architecture
自适应索引碎片整理Adaptive Index Defrag