CDC 流程元件CDC Flow Components

Change Data Capture Components by Attunity for Microsoft SQL Server 2017 Integration Services (SSIS)SQL Server 2017 Integration Services (SSIS) 可協助 SSIS 開發人員使用 CDC 並且降低 CDC 封裝的複雜性。The Change Data Capture Components by Attunity for Microsoft SQL Server 2017 Integration Services (SSIS)SQL Server 2017 Integration Services (SSIS) help SSIS developers work with CDC and reduce the complexity of CDC packages.

SSIS CDC 元件的設計目的是要搭配來源資料表為相同 SQL Server 2017SQL Server 2017 資料庫或 Oracle 資料庫 (使用 SQL Server 2017SQL Server 2017 的 Oracle CDC 服務時) 的 SQL Server 2017SQL Server 2017CDC 功能運作。The SSIS CDC components are designed to work with the SQL Server 2017SQL Server 2017 CDC feature where the source tables are either the same SQL Server 2017SQL Server 2017 database or an Oracle database (when using the Oracle CDC Service for SQL Server 2017SQL Server 2017). 支援分割資料表。Partitioned tables are supported.

這些元件包括控制和資料流程元件,可簡化在 SSIS 封裝中讀取和處理變更資料的體驗。The components include Control and Data Flow components that streamline the experience of reading and processing change data in SSIS packages. 這些元件可以加入至 Microsoft SQL Server 2017SQL Server 2017的元件程式庫,但是必須另外安裝。The components can be added to the component library in Microsoft SQL Server 2017SQL Server 2017, but are installed separately.

Change Data Capture Components by Attunity 如下所示:The following are the Change Data Capture Components by Attunity:

CDC 控制流程元件CDC Control Flow Component:

CDC 控制工作CDC Control Task

CDC 資料流程元件CDC Data Flow Components:

CDC 來源CDC Source

CDC 分隔器CDC Splitter

安裝Installation

本節描述 Microsoft SQL Server 2017 Integration Services (SSIS)SQL Server 2017 Integration Services (SSIS)CDC 元件的安裝程序。This section describes the installation procedures for the CDC Components for Microsoft SQL Server 2017 Integration Services (SSIS)SQL Server 2017 Integration Services (SSIS).

SSIS 的 CDC 元件隨附於 Attunity 所提供適用於 Microsoft SQL Server® 的 Microsoft® Change Data Capture Designer for Oracle 和 Change Data Capture Service for Oracle。The CDC Components for SSIS are packaged with the Microsoft® Change Data Capture Designer and Service for Oracle by Attunity for Microsoft SQL Server®. 此下載是 SQL Server Feature Pack 的一部分。This download is part of the SQL Server Feature Pack. SQL Server 2016 Feature Pack 網頁下載 Feature Pack 的元件。Download components of the Feature Pack from the SQL Server 2016 Feature Pack web page.

版本支援Version Support

SQL Server 版本支援SQL Server version support

所有支援的 Microsoft SQL Server 版本都支援 SSIS 的 CDC 元件。The CDC components for SSIS are supported on all the supported versions of Microsoft SQL Server. 目前,支援的 SQL Server 版本包括 SQL Server 2012 至 SQL Server 2017。Currently, the supported versions of SQL Server include SQL Server 2012 through SQL Server 2017.

作業系統版本支援Operating system version support

下列作業系統和平台支援 SSIS 的 CDC 元件:The CDC components for SSIS are supported on the following operating systems and platforms:

  • Windows 8 和 8.1Windows 8 and 8.1
  • Windows 10Windows 10
  • Windows Server 2012 和 2012 R2Windows Server 2012 and 2012 R2
  • Windows Server 2016Windows Server 2016

執行安裝程式Running the Installation Program

執行安裝精靈之前,請確定 SQL Server 2017SQL Server 2017 SQL Server Data ToolsSQL Server Data Tools 已關閉。Before you run the installation wizard, be sure that the SQL Server 2017SQL Server 2017 SQL Server Data ToolsSQL Server Data Tools is closed. 然後,遵循安裝精靈中的指示進行。Then follow the directions in the installation wizard.

重新啟動 SSIS 服務Restart SSIS Service

安裝 CDC 元件之後,您必須重新啟動 SSIS 服務,才能確保在 SQL SQL Server Data ToolsSQL Server Data Tools 中開發封裝時,這些元件可正確運作。After you install the CDC components, you must restart the SSIS service to be sure that the components work correctly when developing packages in the SQL SQL Server Data ToolsSQL Server Data Tools.

安裝元件之後,系統會顯示一則訊息。A message is displayed after you install the components. 出現提示時,請按一下 [是]。Click Yes when prompted.

解除安裝 Microsoft CDC 元件Uninstalling the Microsoft CDC Components

您可以使用解除安裝精靈來解除安裝 CDC 來源、CDC 分隔器或 CDC 控制工作。You uninstall the CDC source, CDC splitter, or CDC Control task, by using the uninstall wizard. 如果您正在使用 SQL Server 2017SQL Server 2017 SQL Server Data ToolsSQL Server Data Tools 進行封裝開發,請確定 SQL Server Data ToolsSQL Server Data Tools 已關閉,然後再執行解除安裝精靈。If you are using the SQL Server 2017SQL Server 2017 SQL Server Data ToolsSQL Server Data Tools for package development, make sure the SQL Server Data ToolsSQL Server Data Tools is closed before running the uninstall wizard.

優點Benefits

SQL Server 2017SQL Server 2017 Integration ServicesIntegration Services 元件的 CDC 元件可讓 SSIS 開發人員輕鬆地建置處理變更資料的 SSIS 封裝。The CDC Components for SQL Server 2017SQL Server 2017 Integration ServicesIntegration Services components allow SSIS developers to easily build SSIS packages that process change data. 這些元件可強化 SSIS 開發人員處理 CDC 的能力,並且降低 CDC 封裝的複雜性。These components enhance the ability of SSIS developers to deal with CDC and reduce the complexity of CDC packages.

SSIS CDC 元件是用來以方便進一步處理的方式提供變更資料,以便複寫、載入資料倉儲、更新 OLAP 的緩時變維度、稽核變更或進行其他可能的用途。The SSIS CDC components are used to provide the change data in a way that is easy to further process it for replication, loading a data warehouse, updating slowly changing dimensions for OLAP, auditing changes, or for additional possible uses. 所使用的進一步處理類型是由 SSIS 開發人員決定。The type of further processing used is determined by the SSIS developer.

SSIS CDC 元件的設計目的是要搭配變更資料表位於相同 SQL Server 2017SQL Server 2017 資料庫中的 SQL Server 2017SQL Server 2017 CDC 功能運作。The SSIS CDC components are designed to work with the SQL Server 2017SQL Server 2017 CDC feature with change tables that are in the same SQL Server 2017SQL Server 2017 database.

開始使用異動資料擷取元件Getting Started with the Change Data Capture Components

一般的 CDC 封裝會處理一組資料表的變更。A typical CDC package processes changes to a group of tables. 下圖顯示這種 CDC 封裝類型的基本控制流程部分。The basic control flow part of this type of CDC package is shown in the following figure. 這種封裝稱為 Trickle 摘要處理封裝。This package is called a trickle-feed processing package.

Trickle 摘要處理封裝控制流程Trickle Feed Processing Package Control Flow

這個 SQL Server 2017SQL Server 2017 Integration ServicesIntegration Services 控制流程包含兩項 CDC 控制工作和資料流程工作。This SQL Server 2017SQL Server 2017 Integration ServicesIntegration Services Control Flow contains two CDC Control Tasks and the Data Flow task. 名為「取得 CDC 處理範圍」的第一項工作會針對在名為「處理變更」之資料流程工作中處理的變更建立 LSN 範圍。The first task called Get CDC Processing Range establishes the LSN range for the changes that are processed in the data-flow task called Process Changes. 這個範圍是根據最後一個封裝執行期間所處理的內容以及儲存在永續性存放區中的內容所建立。This range is established based on what was processed during the last package run and was saved in a persistent store.

如需使用 CDC 控制工作的詳細資訊,請參閱 CDC 控制工作CDC 控制工作編輯器For more information about using the CDC Control Task, see CDC Control Task and CDC Control Task Editor.

下圖顯示「處理變更」資料流程,就概念上說明變更的處理方式。The following figure shows the Process Changes data flow, which conceptually shows how the changes are processes.

處理變更資料流程Process Changes Data Flow

此圖所說明的步驟如下:The steps illustrated in this figure are:

  • 資料表 X 的變更是讀取對資料表 X 所做變更的 CDC 來源,這些變更是在父控制流程中決定的 CDC 處理範圍內進行。Changes for Table X is a CDC source that reads changes made to table X that were made in the CDC processing range determined in the parent control flow.

  • CDC 分隔器 X 是用來將變更分割成插入、刪除和更新。CDC Splitter X is used to split the changes into inserts, deletes, and updates. 此案例會假設 CDC 來源設定成產生淨變更,以便以平行方式處理不同的變更類型。In this scenario, it is assumed that the CDC Source is configured to produce Net changes so that different change types can be processed in parallel.

  • 然後,在下游進一步處理特定變更。The specific changes are then further processed downstream. 在本圖中,這些變更會插入使用多個 ODBC 目的地的資料表,但在實際的情況中,處理方式可能有所不同。In this illustration, the changes are inserted into tables using multiple ODBC Destinations but in actual cases the processing may be different.

    如需有關 CDC 來源的詳細資訊,請參閱:For more information about the CDC Source, see:

    CDC 來源CDC Source

    CDC 來源編輯器 (連線管理員頁面)CDC Source Editor (Connection Manager Page)

    CDC 來源編輯器 (資料行頁面)CDC Source Editor (Columns Page)

    CDC 來源編輯器 (錯誤輸出頁面)CDC Source Editor (Error Output Page)

    如需有關 CDC 分隔器的詳細資訊,請參閱:For more information about the CDC Splitter, see:

    CDC 分隔器CDC Splitter

    建置 CDC 封裝時需要注意的其中一個基本問題是,變更處理如何與資料的初始載入 (或初始處理) 互動。One of the basic issues that require attention when building CDC packages is how the change processing interacts with the initial loading (or initial processing) of the data.

    CDC 元件支援三種相異初始載入和變更處理案例:The CDC components support three distinct initial loading and change processing scenarios:

  • 透過資料庫快照集完成初始載入。Initial loading done with a database snapshot. 在此情況中,變更處理會從快照集事件的 LSN 開始進行。In this case, change processing starts with the LSN of the snapshot event.

  • 從靜止資料庫進行初始載入。Initial loading from a quiescent database. 在此情況中,初始載入期間不會進行任何變更,因此系統會在初始載入期間的某個時間點對目前 LSN 進行取樣,而且變更處理會從該 LSN 開始進行。In this case, no changes are made during initial loading so the current LSN is sampled at sometime during the initial load and change processing starts with that LSN.

  • 從使用中資料庫進行初始載入。Initial loading from an active database. 在此情況中,當初始載入正在進行時,系統會對資料庫進行變更,而且沒有能夠據以精確啟動變更處理的單一 LSN。In this case, as the initial load is in progress, changes are made to the database and there is no single LSN from which change processing can be precisely started. 在此情況中,初始載入封裝開發人員可以在初始載入前後對來源資料庫的目前 LSN 進行取樣。In this case, the initial load package developer can sample the source database current LSN before and after the initial load. 然後,在處理變更時,應該小心處理以平行方式對初始載入所做的變更,因為某些處理的變更已經顯示在初始載入中 (例如,插入變更可能失敗並發生重複索引鍵錯誤,因為插入的資料列已由初始載入程序讀取)。Then, when processing changes, care should be taken when processing changes made in parallel to the initial load as some of the processed changes are already seen in the initial load (for example, an Insert change may fail with a duplicate key error because the inserted row was read by the initial load process).

    下圖顯示可處理前兩種案例的 SSIS 封裝:The following figure shows an SSIS package that could handle the first two scenarios:

    SSIS 封裝處理前兩個案例SSIS package handling first two scenarios

    下圖顯示可處理第三種案例的 SSIS 封裝:The following figure shows an SSIS package that could handle the third scenario:

    SSIS 封裝處理第三個案例SSIS package handling third scenario

    初始載入封裝之後,Trickle 摘要更新封裝會根據處理變更的排程重複執行,因為它們變成可供取用。Following the initial load package, a trickle-feed update package is run repeatedly according to a schedule to process changes as they become available for consumption.

    將 CDC 處理的狀態從初始載入封裝傳遞至 Trickle 摘要封裝以及在每個封裝內不同工作之間傳遞的作業是經由特殊的 SSIS 封裝字串變數進行。Passing the state of the CDC processing from the initial load package to the trickle feed package and between different tasks within each package occurs by means of a special SSIS package string variable. 此變數的值稱為 CDC 狀態,它會針對初始載入和 Trickle 摘要封裝所處理的資料表群組反映 CDC 處理的目前狀態。The value of this variable is referred to as the CDC State, which reflects the current state of CDC processing for the groups of tables being handled by the initial load and trickle-feed packages.

    必須在永續性儲存體中維護 CDC 狀態變數的值。The value of the CDC State variable needs to be maintained in persistent storage. 它應該在啟動 CDC 處理之前讀取,而且應該在處理完成之後儲存目前的狀態。It should be read before starting CDC processing and should be saved with the current state after processing completed. 雖然 SSIS 開發人員可以處理 CDC 狀態的載入和儲存工作,不過 CDC 控制元件可以在資料庫資料表中維護 CDC 狀態值,藉以自動化這項工作。The task of loading and storing of the CDC state can be handled by the SSIS developer but the CDC Control component can automate this task by maintaining the CDC State value in a database table.

安全性考量Security Considerations

本節列出一些在 SSIS 中使用 CDC 元件的相關安全性考量。This section lists some security considerations related to the use of the CDC components in SSIS.

變更資料的存取授權Access Authorization to Change Data

Trickle 摘要更新封裝需要 SQL Server 2017SQL Server 2017 CDC 功能的存取權。Trickle-feed update packages need access to SQL Server 2017SQL Server 2017 CDC functions. 根據預設,這類存取權會授與 db_owner 固定資料庫角色的成員。Such access is granted, by default, to members of the db_owner fixed database role. 因為 db_owner 是功能強大的角色,所以在 SQL Server 2017SQL Server 2017 中定義擷取執行個體時,建議您將控制安全性角色與每個擷取執行個體產生關聯,讓 SSIS CDC 封裝使用更受限制的使用者來處理變更。Because the db_owner is a powerful role, when defining capture instances within SQL Server 2017SQL Server 2017 it is recommended to associate a gating security role to each capture instance that allows the SSIS CDC package to use a much more restricted user for processing the changes.

CDC 資料庫目前 LSN 的存取權Access to CDC Database Current LSN

針對變更處理標記啟始 LSN 的 CDC 控制工作作業必須能夠尋找 CDC 資料庫目前 LSN。The CDC Control task operations for marking the start LSN for change processing must be able to find the CDC Database current LSN. 這些元件是使用 master 資料庫的 sp_replincrementlsn 程序來尋找 LSN。The components find the LSN by using the procedure sp_replincrementlsn from the master database. 您必須將此程序的執行權限提供給用於連接至 SQL Server 2017SQL Server 2017 CDC 資料庫的登入。Execute permission on this procedure must be given to the login used for connecting to the SQL Server 2017SQL Server 2017 CDC database.

CDC 狀態資料表的存取權Access to CDC States Table

CDC 狀態資料表是用於自動保存 CDC 狀態,而用於連接至 SQL Server 2017SQL Server 2017 CDC 資料庫的登入必須能夠更新這些狀態。The CDC States table is used for automatically persisting CDC States that need to be updatable by the login used for connecting to the SQL Server 2017SQL Server 2017 CDC database. 因為這個資料表是由 SSIS 開發人員所建立,所以請將 SQL Server 2017SQL Server 2017 系統管理員設定為獲授權可建立 SQL Server 2017SQL Server 2017 資料庫並執行管理和維護工作的使用者。As this table is created by the SSIS developer, set the SQL Server 2017SQL Server 2017 system administrator as a user who is authorized to create SQL Server 2017SQL Server 2017 databases and perform administrative and maintenance tasks. 此外,使用啟用 CDC 之資料庫的 SQL Server 2017SQL Server 2017 系統管理員必須充分了解 SQL Server 2017SQL Server 2017 CDC 技術和實作。In addition, a SQL Server 2017SQL Server 2017 system administrator who works with CDC enabled databases must be knowledgeable about SQL Server 2017SQL Server 2017 CDC technology and implementation.

將資料表分組以進行 CDC 處理Grouping Tables for CDC Processing

資料庫專案大小的範圍從許多資料表到數千個資料表不等。Database projects range in size from several tables to many thousands of tables. 設計初始載入和 CDC 封裝時,建議您將資料表分組成較小的群組以方便管理並提高效率。When designing initial load and CDC packages, it is beneficial to group tables in much smaller groups for easier management and efficiency. 本節列出將資料表排序成小型群組時可能會影響的各種考量 (一開始載入每個群組中的資料表,然後再當做群組來更新)。This section lists various considerations that impact the sorting of tables into small groups, where the tables in each are initially loaded and then updated as a group.

CDC 元件所支援的 CDC 模式會假設已經決定此群組。The CDC patterns supported by the CDC components assume that this grouping is already determined. 每個群組都會定義不同的 CDC 內容,並且與其他群組分開維護內容。Each group defines a separate CDC context that is maintained separately from other groups. 每個群組都會建立初始載入和 Trickle 摘要更新封裝。For each group, initial-load and trickle-feed update packages are created. Trickle 摘要更新是根據變更處理條件約束的比率 (例如 CPU 和 IO 耗用量,對其他系統的影響) 以及所需的延遲,排程為定期執行。Trickle-feed updates are scheduled for periodic runs based on the rate of change processing constraints (for example, CPU and IO consumption, impact on other systems) and the desired latency.

資料表是根據下列考量進行分組:Tables are grouped based on the following considerations:

  1. 根據目標資料庫。According to the target database. 寫入不同目標資料庫或進行不同處理的所有資料表都應該指派給不同的 CDC 群組。All tables that are written to different target databases or undergo different processing should be assigned to different CDC groups.

  2. 與參考完整性條件約束相關的資料表應該指派給相同的群組,避免目標發生參考完整性問題。Tables that are related with referential integrity constraints should be assigned to the same group to avoid referential integrity problems at the target.

  3. 可容許較高延遲的資料表可以組成群組,以便降低處理這些資料表的頻率並且減少整體系統負載。Tables for which higher latency can be tolerated can be grouped so they can be processed less frequently and reduce overall system load.

  4. 變動率較高的資料表應該位於較小的群組中,而變動率較低的資料表則可組成較大的群組。Tables for which there is a higher rate of change should be in smaller groups, and tables with a low rate of change can be grouped in larger groups.

    下列兩種封裝是針對每個 CDC 群組建立的:The following two packages are created for each CDC group:

  • 初始載入封裝,它會從來源資料表讀取完整的資料範圍並且套用至目標資料表。An Initial Load package, which reads the entire range of data from the source tables and applies it to the target tables.

  • Trickle 摘要更新封裝,它會讀取對來源資料表所做的變更並且將變更套用至目標資料表。A trickle-feed update package that reads changes made to the source tables and applies the changes to the target tables. 這個封裝應該定期執行。This package should be executed on a regularly scheduled basis.

CDC 狀態CDC State

每個 CDC 群組都具有相關聯的狀態,由特定格式的字串表示。Each CDC group has a state associated with it, which is represented by a string with a specific format. 如需詳細資訊,請參閱 CDC 控制工作For more information, see CDC Control Task. 下表顯示可能的 CDC 狀態值。The following table shows the possible CDC state values.

StateState 描述Description
0-(INITIAL)0-(INITIAL) 在目前 CDC 群組上執行任何封裝之前就存在的狀態。The state that exists before any packages are run on the current CDC group. 這也是 CDC 狀態為空白時呈現的狀態。This is also the state when the CDC state is empty.

如需 CDC 控制工作作業的詳細資訊,請參閱 CDC 控制工作For more information about CDC Control task operations, see CDC Control Task.
1-ILSTART (初始載入開始)1-ILSTART (Initial-Load-Started) 這是初始載入封裝啟動時存在的狀態。This is the state that exists when the initial load package starts. 這個狀態會在 CDC 控制工作的 MarkInitialLoadStart 作業呼叫之後出現。This occurs after the MarkInitialLoadStart operation call to the CDC Control task.

如需 CDC 控制工作作業的詳細資訊,請參閱 CDC 控制工作For more information about CDC Control task operations, see CDC Control Task.
2-ILEND (初始載入結束)2- ILEND (Initial-Load-Ended) 這是初始載入封裝順利結束時存在的狀態。This is the state that exists when the initial load package ends successfully. 這個狀態會在 CDC 控制工作的 MarkInitialLoadEnd 作業呼叫之後出現。This occurs after the MarkInitialLoadEnd operation call to the CDC Control task.

如需 CDC 控制工作作業的詳細資訊,請參閱 CDC 控制工作For more information about CDC Control task operations, see CDC Control Task.
3-ILUPDATE (初始載入更新)3-ILUPDATE (Initial Load Update) 這是在初始載入之後仍在處理初始處理範圍時第一次執行更新封裝之後存在的狀態。This is the state that exists after the first run of the Update package after the initial load while still processing the initial processing range. 這個狀態會在 CDC 控制工作的 GetProcessingRange 作業呼叫之後出現。This occurs after the GetProcessingRange operation call to the CDC control task.

如果正在使用 _$reprocessing 資料行,它就會設定為 1,表示封裝可能要重新處理已經位於目標上的資料列。If using the _$reprocessing column, it is set to 1 to indicate that the package may be reprocessing rows already at the target.

如需 CDC 控制工作作業的詳細資訊,請參閱 CDC 控制工作For more information about CDC Control task operations, see CDC Control Task.
4-TFEND (Trickle 摘要更新結束)4-TFEND (Trickle-Feed-Update-Ended) 這是一般 CDC 回合所預期的狀態。This is the state expected for regular CDC runs. 這種狀態表示上一個回合已順利完成,而且可以啟動具有新處理範圍的新回合。It indicates that the previous run completed successfully and that a new run with a new processing range can be started.
5-TFSTART (Trickle 摘要更新開始)5-TFSTART (Trickle-Feed-Update-Started) 這是在 CDC 控制工作的 GetProcessingRange 作業呼叫之後,後續執行更新封裝時存在的狀態。This is the state that exists on subsequent runs of the Update package after the GetProcessingRange operation call to the CDC control task.

這種狀態表示一般 CDC 回合已啟動,但是沒有完成或者尚未全部完成 (MarkProcessedRange)。This indicates that a regular CDC run is started, but is not finished or has not yet finished, cleanly (MarkProcessedRange).

如需 CDC 控制工作作業的詳細資訊,請參閱 CDC 控制工作For more information about CDC Control task operations, see CDC Control Task.
6-TFREDO (重新處理 Trickle 摘要更新)6-TFREDO (Reprocessing-Trickle-Feed-Updates) 這是在 TFSTART 之後發生 GetProcessingRange 時的狀態。This is the state on a GetProcessingRange that occurs after TFSTART. 這種狀態表示上一個回合並未順利完成。This indicates that the previous run did not complete successfully.

如果正在使用 $reprocessing 資料行,它就會設定為 1,表示封裝可能要重新處理已經位於目標上的資料列。If using the $reprocessing column, it is set to 1 to indicate that the package may be reprocessing rows already at the target.
7-ERROR7-ERROR CDC 群組處於 ERROR 狀態。The CDC group is in an ERROR state.

以下是 CDC 元件的狀態圖表。Here is the state diagram for the CDC components. 達到非預期的狀態時,就會達到 ERROR 狀態。An ERROR state is reached when a state is reached that is not expected. 預期的狀態如下列圖表所示。The expected states are illustrated in the following diagram. 不過,此圖表不會顯示錯誤狀態。However the diagram does not show the ERROR state.

例如,在初始載入封裝的結尾,嘗試將狀態設定為 ILEND 時,如果狀態為 TFSTART,則 CDC 群組就會處於錯誤狀態,而且 Trickle 摘要更新封裝不會執行 (初始載入封裝會執行)。For example, at the end of an initial load package, when trying to set the state to ILEND, if the state is TFSTART then the CDC group is in an error state and the Trickle-Feed Update package does not run (the Initial Load package does run).

狀態圖表State Diagram

一旦初始載入封裝順利執行之後,Trickle 摘要更新封裝就會依照預先決定的排程重複執行,以便處理來源資料表的變更。Once the Initial Load package runs successfully, the Trickle-Feed Update package runs repeatedly under a predetermined schedule to process changes to the source tables. Trickle 摘要更新封裝的每個回合都是 CDC 回合。Each run of the Trickle-Feed Update package is a CDC run.

本節內容In This Section

另請參閱See Also

CDC 控制工作CDC Control Task