Data Flow Task

APPLIES TO: SQL Server (including on Linux), Azure SQL Database, Azure Synapse Analytics (SQL DW). Does not apply to Parallel Data Warehouse.

The Data Flow task encapsulates the data flow engine that moves data between sources and destinations, and lets the user transform, clean, and modify data as it is moved. Adding a Data Flow task to a package control flow makes it possible for the package to extract, transform, and load data.

A data flow consists of at least one data flow component, but it is typically a set of connected data flow components: sources that extract data; transformations that modify, route, or summarize data; and destinations that load data.

At run time, the Data Flow task builds an execution plan from the data flow, and the data flow engine executes the plan. You can create a Data Flow task that has no data flow, but the task executes only if it includes at least one data flow.
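As a rough illustration, the following C# sketch adds a Data Flow task to a package through the Integration Services runtime API. It is a minimal sketch, assuming the Microsoft.SqlServer.ManagedDTS and Microsoft.SqlServer.DTSPipelineWrap assemblies are referenced; the task created here contains no data flow yet, so it would not execute as-is.

```csharp
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class AddDataFlowTaskSketch
{
    static void Main()
    {
        Package package = new Package();

        // "STOCK:PipelineTask" is the moniker for the Data Flow task.
        TaskHost taskHost = (TaskHost)package.Executables.Add("STOCK:PipelineTask");
        taskHost.Name = "Data Flow Task";

        // MainPipe exposes the data flow engine. Sources, transformations,
        // and destinations would be added through its
        // ComponentMetaDataCollection; until at least one data flow is
        // built there, the task has nothing to execute.
        MainPipe dataFlow = (MainPipe)taskHost.InnerObject;
    }
}
```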

To bulk insert data from text files into a SQL Server database, you can use the Bulk Insert task instead of a Data Flow task and a data flow. However, the Bulk Insert task cannot transform data. For more information, see Bulk Insert Task.

Multiple Flows

A Data Flow task can include multiple data flows. If a task copies several sets of data, and if the order in which the data is copied is not significant, it can be more convenient to include multiple data flows in the Data Flow task. For example, you might create five data flows, each copying data from a flat file into a different dimension table in a data warehouse star schema.

However, the data flow engine determines the order of execution when there are multiple data flows within one Data Flow task. Therefore, when order is important, the package should use multiple Data Flow tasks, each task containing one data flow. You can then apply precedence constraints to control the execution order of the tasks, as shown in the sketch below.
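The following sketch wires up that pattern in code. The task names are illustrative, not from this article: two Data Flow tasks are added, and a precedence constraint makes the second run only if the first succeeds.

```csharp
using Microsoft.SqlServer.Dts.Runtime;

class OrderedDataFlowTasksSketch
{
    static void Main()
    {
        Package package = new Package();

        // Two separate Data Flow tasks, one data flow each.
        Executable loadDimensions = package.Executables.Add("STOCK:PipelineTask");
        Executable loadFacts = package.Executables.Add("STOCK:PipelineTask");

        // The constraint makes the fact load wait for the dimension load to
        // succeed; within a single Data Flow task no such ordering exists.
        PrecedenceConstraint constraint =
            package.PrecedenceConstraints.Add(loadDimensions, loadFacts);
        constraint.Value = DTSExecResult.Success;
    }
}
```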

The following diagram shows a Data Flow task that has multiple data flows.

[Diagram: Data flows]

Log Entries

Integration Services provides a set of log events that are available to all tasks. Integration Services also provides custom log entries to many tasks. For more information, see Integration Services (SSIS) Logging. The Data Flow task includes the following custom log entries:

BufferSizeTuning: Indicates that the Data Flow task changed the size of the buffer. The log entry describes the reasons for the size change and lists the temporary new buffer size.
OnPipelinePostEndOfRowset: Denotes that a component has been given its end-of-rowset signal, which is set by the last call of the ProcessInput method. An entry is written for each component in the data flow that processes input. The entry includes the name of the component.
OnPipelinePostPrimeOutput: Indicates that the component has completed its last call to the PrimeOutput method. Depending on the data flow, multiple log entries may be written. If the component is a source, this log entry means that the component has finished processing rows.
OnPipelinePreEndOfRowset: Indicates that a component is about to receive its end-of-rowset signal, which is set by the last call of the ProcessInput method. An entry is written for each component in the data flow that processes input. The entry includes the name of the component.
OnPipelinePrePrimeOutput: Indicates that the component is about to receive its call from the PrimeOutput method. Depending on the data flow, multiple log entries may be written.
OnPipelineRowsSent: Reports the number of rows provided to a component input by a call to the ProcessInput method. The log entry includes the component name.
PipelineBufferLeak: Provides information about any component that kept buffers alive after the buffer manager goes away. If a buffer is still alive, its resources were not released, which may cause memory leaks. The log entry provides the name of the component and the ID of the buffer.
PipelineComponentTime: Reports the amount of time (in milliseconds) that the component spent in each of its five major processing steps: Validate, PreExecute, PostExecute, ProcessInput, and ProcessOutput.
PipelineExecutionPlan: Reports the execution plan of the data flow. The execution plan provides information about how buffers will be sent to components. This information, in combination with the PipelineExecutionTrees log entry, describes what is happening within the Data Flow task.
PipelineExecutionTrees: Reports the execution trees of the layout in the data flow. The scheduler of the data flow engine uses the trees to build the execution plan of the data flow.
PipelineInitialization: Provides initialization information about the task. This information includes the directories to use for temporary storage of BLOB data, the default buffer size, and the number of rows in a buffer. Depending on the configuration of the Data Flow task, multiple log entries may be written.

These log entries provide a wealth of information about the execution of the Data Flow task each time you run a package. As you run the packages repeatedly, you can capture information that over time provides important historical information about the processing that the task performs, issues that might affect performance, and the data volume that the task handles.
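To capture these entries, logging must be enabled for the task and the desired events selected. The following is a minimal sketch; it assumes a log provider will also be added to the package and selected for the task, as described in Integration Services (SSIS) Logging.

```csharp
using Microsoft.SqlServer.Dts.Runtime;

class EnableDataFlowLoggingSketch
{
    static void Main()
    {
        Package package = new Package();
        TaskHost dataFlowTask =
            (TaskHost)package.Executables.Add("STOCK:PipelineTask");

        // Enable logging on both the package and the task.
        package.LoggingMode = DTSLoggingMode.Enabled;
        dataFlowTask.LoggingMode = DTSLoggingMode.Enabled;

        // Log only the custom Data Flow task entries named above.
        dataFlowTask.LoggingOptions.EventFilterKind = DTSEventFilterKind.Inclusion;
        dataFlowTask.LoggingOptions.EventFilter = new string[]
        {
            "BufferSizeTuning",
            "PipelineComponentTime",
            "PipelineExecutionPlan",
            "PipelineExecutionTrees",
            "PipelineInitialization"
        };

        // A log provider (for example, the text file provider) must still be
        // added to the package and selected for the task before any entries
        // are written.
    }
}
```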

For more information about how to use these log entries to monitor and improve the performance of the data flow, see Performance Counters and Data Flow Performance Features.

Sample Messages From a Data Flow Task

The following list shows sample messages for log entries from a very simple package. The package uses an OLE DB source to extract data from a table, a Sort transformation to sort the data, and an OLE DB destination to write the data to a different table.

BufferSizeTuning: Rows in buffer type 0 would cause a buffer size greater than the configured maximum. There will be only 9637 rows in buffers of this type.

Rows in buffer type 2 would cause a buffer size greater than the configured maximum. There will be only 9497 rows in buffers of this type.

Rows in buffer type 3 would cause a buffer size greater than the configured maximum. There will be only 9497 rows in buffers of this type.
OnPipelinePostEndOfRowset: A component will be given the end of rowset signal. : 1180 : Sort : 1181 : Sort Input

A component will be given the end of rowset signal. : 1291 : OLE DB Destination : 1304 : OLE DB Destination Input
OnPipelinePostPrimeOutput: A component has returned from its PrimeOutput call. : 1180 : Sort

A component has returned from its PrimeOutput call. : 1 : OLE DB Source
OnPipelinePreEndOfRowset: A component has finished processing all of its rows. : 1180 : Sort : 1181 : Sort Input

A component has finished processing all of its rows. : 1291 : OLE DB Destination : 1304 : OLE DB Destination Input
OnPipelinePrePrimeOutput: PrimeOutput will be called on a component. : 1180 : Sort

PrimeOutput will be called on a component. : 1 : OLE DB Source
OnPipelineRowsSent: Rows were provided to a data flow component as input. : : 1185 : OLE DB Source Output : 1180 : Sort : 1181 : Sort Input : 76

Rows were provided to a data flow component as input. : : 1308 : Sort Output : 1291 : OLE DB Destination : 1304 : OLE DB Destination Input : 76
PipelineComponentTime: The component "Calculate LineItemTotalCost" (3522) spent 356 milliseconds in ProcessInput.

The component "Sum Quantity and LineItemTotalCost" (3619) spent 79 milliseconds in ProcessInput.

The component "Calculate Average Cost" (3662) spent 16 milliseconds in ProcessInput.

The component "Sort by ProductID" (3717) spent 125 milliseconds in ProcessInput.

The component "Load Data" (3773) spent 0 milliseconds in ProcessInput.

The component "Extract Data" (3869) spent 688 milliseconds in PrimeOutput filling buffers on output "OLE DB Source Output" (3879).

The component "Sum Quantity and LineItemTotalCost" (3619) spent 141 milliseconds in PrimeOutput filling buffers on output "Aggregate Output 1" (3621).

The component "Sort by ProductID" (3717) spent 16 milliseconds in PrimeOutput filling buffers on output "Sort Output" (3719).
PipelineExecutionPlan: SourceThread0

Drives: 1

Influences: 1180 1291

Output Work List

CreatePrimeBuffer of type 1 for output ID 11.

SetBufferListener: "WorkThread0" for input ID 1181

CreatePrimeBuffer of type 3 for output ID 12.

CallPrimeOutput on component "OLE DB Source" (1)

End Output Work List

End SourceThread0

WorkThread0

Drives: 1180

Influences: 1180 1291

Input Work list, input ID 1181 (1 EORs Expected)

CallProcessInput on input ID 1181 on component "Sort" (1180) for view type 2

End Input Work list for input 1181

Output Work List

CreatePrimeBuffer of type 4 for output ID 1182.

SetBufferListener: "WorkThread1" for input ID 1304

CallPrimeOutput on component "Sort" (1180)

End Output Work List

End WorkThread0

WorkThread1

Drives: 1291

Influences: 1291

Input Work list, input ID 1304 (1 EORs Expected)

CallProcessInput on input ID 1304 on component "OLE DB Destination" (1291) for view type 5

End Input Work list for input 1304

Output Work List

End Output Work List

End WorkThread1
PipelineExecutionTrees: begin execution tree 0

output "OLE DB Source Output" (11)

input "Sort Input" (1181)

end execution tree 0

begin execution tree 1

output "OLE DB Source Error Output" (12)

end execution tree 1

begin execution tree 2

output "Sort Output" (1182)

input "OLE DB Destination Input" (1304)

output "OLE DB Destination Error Output" (1305)

end execution tree 2
PipelineInitialization: No temporary BLOB data storage locations were provided. The buffer manager will consider the directories in the TEMP and TMP environment variables.

The default buffer size is 10485760 bytes.

Buffers will have 10000 rows by default

The data flow will not remove unused components because its RunInOptimizedMode property is set to false.

Many log events write multiple entries, and the messages for a number of log entries contain complex data. To make it easier to understand and to communicate the content of complex messages, you can parse the message text. Depending on the location of the logs, you can use Transact-SQL statements or a Script component to separate the complex text into columns or other formats that you find more useful.

For example, the following list shows the message "Rows were provided to a data flow component as input. : : 1185 : OLE DB Source Output : 1180 : Sort : 1181 : Sort Input : 76", parsed into columns. The message was written by the OnPipelineRowsSent event when rows were sent from the OLE DB source to the Sort transformation.

PathID: The value from the ID property of the path between the OLE DB source and the Sort transformation. Value: 1185
PathName: The value from the Name property of the path. Value: OLE DB Source Output
ComponentID: The value from the ID property of the Sort transformation. Value: 1180
ComponentName: The value from the Name property of the Sort transformation. Value: Sort
InputID: The value from the ID property of the input to the Sort transformation. Value: 1181
InputName: The value from the Name property of the input to the Sort transformation. Value: Sort Input
RowsSent: The number of rows sent to the input of the Sort transformation. Value: 76
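As a sketch of that parsing step, the following standalone C# code (the kind of logic you might put in a Script component or run over exported log text) splits the colon-delimited message shown above into its columns. The field layout is taken from the list above.

```csharp
using System;

class ParseRowsSentMessage
{
    static void Main()
    {
        // Sample OnPipelineRowsSent message text from the list above.
        string message =
            "Rows were provided to a data flow component as input. "
            + ": : 1185 : OLE DB Source Output : 1180 : Sort "
            + ": 1181 : Sort Input : 76";

        // The fields are colon-delimited; parts[0] is the fixed message
        // text and parts[1] is empty.
        string[] parts = message.Split(':');

        Console.WriteLine("PathID:        " + parts[2].Trim());
        Console.WriteLine("PathName:      " + parts[3].Trim());
        Console.WriteLine("ComponentID:   " + parts[4].Trim());
        Console.WriteLine("ComponentName: " + parts[5].Trim());
        Console.WriteLine("InputID:       " + parts[6].Trim());
        Console.WriteLine("InputName:     " + parts[7].Trim());
        Console.WriteLine("RowsSent:      " + parts[8].Trim());
    }
}
```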

Configuration of the Data Flow Task

You can set properties in the Properties window or programmatically.

For more information about how to set these properties in the Properties window, see Set the Properties of a Task or Container.

Programmatic Configuration of the Data Flow Task

For more information about programmatically adding a data flow task to a package and setting data flow properties, see Adding the Data Flow Task Programmatically.
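As an illustrative sketch of the programmatic path (not the full procedure from the linked topic), the following code creates a Data Flow task and sets data flow properties that also appear in the PipelineInitialization sample messages above.

```csharp
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class ConfigureDataFlowSketch
{
    static void Main()
    {
        Package package = new Package();
        TaskHost taskHost =
            (TaskHost)package.Executables.Add("STOCK:PipelineTask");
        MainPipe dataFlow = (MainPipe)taskHost.InnerObject;

        // Buffer settings reported by the PipelineInitialization log entry.
        dataFlow.DefaultBufferSize = 10485760; // 10 MB, the default
        dataFlow.DefaultBufferMaxRows = 10000; // default rows per buffer

        // When true, the data flow removes unused components at run time
        // (compare the RunInOptimizedMode message in the sample log above).
        dataFlow.RunInOptimizedMode = true;
    }
}
```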

Video: Balanced Data Distributor, on technet.microsoft.com.