Improving the Performance of the Data Flow
This topic provides suggestions about how to design Integration Services packages to avoid common performance issues. This topic also provides information about features and tools that you can use to troubleshoot the performance of packages.
Configuring the Data Flow
To configure the Data Flow task for better performance, you can configure the task's properties, adjust buffer size, and configure the package for parallel execution.
Configure the Properties of the Data Flow Task
The properties discussed in this section must be set separately for each Data Flow task in a package.
You can configure the following properties of the Data Flow task, all of which affect performance:
Specify the locations for temporary storage of buffer data (BufferTempStoragePath property) and of columns that contain binary large object (BLOB) data (BLOBTempStoragePath property). By default, these properties contain the values of the TEMP and TMP environment variables. You might want to specify other folders to put the temporary files on a different or faster hard disk drive, or to spread them across multiple drives. You can specify multiple directories by delimiting the directory names with semicolons.
Define the default size of the buffer that the task uses, by setting the DefaultBufferSize property, and define the maximum number of rows in each buffer, by setting the DefaultBufferMaxRows property. The default buffer size is 10 megabytes, with a maximum buffer size of 100 megabytes. The default maximum number of rows is 10,000.
Set the number of threads that the task can use during execution, by setting the EngineThreads property. This property provides a suggestion to the data flow engine about the number of threads to use. The default is 5, with a minimum value of 3. However, the engine will not use more threads than it needs, regardless of the value of this property. The engine may also use more threads than specified in this property, if necessary to avoid concurrency issues.
Indicate whether the Data Flow task runs in optimized mode (RunInOptimizedMode property). Optimized mode improves performance by removing unused columns, outputs, and components from the data flow.
A property with the same name, RunInOptimizedMode, can be set at the project level in Business Intelligence Development Studio to indicate that the Data Flow task runs in optimized mode during debugging. This project property overrides the RunInOptimizedMode property of Data Flow tasks at design time.
Adjust the Sizing of Buffers
The data flow engine begins the task of sizing its buffers by calculating the estimated size of a single row of data. Then it multiplies the estimated size of a row by the value of DefaultBufferMaxRows to obtain a preliminary working value for the buffer size.
If the result is more than the value of DefaultBufferSize, the engine reduces the number of rows.
If the result is less than the internally-calculated minimum buffer size, the engine increases the number of rows.
If the result falls between the minimum buffer size and the value of DefaultBufferSize, the engine sizes the buffer as close as possible to the estimated row size times the value of DefaultBufferMaxRows.
When you begin testing the performance of your data flow tasks, use the default values for DefaultBufferSize and DefaultBufferMaxRows. Enable logging on the data flow task, and select the BufferSizeTuning event to see how many rows are contained in each buffer.
Before you begin adjusting the sizing of the buffers, the most important improvement that you can make is to reduce the size of each row of data by removing unneeded columns and by configuring data types appropriately.
When sufficient memory is available, you should use a smaller number of large buffers, rather than a larger number of small buffers. In other words, you can improve performance by reducing the total number of buffers required to hold your data, and by fitting as many rows of data into a buffer as possible. To determine the optimum number of buffers and their size, experiment with the values of DefaultBufferSize and DefaultBufferMaxRows while monitoring performance and the information reported by the BufferSizeTuning event.
Do not increase buffer size to the point where paging to disk starts to occur. Paging to disk hinders performance more than a buffer size that has not been optimized. To determine whether paging is occurring, monitor the "Buffers spooled" performance counter in the Performance snap-in of the Microsoft Management Console (MMC).
Configure the Package for Parallel Execution
Parallel execution improves performance on computers that have multiple physical or logical processors. To support parallel execution of different tasks in the package, Integration Services uses two properties: MaxConcurrentExecutables and EngineThreads.
The MaxConcurrentExcecutables Property
The MaxConcurrentExecutables property is a property of the package itself. This property defines how many tasks can run simultaneously. The default value is -1, which means the number of physical or logical processors plus 2.
To understand how this property works, consider a sample package that has three Data Flow tasks. If you set MaxConcurrentExecutables to 3, all three Data Flow tasks can run simultaneously. However, assume that each Data Flow task has 10 source-to-destination execution trees. Setting MaxConcurrentExecutables to 3 does not ensure that the execution trees inside each Data Flow task run in parallel.
The EngineThreads Property
The EngineThreads property is a property of each Data Flow task. This property defines how many threads the data flow engine can create and run in parallel. The EngineThreads property applies equally to both the source threads that the data flow engine creates for sources and the worker threads that the engine creates for transformations and destinations. Therefore, setting EngineThreads to 10 means that the engine can create up to ten source threads and up to ten worker threads.
To understand how this property works, consider the sample package with three Data Flow tasks. Each of Data Flow task contains ten source-to-destination execution trees. If you set EngineThreads to 10 on each Data Flow task, all 30 execution trees can potentially run simultaneously.
A discussion of threading is beyond the scope of this topic. However, the general rule is not to run more threads in parallel than the number of available processors. Running more threads than the number of available processors can hinder performance because of the frequent context-switching between threads.
Configuring Individual Data Flow Components
To configure individual data flow components for better performance, there are some general guidelines that you can follow. There are also specific guidelines for each type of data flow component: source, transformation, and destination.
Regardless of the data flow component, there are two general guidelines that you should follow to improve performance: optimize queries and avoid unnecessary strings.
A number of data flow components use queries, either when they extract data from sources, or in lookup operations to create reference tables. The default query uses the SELECT * FROM <tableName> syntax. This type of query returns all the columns in the source table. Having all the columns available at design time makes it possible to choose any column as a lookup, pass-through, or source column. However, after you have selected the columns to be used, you should revise the query to include only those selected columns. Removing superfluous columns makes the data flow in a package more efficient because fewer columns create a smaller row. A smaller row means that more rows can fit into one buffer, and the less work it is to process all the rows in the dataset.
To construct a query, you can type the query or use Query Builder.
When you run a package in Business Intelligence Development Studio, the Progress tab of SSIS Designer lists warnings. These warnings include identifying any data column that a source makes available to the data flow, but is not subsequently used by downstream data flow components. You can use the RunInOptimizedMode property to remove these columns automatically.
Avoid Unnecessary Sorting
Sorting is inherently a slow operation, and avoiding unnecessary sorting can enhance the performance of the package data flow.
Sometimes the source data has already been sorted before being used by a downstream component. Such pre-sorting can occur when the SELECT query used an ORDER BY clause or when the data was inserted into the source in sorted order. For such pre-sorted source data, you can provide a hint that the data is sorted, and thereby avoid the use of a Sort transformation to satisfy the sorting requirements of certain downstream transformations. (For example, the Merge and Merge Join transformations require sorted inputs.) To provide a hint that the data is sorted, you have to do the following tasks:
Set the IsSorted property on the output of an upstream data flow component to True.
Specify the sort key columns on which the data is sorted.
For more information, see How to: Sort Data for the Merge and Merge Join Transformations.
If you have to sort the data in the data flow, you can improve performance by designing the data flow to use as few sort operations as possible. For example, the data flow uses a Multicast transformation to copy the dataset. Sort the dataset once before the Multicast transformation runs, instead of sorting multiple outputs after the transformation.
OLE DB Source
When you use an OLE DB source to retrieve data from a view, select "SQL command" as the data access mode and enter a SELECT statement. Accessing data by using a SELECT statement performs better than selecting "Table or view" as the data access mode.
Use the suggestions in this section to improve the performance of the Aggregate, Fuzzy Lookup, Fuzzy Grouping, Lookup, Merge Join, and Slowly Changing Dimension transformations.
The Aggregate transformation includes the Keys, KeysScale, CountDistinctKeys, and CountDistinctScale properties. These properties improve performance by enabling the transformation to preallocate the amount of memory that the transformation needs for the data that the transformation caches. If you know the exact or approximate number of groups that are expected to result from a Group by operation, set the Keys and KeysScale properties, respectively. If you know the exact or approximate number of distinct values that are expected to result from a Distinct count operation, set the CountDistinctKeys and CountDistinctScale properties, respectively.
If you have to create multiple aggregations in a data flow, consider creating multiple aggregations that use one Aggregate transformation instead of creating multiple transformations. This approach improves performance when one aggregation is a subset of another aggregation because the transformation can optimize internal storage and scan incoming data only once. For example, if an aggregation uses a GROUP BY clause and an AVG aggregation, combining them into one transformation can improve performance. However, performing multiple aggregations within one Aggregate transformation serializes the aggregation operations, and therefore might not improve performance when multiple aggregations must be computed independently.
For more information, see Aggregate Transformation.
Fuzzy Lookup and Fuzzy Grouping Transformations
For information about optimizing the performance of the Fuzzy Lookup and Fuzzy Grouping transformations, see the white paper, Fuzzy Lookup and Fuzzy Grouping in SQL Server Integration Services 2005.
Minimize the size of the reference data in memory by entering a SELECT statement that looks up only the columns that you need. This option performs better than selecting an entire table or view, which returns a large amount of unnecessary data.
Merge Join Transformation
The Merge Join transformation includes the MaxBuffersPerInput property, which specifies the maximum number of buffers that can be active for each input at one time. You can use this property to tune the amount of memory that the buffers consume, and consequently, the performance of the transformation. The larger the number of buffers, the more memory the transformation uses and the better the performance is. The default value of MaxBuffersPerInput is 5, which is the number of buffers that works well in most scenarios. To tune performance, you may want to try using a slightly different number of buffers such as 4 or 6. If possible, you should avoid using a very small number of buffers. For example, setting MaxBuffersPerInput to 1 instead of 5 has a significant impact on performance. Also, you should not set MaxBuffersPerInput to 0 or less. This range of values means that no throttling occurs, and depending on the data load and the amount of memory available, the package may not complete.
To avoid a deadlock, the Merge Join transformation may temporarily increase the number of buffers it uses beyond the value of MaxBuffersPerInput. After the deadlock condition is resolved, MaxBuffersPerInput returns to its configured value.
For more information, see Merge Join Transformation.
Slowly Changing Dimension Transformation
The Slowly Changing Dimension Wizard and the Slowly Changing Dimension transformation are general-purpose tools that meet the needs of most users. However, the data flow that the wizard generates is not optimized for performance.
Typically, the slowest components in the Slowly Changing Dimension transformation are the OLE DB Command transformations that perform UPDATEs against a single row at a time. Therefore, the most effective way to improve the performance of the Slowly Changing Dimension transformation is to replace the OLE DB Command transformations. You can replace these transformations with destination components that save all rows to be updated to a staging table. Then, you can add an Execute SQL task that performs a single set-based Transact-SQL UPDATE against all rows at the same time.
Advanced users can design a custom data flow for slowly changing dimension processing that is optimized for large dimensions. For a discussion and example of this approach, see the section, "Unique dimension scenario," in the white paper, Project REAL: Business Intelligence ETL Design Practices.
To achieve better performance with destinations, consider using a SQL Server destination and testing the destination's performance.
SQL Server Destination
When a package loads data to an instance of SQL Server on the same computer, use a SQL Server destination. This destination is optimized for high-speed bulk loads.
Test the Performance of Destinations
You may find that saving data to destinations takes more time than expected. To identify whether the slowness is caused by the inability of the destination to process data quickly enough, you can temporarily replace the destination with a Row Count transformation. If the throughput improves significantly, it is likely that the destination that is loading the data is causing the slowdown.
Monitoring the Performance of the Package
Integration Services includes tools and features that you can use to monitor the performance of a package. For example, logging captures run-time information about a package, and performance counters let you monitor the data flow engine. Use the following suggestions to determine which parts of the package are having the greatest effect on performance.
Review the Information on the Progress Tab
SSIS Designer provides information about both control flow and data flow when you run a package in Business Intelligence Development Studio. The Progress tab lists tasks and containers in order of execution and includes start and finish times, warnings, and error messages for each task and container, including the package itself. It also lists data flow components in order of execution and includes information about progress, displayed as percentage complete, and the number of rows processed.
To enable or disable the display of messages on the Progress tab, toggle the Debug Progress Reporting option on the SSIS menu. Disabling progress reporting can help improve performance while running a complex package in BI Development Studio.
Configure Logging in the Package
Integration Services includes a variety of log providers that allow packages to log information at run time to different types of files, or to SQL Server. You can enable log entries for packages and for individual package objects such as tasks and containers. Integration Services includes a wide variety of tasks and containers, and each task and container has its own set of descriptive log entries. For example, a package that includes an Execute SQL task can write a log entry that lists the SQL statement that the task executed, including parameter values for the statement.
The log entries include information such as the start and finish times of packages and package objects, making it possible to identify slow running tasks and containers. For more information, see Logging Package Execution, Implementing Logging in Packages, and Custom Messages for Logging.
Configure Logging for Data Flow Tasks
The Data Flow task provides many custom log entries that can be used to monitor and adjust performance. For example, you can monitor components that might cause memory leaks, or keep track of how long it takes to run a particular component. For a list of these custom log entries and sample logging output, see Data Flow Task.
Use the PipelineComponentTime Event
Perhaps the most useful custom log entry is the PipelineComponentTime event. This log entry reports the number of milliseconds that each component in the data flow spends on each of the five major processing steps. The following table describes these processing steps. Integration Services developers will recognize these steps as the principal methods of a PipelineComponent.
The component checks for valid property values and configuration settings.
The component performs one-time processing before it starts to process rows of data.
The component performs one-time processing after it has processed all rows of data.
The transformation or destination component processes the incoming rows of data that an upstream source or transformation has passed to it.
The source or transformation component fills the buffers of data to be passed to a downstream transformation or destination component.
When you enable the PipelineComponentTime event, Integration Services logs one message for each processing step performed by each component. The following log entries show a subset of the messages that the Integration Services CalculatedColumns package sample logs:
The component "Calculate LineItemTotalCost" (3522) spent 356 milliseconds in ProcessInput.
The component "Sum Quantity and LineItemTotalCost" (3619) spent 79 milliseconds in ProcessInput.
The component "Calculate Average Cost" (3662) spent 16 milliseconds in ProcessInput.
The component "Sort by ProductID" (3717) spent 125 milliseconds in ProcessInput.
The component "Load Data" (3773) spent 0 milliseconds in ProcessInput.
The component "Extract Data" (3869) spent 688 milliseconds in PrimeOutput filling buffers on output "OLE DB Source Output" (3879).
The component "Sum Quantity and LineItemTotalCost" (3619) spent 141 milliseconds in PrimeOutput filling buffers on output "Aggregate Output 1" (3621).
The component "Sort by ProductID" (3717) spent 16 milliseconds in PrimeOutput filling buffers on output "Sort Output" (3719).
These log entries show that the data flow task spent the most time on the following steps, shown here in descending order:
The OLE DB source that is named "Extract Data" spent 688 ms. loading data.
The Derived Column transformation that is named "Calculate LineItemTotalCost" spent 356 ms. performing calculations on incoming rows.
The Aggregate transformation that is named "Sum Quantity and LineItemTotalCost" spent a combined 220 ms—141 in PrimeOutput and 79 in ProcessInput—performing calculations and passing the data to the next transformation.
Monitor the Performance of the Data Flow Engine
Integration Services includes a set of performance counters for monitoring the performance of the data flow engine. For example, you can track the total amount of memory, in bytes, that all buffers use and check whether components are out of memory. A buffer is a block of memory that a component uses to store data. For more information, see Monitoring the Performance of the Data Flow Engine.
Technical article, SQL Server 2005 Integration Services: A Strategy for Performance, on technet.microsoft.com
Technical article, Integration Services: Performance Tuning Techniques, on technet.microsoft.com
Video, Tuning Your SSIS Package Data Flow in the Enterprise (SQL Server Video), on technet.microsoft.com
Video, Understanding SSIS Data Flow Buffers (SQL Server Video), on technet.microsoft.com
Technical article, Increasing Throughput of Pipelines by Splitting Synchronous Transformations into Multiple Tasks, on sqlcat.com
Video, Microsoft SQL Server Integration Services Performance Design Patterns, on channel9.msdn.com.
Presentation, How Microsoft IT Leverages SQL Server 2008 SSIS Dataflow Engine Enhancements, on sqlcat.com.
Technical article, The Data Loading Performance Guide, on msdn.microsoft.com.
Technical article, We Loaded 1TB in 30 Minutes with SSIS, and So Can You, on msdn.microsoft.com.
Technical article, Top 10 SQL Server Integration Services Best Practices, on sqlcat.com.
Technical article and sample, The “Balanced Data Distributor” for SSIS, on sqlcat.com.
Video, Balanced Data Distributor, on technet.microsoft.com.