SSIS Backpressure Mechanism

One of the mechanisms that SSIS data flow engine utilizes to achieve high performance is “back pressure”.

Let’s consider a simple package with a source and destination. What happens if the source is fast and destination is slow? Say source is huge local raw file and destination is a remote relational database. If we read the source data with full speed, we would need a lot of memory to store all the data we’ve read, but could not write yet. In many cases we’d simply run out of memory or had to swap data to hard drive.

To avoid this, SSIS limits the speed of sources by controlling the number of active buffers inside each execution tree. This way you can process fast sources without running out of memory.

If a source or an async component is too fast (compared to the the transformations or destination down the path), the source is suspended when its execution tree gets too many buffers (currently this is fixed at 5 buffers). If the source is slow (i.e. transforms and destinations can process data faster than sources generate it), the back-pressure mechanism does not get involved and sources can run with full speed.

Unfortunately, in SQL 2005 we did not have any diagnostics that would tell user what is the slow part of the flow - the source or the transforms/destination. In Katmai (SQL 2008) if the back-pressure kicks in during package execution, at the end of package execution we report the total time that the source had to wait because of this mechanism.

What would you do with this information? If you run the package and see that the time a particular source was suspended is zero or relatively low (compared to the total execution time), you know the source is the slowest part of the data flow, and you need to focus on optimizing the source. E.g. you may remove unused columns from the query, simplify SQL statements, create indexes, etc.

But if a source reported that it has been suspected most of the time (considerable part of the total package execution time), you know the source is fast enough and you need to concentrate on performance of transforms and destinations.

Another implication of back pressure is that the source database connection can be opened longer than it is needed to execute the query otherwise. If you see that a source has been suspended for several minites, and you don't want to hold the appropriate database connection for such a long time, consider staging source data in raw files as suggested by Jamie Thomson at