Tuning Your SSIS Package Data Flow in the Enterprise (SQL Server Video)

Video Summary

This video demonstrates how to improve the performance of the data flow in an Integration Services package. In this video, you will learn how to tune the following phases of the data flow:

  • Extraction

  • Transformation

  • Loading

You can apply these performance tuning tips when you design, develop, and run the data flow.

Video Acknowledgements

Thank you to Thomas Kejser for contributing material to the SQL Server video series, SSIS: Designing and Tuning for Performance. This video is the second in the series.

Thank you to Carla Sabotta and Douglas Laudenschlager for their guidance and valuable feedback.

Video Transcript

Hi, this is David Noor, Senior Development Lead on SQL Server Integration Services here at Microsoft. Welcome to this video on tuning your SSIS package data flow in the enterprise.

This video is the second part of a four-part series titled, SSIS: Designing and Tuning for Performance. In part one of the series, Denny took a look at how to best measure and understand the performance of SSIS packages. In this installment, we’ll build on that, and look at how to improve the performance of your SSIS package data flow. We’ll start by identifying the common themes in every data flow, and how to choose which parts of your data flow you should work on first when trying to improve performance. Once you understand where performance problems are, there’s a lot you can do in your data flow to make things run faster and more efficiently. We’ll look at a series of specific tips you can apply when designing, developing, and even executing your data flow.

Let’s get started!

Most package data flows focus on extracting, transforming, and loading critical data, sometimes known as ETL. Tuning your data flows means tuning each of these phases of data flow: extraction, transformation, and loading.

For extraction, we’ll talk about how to tune SSIS source components and the queries they run to give you the best performance and minimum impact on your source systems.

Tuning the transformations in your data flow means looking at the work you need to accomplish, and choosing the optimal approach to that work: removing unnecessary steps, and sometimes changing the original source queries to do transformations for you.

Lastly, tuning the loading side of your data flow means tuning SSIS destination components, networks, locking, and other factors which can impede your data flow’s progress as it attempts to load data into your destination.

In this video, we’ll talk about practical tips you can use to look for tuning opportunities in your data flow in all three phases.

Before we begin looking at these tips, it’s important to remember that there’s usually no silver bullet when dealing with performance problems. No one tip here is likely to fix all of the performance issues your data flow may be having. The tips we’ll show work well in common cases, but you’re not going to want to just throw a lot of tips at your problem. Your best bet is always to understand what the intent of your data flows are, make sure they’re designed well to meet those goals, measure their current performance, and iteratively make changes to your data flow and measure whether or not your change has improved performance.

Also, when you’re looking at a performance problem, it’s easy to become fixated on a particular technology or component, sometimes because of familiarity with the technology or the code. But whatever the reason, try to think of your data flow in the context of an entire ecosystem—most likely, multiple relational databases, a network, operating systems, file systems, lots of pieces. The more you know about this whole ecosystem, the more complete a picture you’ll have of the real performance challenges any one piece is facing, and the more effective, less risky changes you’ll be able to make.


We’ll start where your data flow does—with extraction. When using SQL Server or any other database with a bulk interface as your source, you should experiment with increasing the packet size. Normally, the SQL Server default of 4096 works well, but since your extraction is going to be moving large amounts of data, bumping this up should help. To get the full benefit of larger packets, ask your network administrator to enable “Jumbo Frames” on your network as well. Either way, be sure to test the effect of this change on your package. If you’re using the same connection manager for bulk operations (like an OLE DB source) and for single-row operations (like an OLE DB command), consider creating a second connection manager for the OLE DB command operations and using a lower packet size there.
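As a sketch, the packet size for an OLE DB connection manager can be raised in its properties or directly in the connection string. The server and database names below are placeholders; 32767 is the largest packet size SQL Server accepts:

```
Provider=SQLNCLI10;Data Source=MySourceServer;Initial Catalog=MyStagingDB;Integrated Security=SSPI;Packet Size=32767;
```

Measure the package with and without the larger packet size before keeping the change.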

On a related note, if your ETL is running on Windows Server 2008 on a multi-core machine with multiple NICs, you can achieve somewhat better network performance by affinitizing network cards to individual cores. Look for a blog post entitled, Scaling Heavy Network Traffic with Windows, on the MSDN site for more information.

One of the easiest wins in tuning extraction is to tune the queries you’re using. It’s sometimes tempting to just select the table you want to extract from and let all of the columns be retrieved, but you’ll get far better results if you select only the columns you really need. Not only does this mean less network traffic and less memory used, but the database server may be able to do much less I/O to satisfy your query.
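As a simple illustration (the table and column names here are hypothetical), compare pulling the whole table with naming only the columns the data flow uses:

```sql
-- Avoid: retrieves every column, whether the data flow needs it or not.
SELECT * FROM Sales.SalesOrderHeader;

-- Prefer: only the columns the downstream components actually consume.
SELECT SalesOrderID, OrderDate, CustomerID, TotalDue
FROM Sales.SalesOrderHeader;
```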

For SQL Server, you can also use the NOLOCK table hint to indicate that no shared locks should be taken while doing the SELECT, so your query may read uncommitted, or dirty, data. Reserve this for cases where you absolutely must have the best performance, and you’re sure that reading dirty data will always be acceptable for your ETL job.
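A minimal sketch of such a source query (table and column names are hypothetical):

```sql
-- WITH (NOLOCK) takes no shared locks, so the scan will neither block nor
-- be blocked by writers, but it can return uncommitted (dirty) rows.
SELECT SalesOrderID, OrderDate, CustomerID, TotalDue
FROM Sales.SalesOrderHeader WITH (NOLOCK);
```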

Lookups can be thought of as extraction or transformation. But either way, you should use some of the same ideas as we showed on the previous slide. Here, it’s even more critical to select just the columns you need, not only for network optimization, but for optimization of your in-memory cache as well.

If you’re using SQL Server 2008, one of the best features for performance is the addition of the new, shared lookup cache. Sharing a lookup cache allows you to fetch your reference data once, and then re-use it across multiple lookup operations in your package, or even across packages by saving the cache to a file. If you have multiple lookup components that reference the same table, you’ll want to check out this new option as a means to greatly increase the speed of your packages. Using it can be pretty simple. Create a Cache connection manager in your package, populate the cache using the new cache transform, and then change your lookups to refer to this cache connection for their reference data.


Let’s move on to the second phase of ETL: transformation. In transformation, you’re working with data already loaded into the data flow and attempting to get it into the right shape before it’s loaded. In SSIS, every transformation appears as an object you can drag onto your design, but not all transformations are made equal. Let’s look at the three kinds of transformations that exist in SSIS:

  • Synchronous, or row-based, transforms operate on buffers of data in place. They don’t make copies of data buffers or rows as data flows through; they transform the data in the buffer directly. That makes synchronous transforms fairly fast. Examples of synchronous transforms include Data Conversion, Derived Column, and Lookup.

  • Partially-blocking, asynchronous transforms are different. As data comes into these transforms, the transform needs to hold on to the data in order to do its work. To do that, it will copy incoming data into internal buffers and consume memory, sometimes lots of it. As data continues to flow in, these transforms may realize that they can go ahead and write output data. They’ll do that and free up the corresponding internal memory they’ve been holding on to. Once the data flow is complete, they’ll free up all the memory they’ve used. But until then, they will typically consume a noticeable amount of memory.

    Examples of these kinds of transforms include Merge, Merge Join, and Union All. When you have these transforms in your data flow, look for ways to optimize. Are there redundant copies of them? Are there Merge Joins or Union Alls you could “push up” to your source system? I’ve seen packages where, instead of doing a JOIN in a source query, the author fetched all the data from two tables using OLE DB sources and then used a Merge Join in the package to do the join, even though a SQL join at the source would have been a lot faster because of the database’s ability to optimize that query. Look for these opportunities to consolidate and reduce the number of asynchronous transforms.

    In SQL Server 2008, we’ve done a lot in the data flow task scheduler to make complex data flows more performant and make better use of available CPUs. If you started using SSIS in SQL Server 2005, you may have introduced a Union All transform into your data flow to artificially split up and parallelize execution trees. You shouldn’t need to do this any longer. Because of the improvements we’ve made in SQL Server 2008, that’s no longer necessary or recommended. Remove these artificial Union All components, and you should experience better performance.

  • The third group, blocking asynchronous transforms, are like extreme versions of the previous group – they have to hold on to ALL of the input data before they can write any output data. Using these transforms in data flows with large batches of data can often slow down your data flow significantly. If you use them in large data flows, make sure you’re not using them redundantly. If you have two Aggregates or two Sorts on the same data in the same flow, rework your package to get down to one.

With the basics covered, let’s look at a few other specific tips:

  • It’s pretty common for data flows to need to transform a column’s data type. Try to do this just once, and use the narrowest types possible to keep data buffers compact. Casting data takes CPU time as well, and if your entire data flow can use one type for a given column, consider casting that column at your data source using SQL’s CAST or CONVERT functions, or the equivalent in your database’s dialect of SQL.

  • I mentioned this a couple of slides ago, but you’ll want to optimize transforms by thinking smart about where they need to go, and don’t be afraid of using SQL at your sources to your advantage. For example, if you’re doing sorts on your source data, consider pushing those up to ORDER BY clauses in the SQL on your source. Your database may be able to return sorted data to you much more efficiently than the data flow. Save the sort component here for cases where you need to sort data that’s been merged together from multiple sources. Likewise, some aggregations can be done faster at the source using a GROUP BY and SQL aggregate function.

  • If you’re using SQL Server 2008 and you’re using the Slowly Changing Dimension (SCD) component, take a look at the new MERGE functionality in SQL Server. MERGE can accomplish much of the same work that SCD can, but without nearly the same number of network roundtrips.

  • Don’t forget the utility of SQL’s INSERT INTO … SELECT functionality, either. If your data flow is simple and both the source and destination are on the same database instance, you might be able to do the same data movement much, much faster with a single SQL statement, so the movement occurs entirely inside the database. In these cases, INSERT INTO … SELECT can run an order of magnitude faster than a data flow, because the data never needs to leave the server.

  • Last, but not least, if you’re doing incremental loads, take a look at the alternative of simply reloading. Some systems I’ve seen spend a lot of time doing delta detection to avoid reloading data, but the I/O and CPU spent on that detection end up making the job slower overall.
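To make the push-up idea concrete, here is a hypothetical source query that does the cast, the aggregation, and the sort on the database server instead of in the data flow (all object names are illustrative):

```sql
-- Cast once, aggregate with GROUP BY, and sort with ORDER BY at the source,
-- replacing a Data Conversion, an Aggregate, and a Sort in the data flow.
SELECT CONVERT(date, OrderDate) AS OrderDate,
       CustomerID,
       SUM(TotalDue)            AS TotalDue
FROM   Sales.SalesOrderHeader
GROUP BY CONVERT(date, OrderDate), CustomerID
ORDER BY OrderDate, CustomerID;
```

If downstream components depend on the sort order, remember to mark the source output as sorted (the IsSorted and SortKeyPosition properties in the Advanced Editor) so components such as Merge Join know the data is already ordered.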
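As a sketch of the MERGE alternative to the Slowly Changing Dimension component, the statement below applies type-1 style changes to a hypothetical dimension table in a single round trip (all object names are made up):

```sql
MERGE dbo.DimCustomer AS tgt
USING staging.CustomerUpdates AS src
    ON tgt.CustomerBusinessKey = src.CustomerBusinessKey
WHEN MATCHED THEN
    UPDATE SET tgt.CustomerName = src.CustomerName,
               tgt.City         = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerBusinessKey, CustomerName, City)
    VALUES (src.CustomerBusinessKey, src.CustomerName, src.City);
```

Type-2 history (expiring old rows and inserting new versions) takes more work, but can still be expressed by combining MERGE with an OUTPUT clause.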


Let’s move on to the loading phase.

When loading into SQL Server, you have two performant options:

  • The first option is a SQL Server Destination component. This component uses shared memory between the data flow and the database engine to quickly load data, but only works if your data flow always runs on the same machine as SQL Server itself. Also, SQL Server destination has some documented limitations with regard to error handling.

  • Your other option for quick loading into SQL Server is the OLE DB destination with the fast load option enabled, which often runs nearly as fast as the SQL Server destination.

For either option, using a commit size of 0 will result in the fastest load, because the entire batch is committed in a single transaction rather than in many smaller ones.

It’s also a pretty common practice to drop indexes on your destination when doing large loads, but you should use some guidelines to decide when it’s worth it. A common guideline is to base the decision on the percentage increase in data that the load is expected to cause, and the kinds of indexes you have on the table:

  • If you have a single, clustered index on your table, don’t drop it. Your table data is ordered by this key, and the time it will take to drop, insert, and rebuild will virtually never be less than just doing the load with the clustered index in place.

  • If you have a single, nonclustered index on your table, consider dropping it if your load represents around a hundred percent increase in data size. This is a rule of thumb, not an exact value, but roughly speaking it probably won’t be worth dropping and rebuilding the index if you’re not going to double the size of the table.

  • If you have multiple indexes on your table, it’s quite a bit harder to create a rule of thumb. I tend to think in the 10% increase range. For example, loads smaller than 10% of the current volume could probably leave indexes in place. But your best bet here is to experiment and measure.
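The drop-and-rebuild pattern itself is straightforward; the table and index names below are hypothetical:

```sql
-- Before the load: drop the nonclustered index.
DROP INDEX IX_FactSales_CustomerKey ON dbo.FactSales;

-- ... run the SSIS load here ...

-- After the load: rebuild the index in one pass over the table.
CREATE NONCLUSTERED INDEX IX_FactSales_CustomerKey
    ON dbo.FactSales (CustomerKey);
```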

If you’re loading into a table that has other activity going on against it, you’ll need to devise a strategy. Bulk loads will likely lock most, if not all, of the destination table, so you’ll want to make sure that’s acceptable, or use partitioning. If you need to load into an operational database, you may be able to set up a partition to load into so that current operational data can stay active. If your load is going slowly, check what other activity is occurring against that table or partition, and make sure nothing else is contending with your load.
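One common shape for this, sketched with hypothetical names, is to bulk load into an empty staging table and then switch it into the target partition; the switch itself is a fast, metadata-only operation:

```sql
-- dbo.FactSales_Staging must match dbo.FactSales in schema and indexes,
-- and carry a check constraint limiting its rows to the target
-- partition's boundary range (partition number 4 is illustrative).
ALTER TABLE dbo.FactSales_Staging
    SWITCH TO dbo.FactSales PARTITION 4;
```

The Data Loading Performance Guide covers the schema and constraint requirements for SWITCH in detail.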

For an excellent guide on increasing performance of bulk loads and partitions, look for the SQLCAT article entitled, The Data Loading Performance Guide, available on MSDN.

Also, when doing reloads, make sure to use TRUNCATE, not DELETE, to clear out data; TRUNCATE is minimally logged, while DELETE fully logs every row it removes.
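In code, with a hypothetical table name:

```sql
TRUNCATE TABLE dbo.FactSales;   -- minimally logged; deallocates whole pages
-- rather than:
-- DELETE FROM dbo.FactSales;   -- fully logs every deleted row
```

Note that TRUNCATE requires that no foreign keys reference the table.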

Destinations that use a network connection are also subject to the same network issues I described earlier. Look at increasing the packet size and enabling "Jumbo Frames" on your network to get optimal performance at load time.


Well, that’s it for part two of this performance series. Special thanks to Thomas for all of his helpful information that formed the base for this video series, and to Carla and Douglas for all of their help in creating this series. For more information on these topics, please refer to the Top 10 SQL Server Integration Services Best Practices.

Thank you for watching this video. Be sure to check out the other three parts of this video series, titled SSIS: Designing and Tuning for Performance. We hope that you have found this of value, and we’d really like to hear what you think. On the video home page, look for the link to rate and give feedback in the upper-right corner.
