Merge Join Transformation
The Merge Join transformation provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER join. For example, you can use a LEFT join to join a table that includes product information with a table that lists the country/region in which a product was manufactured. The result is a table that lists all products and their country/region of origin. For more information, see Using Joins.
You can configure the Merge Join transformation in the following ways:
Specify the join is a FULL, LEFT, or INNER join.
Specify the columns the join uses.
Specify whether the transformation handles null values as equal to other nulls.
If null values are not treated as equal values, the transformation handles null values like the SQL Server Database Engine does.
The Merge Join transformation requires that both inputs be sorted and that the joined columns have matching metadata. For example, you cannot join a column that has a numeric data type with a column that has a character data type. If the data has a string data type, the length of the column in the second input must be less than or equal to the length of the column in the first input with which it is merged.
The data can be sorted by setting sort options on the outputs of the source or other upstream data flow components, or by inserting a Sort transformation into the data flow before the Merge Join transformation. If the sort options indicate that the data is sorted, but the data is not actually sorted, the merge join operation may have unpredictable results.
This transformation has two inputs and one output. It does not support an error output.
The Merge Join transformation includes the MaxBuffersPerInput property, which specifies the maximum number of buffers that can be active for each input at one time. You can use this property to tune the amount of memory that the transformation buffers use and consequently the performance of the transformation. The larger the number of buffers, the more memory the transformation uses and the better the performance is. The default value of MaxBuffersPerInput is 5, which is the number of buffers that works well in most scenarios. To tune performance, you may want to try using a slightly different number of buffers such as 4 or 6. If possible, you should avoid using a very small number of buffers. For example, setting MaxBuffersPerInput to 1 instead of 5 has a significant impact on performance. Also, you should not set MaxBuffersPerInput to 0. This value means that no throttling occurs, and depending on the data load and the amount of memory available the package may not complete.
To avoid deadlock, the Merge Join transformation may temporarily increase the number of buffers it uses beyond the value of MaxBuffersPerInput. After the deadlock condition is resolved, MaxBuffersPerInput returns to its configured value.
Configuring the Merge Join Transformation
You can set properties through the SSIS Designer or programmatically.
For more information about the properties that you can set in the Merge Join Transformation Editor dialog box, see Merge Join Transformation Editor.
For more information about properties that you can set programmatically, click one of the following topics:
For more information about how to set properties, click one of the following topics:
- How to: Extend a Dataset Using the Merge Join Transformation
- How to: Set the Properties of a Data Flow Component Using a Component Editor
- How to: Set the Properties of a Data Flow Component in the Properties Window
- How to: Set Sort Attributes on an Output
Help and Information
5 December 2005