Synchronize two tables using SQL Server Integration Services (SSIS)-Part II of II
Please, refer to this link for the first part of this post.
In this part we will start with the same sample scenario where “table A” in “database A” needs to be synchronized with “table B” in data warehouse “database B”. During the first part of this post we discussed how to deal with new records in “table A” but we did not explain how to proceed when existing records are updated in “table A”.
We can implement different methods within SSIS to discover the records that differ between source and destination tables. For example, we can use the EXCEPT operator to extract those differences and update the destination table accordingly however, I did find that using tablediff utility is probably the fastest way to implement this in SSIS. tablediff is used in SQL Server replication to compare and return detailed information about what difference exist between two tables.
The great thing about tablediff is that it can not only compare tables, but also script out the differences between then too, this way tables can be synchronized just by running the script. The caveat about using tablediff is that this utility only works with SQL Server servers so if you are trying to synchronize tables that are hosted in a different database engine you are out of luck with this post.
In my case this is what I did to synchronize the two tables:
1. Ensure tablediff utility is installed on you SQL Server. The tablediff.exe executable is found under C:\Program Files\Microsoft SQL Server\<version>\COM
2. Add the tablediff.exe path to the Windows system path using Computer Properties > Advanced System Settings > Advanced > Environment Variables > System Variables > PATH
3. Make sure the xp_cmdshell advanced SQL Server option is enabled running sp_configure from SQL Server Management Stduio (SSMS):
Please, carefully read the xp_cmdshell documentation and understand the implication of using this option in your environment. xp_cmdshell creates a Windows process with the same security rights as the SQL Server service account which means that sysadmins members can access OS functions than his Windows account may not have. By default, only members of the SQL Server sysadmin fixed server role are authorized to run this extended procedure. If your company has no guidelines about what permissions are used for the SQL Server service accounts and who belongs to the sysadmin fixed server role, carefully evaluate the xp_cmdshell option.
4. Using the same SSIS project we created during the first part of this post, create two “Execute SQL Task” objects under the Control Flow section in BIDS. The first task, called “Execute tablediff” in my example, will take care of executing the tablediff.exe command. Here is a sample of the code in my case:
exec master..xp_cmdshell 'tablediff.exe -sourceserver SQL2008R2\KILIMANJARO64 -sourcedatabase SSISDBSource -sourcetable Customer -destinationserver SQL2008R2\KILIMANJARO64 -destinationdatabase SSISDBDest -destinationtable Customer -f C:\Temp\Diff'
The important part here is the –f switch, that creates a T-SQL script with the changes that have to be implemented on the destination table to have the same information as the source, here is an example of this automatically generated script:
The second task, called “Execute SQL Script” in my example, will take care of running the saved C:\Temp\Diff.sql script in the database, implementing the changes from the source table on the destination table:
5. Optionally you can combine the “Data Flow” task we created during the first part of this blog with these two “Execute SQL Task” to have a full synchronization package.