Best approach for incremental loading when the source and destination tables are on two different servers

jennifer zen 341 Reputation points
2020-11-26T21:33:10.877+00:00

Hello Experts,

I need your thoughts on this, please.

So I have a source table on Server A and the destination table on Server B.
There are roughly 2 million rows in the source table, and I will be using an SSIS package.
I have been asked to load data into the destination table weekly. As this is too much data to truncate and reload every week, the preferred approach is to load incrementally based on two key columns (say col1 and col2) and a dateupdated column.

I know we have the option to do the upsert in two ways:

1) MERGE statement (T-SQL)
2) Source -> Lookup -> Insert/Update (a staging table is needed; very costly, I suppose?)

But the issue is that the source and destination tables are on two different servers, so the only option for me is option 2.
However, I was told it is a very costly process because of the Lookup. Is that true?
So, can anybody share a more efficient process for an incremental load involving two different servers?
Any suggestions are really appreciated.

Thanks

Tags: SQL Server, SQL Server Integration Services, Transact-SQL

Accepted answer
  1. Jeffrey Williams 1,891 Reputation points
    2020-11-27T18:22:26.273+00:00

    The first goal for this type of process is to reduce the impact on the source system as much as possible. Do the majority of the 'work' on the destination - since you can control when and how that data is published and made available to the users.

    If you have the ability to filter the data in the source to only those rows that have changed since the last time the process ran - then you can limit the amount of data being extracted from the source and easily perform an update/insert operation. That can be done either in SSIS using the SCD (slowly changing dimension) task - or through a staging table and a merge or upsert (update/insert) process.
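
    As a rough sketch of that watermark pattern (the EtlWatermark control table and all object names below are illustrative, not from the thread): read the last high-water mark into a package variable with an Execute SQL Task, pass it as a parameter to the OLE DB Source, and advance it after a successful load.

        -- On the destination server: hypothetical control table holding the
        -- high-water mark of the previous run. Read this into a package variable.
        SELECT LastLoadedDateUpdated
        FROM dbo.EtlWatermark
        WHERE TableName = N'DestTable';

        -- On the source server: OLE DB Source query, with ? mapped to the
        -- package variable. Only rows changed since the last run leave Server A.
        SELECT col1, col2, dateupdated /* , other columns */
        FROM dbo.SourceTable
        WHERE dateupdated > ?;

        -- After a successful load, advance the watermark on the destination server.
        UPDATE dbo.EtlWatermark
        SET LastLoadedDateUpdated = ?  -- max dateupdated captured during the load
        WHERE TableName = N'DestTable';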

    If you cannot determine the rows that have changed - then you must pull all rows from the table each time. You can then use the SCD task or a staging table and merge/upsert process. However - pulling all the data each week also allows for an easier process by truncating the destination table and performing a full load.
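
    The full-refresh variant is correspondingly simple; a minimal sketch, assuming the data flow lands everything in a staging table first (names are illustrative):

        -- Swap the freshly staged data into place in one transaction so
        -- readers never see a half-loaded table.
        BEGIN TRANSACTION;
            TRUNCATE TABLE dbo.DestTable;
            INSERT INTO dbo.DestTable (col1, col2, dateupdated /* , ... */)
            SELECT col1, col2, dateupdated /* , ... */
            FROM dbo.Staging_DestTable;
        COMMIT TRANSACTION;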

    Note: 2 million rows is not really a lot of data - and shouldn't take a long time to extract and load. I have set up packages that extracted hundreds of millions of rows across multiple tables using a truncate/load process to 'refresh' the data every day - and that process took less than 30 minutes total.

    If you want to track the changed data - that is, if you want to be able to see what the data looked like last week before it was changed - you should look into temporal tables in the destination. Using temporal tables you can perform the merge/upsert and a history table is automatically maintained for you, where the history table contains the row values as they were prior to being updated.
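
    A minimal sketch of such a temporal destination table (SQL Server 2016 or later; the period columns and history table name are assumptions, and the column list is abbreviated):

        CREATE TABLE dbo.DestTable
        (
            col1        int       NOT NULL,
            col2        int       NOT NULL,
            dateupdated datetime2 NOT NULL,
            ValidFrom   datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
            ValidTo     datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
            PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo),
            CONSTRAINT PK_DestTable PRIMARY KEY (col1, col2)
        )
        WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.DestTable_History));

        -- Point-in-time query: what a row looked like before last week's load.
        SELECT col1, col2, dateupdated
        FROM dbo.DestTable
        FOR SYSTEM_TIME AS OF '2020-11-20T00:00:00'
        WHERE col1 = 1 AND col2 = 2;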

    And finally - you need to determine what columns need to be checked for changes and what are the business keys. The SCD task walks you through that setup and builds the lookups for you - where it looks up the data in the destination based on the business keys and redirects the row to either an update task or an insert task.

    1 person found this answer helpful.

4 additional answers

  1. Guoxiong 8,126 Reputation points
    2020-11-26T23:57:38.307+00:00

    You can use an SSIS package to do that. You can bring over the data, filtered on the dateupdated column, to a staging table and then use a MERGE statement to move the data into the destination table. The key columns can be used to decide which rows should be inserted and which should be updated.
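
    A minimal sketch of that staging-to-destination MERGE, using the question's col1/col2 as the business key (all other names are placeholders):

        -- Runs on the destination server after the SSIS data flow has filled
        -- dbo.Staging_DestTable with the changed/new rows from the source.
        MERGE dbo.DestTable AS tgt
        USING dbo.Staging_DestTable AS src
            ON  tgt.col1 = src.col1
            AND tgt.col2 = src.col2
        WHEN MATCHED AND src.dateupdated > tgt.dateupdated THEN
            UPDATE SET tgt.dateupdated = src.dateupdated
                       -- , tgt.othercol = src.othercol, ...
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (col1, col2, dateupdated /* , ... */)
            VALUES (src.col1, src.col2, src.dateupdated /* , ... */);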

    2 people found this answer helpful.

  2. Erland Sommarskog 101.9K Reputation points MVP
    2020-11-26T22:22:06.81+00:00

    If I understand it correctly, you have a means to identify the changed rows in the source. The first step would be to bring these rows over to the target server to a temp table or a staging table. From there you could run a MERGE statement.

    I'm not sure that I know exactly what source-lookup-insert is, but that could be because while I play an SQL Server expert on TV, I am actually not acquainted with SSIS.


  3. jennifer zen 341 Reputation points
    2020-11-26T23:14:40.127+00:00

    @Erland Sommarskog

    Thanks for your reply.
    So doesn't bringing all of the rows to a staging table on the target server defeat the purpose of avoiding a truncate and load of the destination table every time?
    My requirement is to transfer only the updated/new rows from the source server to the destination (I don't have the linked-server option, though).
    I was also wondering whether I could use an Execute SQL Task for the full result set and then pass that result set on to a second Execute SQL Task that runs the MERGE (T-SQL) statement... but then I would need a variable for every column (there are around 230 columns) and do all the mapping - too much work...

    To clarify what I meant: DFT --> OLE DB Source (select only the changed/new rows) --> Lookup --> upsert (update or insert) into the destination table. Sorry!
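
    For the update branch of that flow, the OLE DB Command transformation typically runs a parameterized statement like the one below, with the ? placeholders mapped to pipeline columns (names are illustrative). Because it executes once per row, this is exactly where the pattern gets expensive at volume - which is why the set-based staging-table MERGE is usually preferred.

        -- Executed for every row coming out of the Lookup's match output.
        UPDATE dbo.DestTable
        SET    dateupdated = ?  -- , othercol = ?, ...
        WHERE  col1 = ?
          AND  col2 = ?;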


  4. Monalv-MSFT 5,896 Reputation points
    2020-11-27T02:55:39.567+00:00

    Hi @jennifer zen,

    We can use a Lookup Transformation and a Conditional Split Transformation in the SSIS package.

    Please refer to Incremental Load in SSIS to update data in SQL Server Destination Table from Excel Source.

    Best Regards,
    Mona

    ----------

    If the answer is helpful, please click "Accept Answer" and upvote it.
