Best approach for incremental loading when the source and destination tables are on two different servers

jennifer zen 341 Reputation points
2020-11-26T21:33:10.877+00:00

Hello Experts,

I need your thoughts on this, please.

So I have a source table on Server A and the destination table on Server B.
There are roughly 2 million rows in the source table, and I will be using an SSIS package.
I have been asked to load data into the destination table weekly. As this is too much data to truncate and reload every week, the preferred approach is to load incrementally based on two key columns (say col1 and col2) and a dateupdated column.

I know we have the option to do the upsert in two ways:

1) MERGE statement (T-SQL)
2) Source -> Lookup -> Insert/Update (a staging table is needed; very costly, I suppose?)

But the issue is that the source and destination tables are on two different servers, so the only option for me is option 2.
However, I was told it is a very costly process because of the Lookup. Is that true?
So, can anybody share a more efficient process for an incremental load involving two different servers?
Any suggestions are really appreciated.

Thanks

Tags: SQL Server, SQL Server Integration Services, Transact-SQL

Accepted answer
  1. Jeffrey Williams 1,891 Reputation points
    2020-11-27T18:22:26.273+00:00

    The first goal for this type of process is to reduce the impact on the source system as much as possible. Do the majority of the 'work' on the destination - since you can control when and how that data is published and made available to the users.

    If you have the ability to filter the data in the source to only those rows that have changed since the last time the process ran - then you can limit the amount of data being extracted from the source and easily perform an update/insert operation. That can be done either in SSIS using the SCD (slowly changing dimension) task - or through a staging table and a merge or upsert (update/insert) process.
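
    As a rough sketch of that watermark pattern (the EtlWatermark control table and all object names below are illustrative, not from the thread): read the last high-water mark into a package variable with an Execute SQL Task, pass it as a parameter to the OLE DB Source, and advance it after a successful load.

        -- On the destination server: hypothetical control table holding the
        -- high-water mark of the previous run. Read this into a package variable.
        SELECT LastLoadedDateUpdated
        FROM dbo.EtlWatermark
        WHERE TableName = N'DestTable';

        -- On the source server: OLE DB Source query, with ? mapped to the
        -- package variable. Only rows changed since the last run leave Server A.
        SELECT col1, col2, dateupdated /* , other columns */
        FROM dbo.SourceTable
        WHERE dateupdated > ?;

        -- After a successful load, advance the watermark on the destination server.
        UPDATE dbo.EtlWatermark
        SET LastLoadedDateUpdated = ?  -- max dateupdated captured during the load
        WHERE TableName = N'DestTable';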

    If you cannot determine the rows that have changed - then you must pull all rows from the table each time. You can then use the SCD task or a staging table and merge/upsert process. However - pulling all the data each week also allows for an easier process by truncating the destination table and performing a full load.
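
    The full-refresh variant is correspondingly simple; a minimal sketch, assuming the data flow lands everything in a staging table first (names are illustrative):

        -- Swap the freshly staged data into place in one transaction so
        -- readers never see a half-loaded table.
        BEGIN TRANSACTION;
            TRUNCATE TABLE dbo.DestTable;
            INSERT INTO dbo.DestTable (col1, col2, dateupdated /* , ... */)
            SELECT col1, col2, dateupdated /* , ... */
            FROM dbo.Staging_DestTable;
        COMMIT TRANSACTION;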

    Note: 2 million rows is not really a lot of data - and shouldn't take a long time to extract and load. I have set up packages that extracted hundreds of millions of rows across multiple tables using a truncate/load process to 'refresh' the data every day - and that process took less than 30 minutes total.

    If you want to track the changed data - that is, if you want to be able to see what the data looked like last week before it was changed - you should look into temporal tables in the destination. Using temporal tables you can perform the merge/upsert and a history table is automatically maintained for you, where the history table contains the row values as they were prior to being updated.
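
    A minimal sketch of such a temporal destination table (SQL Server 2016 or later; the period columns and history table name are assumptions, and the column list is abbreviated):

        CREATE TABLE dbo.DestTable
        (
            col1        int       NOT NULL,
            col2        int       NOT NULL,
            dateupdated datetime2 NOT NULL,
            ValidFrom   datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
            ValidTo     datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
            PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo),
            CONSTRAINT PK_DestTable PRIMARY KEY (col1, col2)
        )
        WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.DestTable_History));

        -- Point-in-time query: what a row looked like before last week's load.
        SELECT col1, col2, dateupdated
        FROM dbo.DestTable
        FOR SYSTEM_TIME AS OF '2020-11-20T00:00:00'
        WHERE col1 = 1 AND col2 = 2;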

    And finally - you need to determine what columns need to be checked for changes and what are the business keys. The SCD task walks you through that setup and builds the lookups for you - where it looks up the data in the destination based on the business keys and redirects the row to either an update task or an insert task.

    1 person found this answer helpful.

4 additional answers

  1. Guoxiong 8,126 Reputation points
    2020-11-26T23:57:38.307+00:00

    You can use an SSIS package to do that. You can bring over the data, filtered on the dateupdated column, to a staging table and then use a MERGE statement to move the data into the destination table. The key columns can be used to decide which rows should be inserted and which should be updated.
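
    A minimal sketch of that staging-to-destination MERGE, using the question's col1/col2 as the business key (all other names are placeholders):

        -- Runs on the destination server after the SSIS data flow has filled
        -- dbo.Staging_DestTable with the changed/new rows from the source.
        MERGE dbo.DestTable AS tgt
        USING dbo.Staging_DestTable AS src
            ON  tgt.col1 = src.col1
            AND tgt.col2 = src.col2
        WHEN MATCHED AND src.dateupdated > tgt.dateupdated THEN
            UPDATE SET tgt.dateupdated = src.dateupdated
                       -- , tgt.othercol = src.othercol, ...
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (col1, col2, dateupdated /* , ... */)
            VALUES (src.col1, src.col2, src.dateupdated /* , ... */);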

    2 people found this answer helpful.

  2. Erland Sommarskog 101.9K Reputation points MVP
    2020-11-26T22:22:06.81+00:00

    If I understand it correctly, you have a means to identify the changed rows in the source. The first step would be to bring these rows over to the target server to a temp table or a staging table. From there you could run a MERGE statement.

    I'm not sure that I know exactly what source-lookup-insert is, but that could be because while I play an SQL Server expert on TV, I am actually not acquainted with SSIS.


  3. jennifer zen 341 Reputation points
    2020-11-26T23:14:40.127+00:00

    @Erland Sommarskog

    Thanks for your reply.
    So doesn't bringing all of the rows to a staging table on the target server defeat the purpose of avoiding a truncate and load of the destination table every time?
    My requirement is to transfer only the updated/new rows from the source server to the destination (I don't have the linked-server option, though).
    I was also wondering whether I could use an Execute SQL Task for the full result set and then pass that result set on to a second Execute SQL Task that runs the MERGE (T-SQL) statement... but then I would need a variable for every column (there are around 230 columns) and do all the mapping - too much work...

    To clarify what I meant: DFT --> OLE DB Source (select only the changed/new rows) --> Lookup --> upsert (update or insert) into the destination table. Sorry!
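
    For the update branch of that flow, the OLE DB Command transformation typically runs a parameterized statement like the one below, with the ? placeholders mapped to pipeline columns (names are illustrative). Because it executes once per row, this is exactly where the pattern gets expensive at volume - which is why the set-based staging-table MERGE is usually preferred.

        -- Executed for every row coming out of the Lookup's match output.
        UPDATE dbo.DestTable
        SET    dateupdated = ?  -- , othercol = ?, ...
        WHERE  col1 = ?
          AND  col2 = ?;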


  4. Monalv-MSFT 5,896 Reputation points
    2020-11-27T02:55:39.567+00:00

    Hi @jennifer zen,

    We can use a Lookup Transformation and a Conditional Split Transformation in the SSIS package.

    Please refer to Incremental Load in SSIS to update data in SQL Server Destination Table from Excel Source.

    Best Regards,
    Mona

    ----------

    If the answer is helpful, please click "Accept Answer" and upvote it.
