Question

jenniferzen-7686 asked:

Best approach for an incremental load when the source and destination tables are on two different servers

Hello Experts,

I need your thoughts on this, please.

I have a source table on Server A and the destination table on Server B. The source table has roughly 2 million rows, and I will be using an SSIS package. I have been asked to load the data into the destination table weekly. Since that is too much data to truncate and reload every week, the preferred approach is an incremental load based on two key columns (say col1 and col2) and a dateupdated column.

I know there are two ways to do the upsert:

1) MERGE statement (T-SQL)
2) Source -> Lookup -> Insert and Update (staging table needed) (very costly, I suppose?)

But the issue is that the source and destination tables are on two different servers, so option 2 seems to be my only choice. I was told it is a very costly process because of the lookup. Is that true? Can anybody share a more efficient process for an incremental load involving two different servers?

Any suggestions are really appreciated.

Thanks


Tags: sql-server-general, sql-server-transact-sql, sql-server-integration-services

Comment: Incremental load means no UPDATE, correct?

Reply (jenniferzen-7686): There is an update. Update and insert, to be precise.
JeffreyWilliams-3310 answered:

The first goal for this type of process is to reduce the impact on the source system as much as possible. Do the majority of the 'work' on the destination - since you can control when and how that data is published and made available to the users.

If you have the ability to filter the data in the source to only those rows that have changed since the last time the process ran - then you can limit the amount of data being extracted from the source and easily perform an update/insert operation. That can be done either in SSIS using the SCD (slowly changing dimension) task - or through a staging table and a merge or upsert (update/insert) process.
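A minimal sketch of that filtered extraction, assuming the dateupdated column from the question and a hypothetical ETL control table that records the last successful run (the table and column names here are illustrative, not from the original thread):

```sql
-- Source-side query for the SSIS OLE DB Source:
-- pull only the rows changed since the last successful load.
-- dbo.etl_control and its columns are hypothetical names.
DECLARE @LastRun datetime2 =
    (SELECT LastSuccessfulRun
     FROM   dbo.etl_control
     WHERE  ProcessName = 'WeeklyLoad');

SELECT col1, col2, dateupdated /* , remaining columns */
FROM   dbo.SourceTable
WHERE  dateupdated > @LastRun;
```

After a successful load, the package would update LastSuccessfulRun so the next run picks up where this one left off.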

If you cannot determine the rows that have changed - then you must pull all rows from the table each time. You can then use the SCD task or a staging table and merge/upsert process. However - pulling all the data each week also allows for an easier process by truncating the destination table and performing a full load.

Note: 2 million rows is not really a lot of data and shouldn't take long to extract and load. I have set up packages that extracted hundreds of millions of rows across multiple tables using a truncate/load process to 'refresh' the data every day, and that process took less than 30 minutes in total.

If you want to track the changed data (that is, you want to be able to see what the data looked like last week before it was changed), you should look into temporal tables in the destination. With temporal tables you can perform the merge/upsert and a history table will automatically be maintained for you, where the history table contains the row values prior to being updated.
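As a sketch of what a system-versioned (temporal) destination table looks like, using the key columns from the question and otherwise hypothetical names:

```sql
-- Hypothetical destination table defined as system-versioned.
-- SQL Server then maintains dbo.DestTable_History automatically:
-- every UPDATE/DELETE moves the prior row version into the history table.
CREATE TABLE dbo.DestTable
(
    col1        int       NOT NULL,
    col2        int       NOT NULL,
    dateupdated datetime2 NOT NULL,
    ValidFrom   datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo     datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo),
    CONSTRAINT PK_DestTable PRIMARY KEY (col1, col2)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.DestTable_History));
```

Querying last week's state is then a matter of `SELECT ... FROM dbo.DestTable FOR SYSTEM_TIME AS OF '<timestamp>'`.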

And finally - you need to determine what columns need to be checked for changes and what are the business keys. The SCD task walks you through that setup and builds the lookups for you - where it looks up the data in the destination based on the business keys and redirects the row to either an update task or an insert task.


Hi @JeffreyWilliams-3310,

Thank you for explaining that so neatly.
Yes, I can identify the changed records from the ChangeID field(s). That's what I was thinking too: I could limit the records by taking the latest ChangeID and doing an insert/update using a lookup (on the business keys) in SSIS.
Sure, I will check out temporal tables.

Thanks again :)
ErlandSommarskog answered:

If I understand it correctly, you have a means to identify the changed rows in the source. The first step would be to bring these rows over to the target server to a temp table or a staging table. From there you could run a MERGE statement.
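A sketch of that MERGE step on the target server, assuming the changed rows have already landed in a staging table (the staging and destination names are hypothetical; col1 and col2 are the business keys from the question):

```sql
-- Upsert from staging into the destination on the target server.
MERGE dbo.DestTable AS tgt
USING dbo.StagingTable AS src
      ON  tgt.col1 = src.col1
      AND tgt.col2 = src.col2
-- Only touch rows that actually changed.
WHEN MATCHED AND src.dateupdated > tgt.dateupdated THEN
    UPDATE SET tgt.dateupdated = src.dateupdated
               /* , other columns */
WHEN NOT MATCHED BY TARGET THEN
    INSERT (col1, col2, dateupdated /* , other columns */)
    VALUES (src.col1, src.col2, src.dateupdated /* , ... */);
```

Note that a MERGE statement must be terminated with a semicolon.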

I'm not sure that I know exactly what source-lookup-insert is, but that could be because while I play an SQL Server expert on TV, I am actually not acquainted with SSIS.


jenniferzen-7686 answered:

@ErlandSommarskog

Thanks for your reply.
Doesn't bringing all those rows to a staging table on the target server defeat the purpose of avoiding the truncate and load every time on the destination table?
My requirement is to transfer only the updated/new rows from the source server to the destination (I don't have the linked server option, though).
I was thinking there might be a way to use an Execute SQL Task for the full result set and then pass that result set to the next Execute SQL Task that does the MERGE (T-SQL) statement. But I would need a variable for every column (there are around 230) and do the mapping. Too much work...

By the way, what I meant earlier was: Data Flow Task -> OLE DB Source (select only the changed/new rows) -> Lookup -> upsert (update or insert) into the destination table. Sorry!


"My requirement is that I want to transfer only the updated/new rows from the source server to the destination (I don't have the linked server option, though)."

Yes, that is exactly what I had in mind. You would run a query on the source server to get these rows into the target server. This could be done directly through a linked server (if you had one, but you don't), or through some client program, of which I guess SSIS is one option.

I don't know why you would need a variable for every column, but as I said, I don't know SSIS, so that could explain my lack of understanding. Then again, setting up 230 columns for a staging table is not very fun.

...to be continued.

I happen to have something you might be able to use. Attached is a C# program that runs a query, receives the result set, and then creates a table and a table type and writes back the data. The program creates the table on the same connection as the one the query was executed on, but that could be changed. It is not perfect, because it works from a DataSet, and the DataSet does not have the original type information.

This may not fit your case at all, but I figured that since I have it around, I could share it.

Attachment: execandinsert.txt (5.8 KiB)

Hi @ErlandSommarskog,

I appreciate you sharing the code with me. I will have a look and see if it can be used for any of my datasets.

Thanks :)
GuoxiongYuan-7218 answered:

You can use an SSIS package to do that. Bring the data over, filtered on the dateupdated column, into a staging table, and then use a MERGE statement to move the data to the destination table. The key columns can be used to decide which rows should be inserted and which rows should be updated.
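If MERGE is not an option (some teams avoid it), the same staging-to-destination step can be sketched as a plain update-then-insert pair, again with hypothetical table names and the key columns from the question:

```sql
-- Upsert without MERGE: update matched rows first,
-- then insert the staging rows that don't exist in the destination yet.
BEGIN TRANSACTION;

UPDATE tgt
SET    tgt.dateupdated = src.dateupdated /* , other columns */
FROM   dbo.DestTable AS tgt
JOIN   dbo.StagingTable AS src
       ON tgt.col1 = src.col1 AND tgt.col2 = src.col2;

INSERT INTO dbo.DestTable (col1, col2, dateupdated /* , other columns */)
SELECT src.col1, src.col2, src.dateupdated
FROM   dbo.StagingTable AS src
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.DestTable AS tgt
                   WHERE  tgt.col1 = src.col1
                   AND    tgt.col2 = src.col2);

COMMIT;
```

Wrapping both statements in one transaction keeps the upsert atomic with respect to concurrent readers.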


Monalv-msft answered:

Hi @jenniferzen-7686 ,

We can use a Lookup Transformation and a Conditional Split Transformation in the SSIS package.

Please refer to Incremental Load in SSIS to update data in SQL Server Destination Table from Excel Source.

Best Regards,
Mona


If the answer is helpful, please click "Accept Answer" and upvote it.

