Question

Swares asked · AnnuKumari-MSFT commented

Copying massive data between 2 SQL instances

Hi guys,

I'm trying to figure out the best approach, given an architectural constraint. I work with a public organization that demands that the final DWH layer be segregated from the ODS / staging area, on 2 different instances.

While I believe this is a horrible approach, and that the staging area and the final data warehouse cannot be black boxes but must communicate with each other for a proper and efficient data flow, I'm stuck with this option for now. Since they don't even allow linked servers between the two instances, the only communication between them is through SSIS.

I use the SQL MERGE command for dimension and fact loading. Because the tables are of a noticeable size, and because records can be updated even in incremental loads, I don't have any optimal way to run the ETL.
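For context, the load pattern in question is roughly the following MERGE-based upsert (a minimal sketch with hypothetical table and column names, not the actual production code):

```sql
-- Sketch of a MERGE-based incremental load for a dimension table.
-- stg.DimCustomer / dwh.DimCustomer and the columns are hypothetical placeholders.
MERGE dwh.DimCustomer AS tgt
USING stg.DimCustomer AS src
    ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.City <> src.City) THEN
    UPDATE SET tgt.Name = src.Name,
               tgt.City = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, Name, City)
    VALUES (src.CustomerKey, src.Name, src.City);
```

Because MERGE needs the source and target visible to the same instance, and linked servers are not allowed here, this pattern cannot run across the two servers directly — which is the core of the problem.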

My option was to create a DWH schema under the ODS database, which is then copied to the final EDW SQL instance. This, however, needs to be synced every day by truncating all the tables and copying millions of rows.

I believe this is a stupid, poorly architected design, but anyway, maybe some brainstorming will help ease the process.

azure-synapse-analytics

Can you load only deltas? By keeping a load_date on the staging tables you could copy only new rows, but be careful to check how deletes are handled, or whether you can use change data capture (CDC).
Log shipping might be something to look at as well.
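A minimal sketch of that load_date watermark idea (all names here are hypothetical; it assumes the watermark is kept in a control table maintained by the ETL):

```sql
-- Extract only rows changed since the last successful load.
-- etl.LoadControl and stg.FactSales are hypothetical placeholders.
DECLARE @LastLoadDate datetime2 =
    (SELECT MAX(LoadDate) FROM etl.LoadControl WHERE TableName = N'FactSales');

SELECT s.*
FROM stg.FactSales AS s
WHERE s.load_date > @LastLoadDate;   -- new or updated rows only

-- Hard deletes are invisible to this filter; they need either
-- soft deletes (an IsDeleted flag stamped with load_date) or CDC.
```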


Hi Lulzim,

Yes, but if I cannot connect the two instances, then I don't see how I could create a performant delta load. It will always involve inserting or updating existing records in the fact tables.

Maybe I'm not seeing any obvious good scenario.


1 Answer

LulzimBilali-4618 answered · AnnuKumari-MSFT commented

As I understand from your question, today you are loading the DWH on the staging instance and then copying it to the DWH instance.

The couple of options I was suggesting were:

  1. Based on a load_date, identify what is new/updated/deleted and move only those rows instead of everything.

  2. You can identify what changed by using CDC instead of a load_date.

  3. You can use log shipping: https://docs.microsoft.com/en-us/sql/database-engine/log-shipping/about-log-shipping-sql-server?view=sql-server-ver15

  4. Or, as another alternative, you can create the DWH as a database on the staging instance instead of a schema and do a backup / restore.
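Option 2 (CDC) can be sketched as follows. This is a minimal example with a hypothetical `dbo.FactSales` table; `dbo_FactSales` is the default capture-instance name SQL Server generates:

```sql
-- Enable CDC at the database level, then on the source table.
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'FactSales',   -- hypothetical table
    @role_name     = NULL;           -- no gating role

-- Later, read the net changes since the last extraction:
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_FactSales');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_FactSales(@from_lsn, @to_lsn, N'all');
-- __$operation marks each row as an insert, update or delete,
-- so deletes are captured too (unlike a plain load_date filter).
```

The extracted change set can then be shipped to the other instance by SSIS, which fits the "SSIS only" constraint.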


Hi Lulzim,

I did another thing, although I believe that by taking a bath in the mud you will never get clean :D

I made a copy of the staging table that holds all the modeled data (before it loads to the fact table) into an AUX schema on the EDW instance. I then replicated the load process through a SQL MERGE between that table and the fact table on both instances. It's redundant, dirty and unnecessary, although it's probably the fastest method. CDC might be interesting in the future, although I believe the architectural fundamentals must be changed.
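For what it's worth, the workaround described above boils down to something like this on the EDW side, after SSIS has bulk-copied the modeled staging table into the AUX schema (table and column names are hypothetical placeholders):

```sql
-- On the EDW instance: merge the copied AUX table into the local fact table.
-- aux.FactSales / dwh.FactSales and the columns are hypothetical.
MERGE dwh.FactSales AS tgt
USING aux.FactSales AS src
    ON tgt.SalesKey = src.SalesKey
WHEN MATCHED THEN
    UPDATE SET tgt.Quantity = src.Quantity,
               tgt.Amount   = src.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SalesKey, Quantity, Amount)
    VALUES (src.SalesKey, src.Quantity, src.Amount);
```

The same MERGE runs on the staging instance against its own copy, which is what makes the approach redundant: the load logic is duplicated on both servers.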

Thank you for your input.


Hi @Swares ,
Just checking whether the above suggestion by @LulzimBilali-4618 was helpful. If so, kindly click "Accept Answer", as an accepted answer helps the community as well.
