question

RichardsSamDGSTLHQ-9787 avatar image
0 Votes"
RichardsSamDGSTLHQ-9787 asked AnnuKumari-MSFT commented

Azure Synapse Pipeline - Copy Data to CosmosDB w Upsert - How to check if items missing in new extract are in CosmosDB and then delete them?

I have an Azure Synapse pipeline that moves data from a .csv into a .parquet file which is then copied into two location, Azure SQL table and Azure CosmosDB. When moving into the Azure SQL table, I can delete existing records and the insert the new dataset. The CosmosDB portion uses upsert to update existing ones and insert new records. However, I am at a loss as how to check if any records in the CosmosDB are not in the new .parquet file and then delete them if they do not exist.

Is this better achieved on the CosmosDB side with a stored procedure or is there a method of using dynamic content on the write behavior of the copy data activity in Azure Synapse?201824-2022-05-13-08-06-22-bwdgsynapsedev01-azure-synapse.png


azure-synapse-analyticsazure-cosmos-db
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

1 Answer

AnnuKumari-MSFT avatar image
0 Votes"
AnnuKumari-MSFT answered AnnuKumari-MSFT commented

Hi @RichardsSamDGSTLHQ-9787 ,
Welcome to Microsoft Q&A platform and thankyou for posting your query.
As I understand your question, it seems you are trying to figure out how to delete missing source rows from your target database . Please correct me if my understanding is incorrect. I don't have cosmosDB provisioned to reproduce your exact scenario. However , you can refer to the below solution and try to implement.

You can utilize Mapping dataflow inside Synapse workspace to achieve this requirement:
1. Add two sources: One for the sourceFile and other for Target DB(CosmosDB in your case)
2. Add Exist transformation and select Exist Type as 'Doesn't exist' and specify the PrimaryKey Column

202085-image.png
3. Add Alter Row Transformation and add condition as 'Delete if'='true()'

201980-image.png
4. Add the Sink as CosmosDB dataset.

Please refer to the following video for more details: https://www.youtube.com/watch?v=9i7qf1vczUw

Alternatively, you can try to load the entire source data into a temporary table / dummy table and perform delete from target DB based on data not present in the temp table.

Hope this will help. Please let us know if any further queries.


  • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
    Original posters help the community find answers faster by identifying the correct answer. Here is how

  • Want a reminder to come back and check responses? Here is how to subscribe to a notification

  • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators



image.png (64.0 KiB)
image.png (62.1 KiB)
· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@AnnuKumari-MSFT thanks for doing this it meets my needs!

1 Vote 1 ·
AnnuKumari-MSFT avatar image AnnuKumari-MSFT RichardsSamDGSTLHQ-9787 ·

@RichardsSamDGSTLHQ-9787 ,
Glad to know it helped. Thanks !

0 Votes 0 ·