question

arkiboys avatar image
0 Votes"
arkiboys asked arkiboys edited

upsert, soft delete the sink Delta parquet

Hello,
How is it possible to have the sink Delta parquet file to upsert (update/insert) as well as a soft delete.
For example,
1- source parquet file has 10 rows --> sink Delta parquet gets 10 rows --> this is insert
2- source parquet file had 10 rows initially, but now it has 8 rows --> sink Delta parquet has 10 rows still but the Deleted column to be set to 1 for the two deleted rows --> this is insert/update

by soft delete I mean the Deleted column is set to 1 for the deleted rows whereas all the other rows have Deleted set to 1

Thank you

azure-data-factory
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

I think in point number 2, it is only update and I guess there is no delete in this scenario.

0 Votes 0 ·

1 Answer

HimanshuSinha-MSFT avatar image
0 Votes"
HimanshuSinha-MSFT answered arkiboys edited

Hello @arkiboys ,
Thanks for the ask and using Microsoft Q&A platform .
Looking at the threads which you have created , it looks like your source is paraquet and sink is delta . I did implemented the something similar here .

SQL - > Paraquet - Delta .
I am using the EXIST transform here whcih I had issues before , I think I know the issue now .

Insert

You select the exists or not "doesn't exist " will depend on the left and right stream , so please make a note of this . What ever we do in INSERT , for DELETEION we will do the opposite .

138929-image.png

The ALTER ROW should have the below dynamic expression

iif(Deleted==0, true(), false())

138954-image.png

Deletion

138920-image.png

139011-image.png


Please do let me know how it goes .
Thanks
Himanshu


  • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how

  • Want a reminder to come back and check responses? Here is how to subscribe to a notification

  • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators



image.png (128.6 KiB)
image.png (25.4 KiB)
image.png (133.6 KiB)
image.png (17.2 KiB)
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello,
I think I am getting close to the goal...

In Alter Row2, I can see the deleted row which is the row with IsDeleted = 1
Upser iIf --> iif(IsDeleted == 1, true(), false())

In Alter Row1, I can see the new row which is the row with IsDeleted = 0
Upsert if --> iif(IsDeleted == 0, true(), false())

sin1 and sink2 have the setting --> Allow Upsert --> ticked

I see data in data preview of both sink1 and sink2, however the error seems to be from sink2 when I run the pipeline
Sink2 has one more column. Not sure why. It is JoinFieldTarget. Not sure if this is related to the error.
"
{"message":"Job failed due to reason: at Sink 'sink2': org.apache.spark.sql.AnalysisException: cannot resolve target.JoinFieldTarget in UPDATE clause given columns target.`DepartmentID`, target.`Name`, target.`GroupName`, target.`ModifiedDate`, target.`CurrentDate`, target.`InputFilePath`, target.`JoinFieldSource`, target.`IsDeleted`;. Details:org.apache.spark.sql.AnalysisException: cannot resolve target.JoinFieldTarget in UPDATE clause given columns target.`DepartmentID`, target.`Name`, target.`GroupName`, target.`ModifiedDate`, target.`CurrentDate`, target.`InputFilePath`, target.`JoinFieldSource`, target.`IsDeleted`;\n\tat org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)\n\tat

![![139203-image.png][2]][1]


[1]: /answers/storage/attachments/139159-image.png

0 Votes 0 ·
image.png (68.8 KiB)