question

arkiboys asked ShaikMaheer-MSFT commented

Data flow - sink Parquet file not showing updated time for the row

Hello,
I changed a column value in one row of the source, which is a SQL Server table.
The data flow has:
1-Source
2-DerivedColumn, with the expression LoadDate = toUTC(currentTimestamp(), 'GMT Standard Time')
3-Upsert
4-Sink --> Delta dataset for the Parquet file

I ran the pipeline.
Once the load completed, I checked the LoadDate column in the sink Parquet file.
The field was updated for every row, whereas I expected it to change only for the row that I updated at the source.
Any thoughts?
Thank you

azure-data-factory

1 Answer

ShaikMaheer-MSFT answered ShaikMaheer-MSFT commented

Hi @arkiboys ,

Thank you for posting your query on the Microsoft Q&A platform.

You mentioned step 3 as upsert. Are you applying an Upsert If condition using the Alter Row transformation?

Please consider the points below to achieve your goal.

  • You need to use the Alter Row transformation and, inside it, write a condition for Upsert If. Every row that matches the condition is marked for the upsert operation, so check in the Alter Row data preview that only your intended row is marked as Upsert If.

  • Inside the Sink transformation, select the Allow upsert option so that the upsert actually happens at the sink.
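To illustrate why the condition matters, here is a minimal sketch in plain Python (not Data Factory code) of the logic an Upsert If condition would need to express. It assumes a key column named `id` and compares each source row with the copy already in the sink; the function names and sample rows are hypothetical. Only rows that are new or different receive a fresh LoadDate, whereas a condition that is true for every row would stamp all of them.

```python
from datetime import datetime, timezone


def rows_to_upsert(source_rows, sink_rows, key="id"):
    """Return only the source rows that are new or differ from the sink copy."""
    sink_by_key = {row[key]: row for row in sink_rows}
    changed = []
    for row in source_rows:
        existing = sink_by_key.get(row[key])
        # Compare source columns only. LoadDate lives only in the sink and is
        # generated at load time, so it is never part of the comparison.
        if existing is None or any(existing.get(c) != v for c, v in row.items()):
            changed.append(row)
    return changed


def stamp(rows, load_date):
    """Attach the load timestamp to the rows selected for upsert."""
    return [{**row, "LoadDate": load_date} for row in rows]


# Hypothetical sample data: row 2 was edited at the source, row 3 is new.
source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b2"}, {"id": 3, "name": "c"}]
sink = [
    {"id": 1, "name": "a", "LoadDate": "old"},
    {"id": 2, "name": "b", "LoadDate": "old"},
]

now = datetime.now(timezone.utc)
upserts = stamp(rows_to_upsert(source, sink), now)
print([row["id"] for row in upserts])
```

Row 1 is unchanged, so it is not selected and keeps its existing LoadDate in the sink.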

Hope this helps. Please let us know how it goes.



Hi,
I still get updated values for all the rows. I expect the LoadDate field to be updated only for the upserted (updated or inserted) rows.
See my settings below.

Thank you


[Screenshots of the data flow settings: 137667-image.png, 137640-image.png, 137668-image.png]

Hi @arkiboys ,

You are still marking all rows as Upsert If, so every row gets updated. Please watch the video below to understand the Alter Row transformation.
https://www.youtube.com/watch?v=12Bt9N5lODA

[Screenshot: 137752-image.png]
arkiboys replied to ShaikMaheer-MSFT:

I think I am beginning to understand this.
Note that my sink is a Delta Parquet file.
I had watched that video previously and have now watched it again. It shows how to do upsert and delete, but the delete in the video is based on a value that is already present in the source.
My question is how to detect the rows that were deleted at the source so that they can be removed from the sink.

Hope you see what I mean
Thank you
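The question above is an anti-join: find keys that exist in the sink but no longer exist in the source. The following is a minimal sketch in plain Python, assuming a key column named `id`; the function name and sample rows are hypothetical, and this is not Data Factory code. In a mapping data flow, a similar result could probably be reached by comparing the sink data against the source and marking the unmatched rows for deletion, but that wiring is an assumption rather than something confirmed in this thread.

```python
def keys_deleted_at_source(source_rows, sink_rows, key="id"):
    """Return the sorted keys present in the sink but missing from the source."""
    source_keys = {row[key] for row in source_rows}
    return sorted(row[key] for row in sink_rows if row[key] not in source_keys)


# Hypothetical sample data: rows 2 and 4 were deleted from the source table.
source = [{"id": 1}, {"id": 3}]
sink = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]

deleted = keys_deleted_at_source(source, sink)
print(deleted)
```

The keys returned are the ones to remove from the sink; rows still present in the source are left untouched.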
