arkiboys asked MartinJaffer-MSFT answered

dataflow - sink delta error

Hello,
In a data flow that runs inside a ForEach loop in an ADF pipeline, some iterations succeed but a few fail with the error below.
Any suggestions on how to solve this?
The error seems to come from the sink, which is set to a Delta inline dataset.
The sink has column1 to column10.
Before this sink is an Alter Row transformation set to:
deleteif --> isnull(column1)
upsertif --> true()
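The two Alter Row conditions above can be modeled roughly as follows. This is a plain-Python sketch, not ADF's implementation. It assumes the conditions are checked in the order listed and that a row takes the action of the first condition it matches, so a row with a null column1 is marked for delete and every other row for upsert. The names CONDITIONS and tag_row are hypothetical.

```python
from typing import Callable, Optional

Row = dict

# Hypothetical model of the two Alter Row conditions from the question.
# Assumption: conditions are checked in order and the first match wins.
CONDITIONS: list[tuple[str, Callable[[Row], bool]]] = [
    ("delete", lambda row: row.get("column1") is None),  # deleteif --> isnull(column1)
    ("upsert", lambda row: True),                        # upsertif --> true()
]


def tag_row(row: Row) -> Optional[str]:
    """Return the action of the first matching condition, or None if none matches."""
    for action, condition in CONDITIONS:
        if condition(row):
            return action
    return None


rows = [
    {"column1": None, "column2": "a"},
    {"column1": 1, "column2": "b"},
]
print([tag_row(r) for r in rows])  # ['delete', 'upsert']
```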

Sink settings:
Delta inline dataset
folderpath --> xyz
list of columns --> column1, column2, column3, column4

Message":"Job failed due to reason: at Sink 'sink1': com.databricks.sql.transaction.tahoe.ProtocolChangedException: The protocol version of the Delta table has been changed by a concurrent update. Please try the operation again.\nConflicting commit: {\"version\":0,\"timestamp\":1650658266180,\"operation\":\"WRITE\",\"operationParameters\":{\"mode\":Append,\"partitionBy\":[]},\"
...

azure-data-factory

1 Answer

MartinJaffer-MSFT answered

Hi again @arkiboys

I noticed your upsertif --> true() would mark ALL rows for upsert. Is this intended?

This error suggests that two instances are trying to write to the same version of the Delta table at the same time, which causes a conflict. There is an explanation in this article under the section titled "Solving Conflicts Optimistically". If I understand everything correctly, most of the time you won't see this error, because Delta resolves most concurrent writes optimistically. You only see it when the conflict can't be resolved.
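The optimistic pattern described in that section can be illustrated with a toy model: each writer records the table version it read, and its commit succeeds only if no one else has committed since; otherwise the writer re-reads and retries. This is a simplified sketch (ToyTable and append_with_retry are hypothetical names), not Delta Lake's actual conflict checker, which inspects what the concurrent commit changed before deciding whether to proceed or fail. It also does not reproduce the specific ProtocolChangedException in the question, whose conflicting commit is version 0 of the table.

```python
import threading


class ToyTable:
    """Toy table whose writers commit optimistically against a version number.

    NOT Delta Lake's implementation; it only shows the
    read-version / check-on-commit / retry pattern.
    """

    def __init__(self) -> None:
        self.version = 0
        self.rows: list[str] = []
        self._lock = threading.Lock()  # makes the check-and-commit step atomic

    def read(self) -> tuple[int, list[str]]:
        """Return a snapshot: the version read plus a copy of the rows."""
        return self.version, list(self.rows)

    def try_commit(self, read_version: int, new_rows: list[str]) -> bool:
        """Commit only if nobody else committed since read_version."""
        with self._lock:
            if self.version != read_version:
                return False  # stale snapshot: a concurrent writer got there first
            self.rows = new_rows
            self.version += 1
            return True


def append_with_retry(table: ToyTable, row: str, max_attempts: int = 5) -> int:
    """Re-read and retry on conflict; return the attempt number that succeeded."""
    for attempt in range(1, max_attempts + 1):
        version, rows = table.read()
        if table.try_commit(version, rows + [row]):
            return attempt
    raise RuntimeError(f"gave up after {max_attempts} conflicting attempts")


# Deterministic interleaving of two writers, A and B.
table = ToyTable()
version_a, rows_a = table.read()
version_b, rows_b = table.read()                         # both read version 0
print(table.try_commit(version_a, rows_a + ["from A"]))  # True: A becomes version 1
print(table.try_commit(version_b, rows_b + ["from B"]))  # False: B's snapshot is stale
print(append_with_retry(table, "from B"))                # 1: re-reads version 1, then commits
print(table.version, table.rows)                         # 2 ['from A', 'from B']
```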

Excerpt:

However, in the event that there’s an irreconcilable problem that Delta Lake cannot solve optimistically (for example, if User 1 deleted a file that User 2 also deleted), the only option is to throw an error.

So perhaps both instances have nulls in the same row, causing a double delete?
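To see why a double delete can be irreconcilable, consider a toy copy-on-write model (an assumption; it ignores features such as deletion vectors): deleting a row means removing the data file that holds it and writing a replacement, so two writers conflict whenever they would remove the same file. The file layout FILES, the row ids, and the helper names are hypothetical.

```python
# Hypothetical file layout: which row ids live in which data file.
FILES: dict[str, set[int]] = {
    "part-00000.parquet": {1, 2, 3},
    "part-00001.parquet": {4, 5, 6},
}


def files_rewritten(deleted_ids: set[int]) -> set[str]:
    """Files a copy-on-write delete must remove (and rewrite) to drop these rows."""
    return {name for name, ids in FILES.items() if ids & deleted_ids}


def conflicting_files(ids_a: set[int], ids_b: set[int]) -> set[str]:
    """Files both writers would remove; any overlap is a delete/delete conflict."""
    return files_rewritten(ids_a) & files_rewritten(ids_b)


print(conflicting_files({2}, {2}))  # {'part-00000.parquet'}: same row deleted twice
print(conflicting_files({2}, {3}))  # {'part-00000.parquet'}: different rows, same file
print(conflicting_files({2}, {5}))  # set(): different files, no conflict
```

In this model the second case also conflicts: two iterations deleting different rows can still collide if those rows share a file.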



Hi,

My intention for the upsert is: if any column of a row has been updated in the source, update that row in the sink,
and insert any new rows that come from the source and are not already present in the sink.
Therefore I have upsertif --> true().

Is that correct?
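The intended behavior (update existing rows, insert new ones) can be sketched as an upsert keyed on match columns. This plain-Python sketch assumes the sink's list of columns acts as the upsert key and, for brevity, uses column1 alone; the function upsert is hypothetical. It also illustrates a side effect of upsertif --> true(): every incoming row is written, including rows that did not change.

```python
from typing import Any

Row = dict[str, Any]


def upsert(sink: dict[Any, Row], source_rows: list[Row], key: str) -> dict[str, int]:
    """Insert rows with a new key; overwrite rows whose key already exists.

    With a condition like upsertif --> true(), every source row reaches this
    step, so unchanged rows are written again too (counted as "unchanged").
    """
    counts = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in source_rows:
        existing = sink.get(row[key])
        if existing is None:
            counts["inserted"] += 1
        elif existing == row:
            counts["unchanged"] += 1
        else:
            counts["updated"] += 1
        sink[row[key]] = dict(row)  # the write happens in all three cases
    return counts


sink = {
    1: {"column1": 1, "column2": "a"},
    2: {"column1": 2, "column2": "b"},
}
source = [
    {"column1": 1, "column2": "a"},  # unchanged
    {"column1": 2, "column2": "B"},  # changed
    {"column1": 3, "column2": "c"},  # new
]
print(upsert(sink, source, key="column1"))
# {'inserted': 1, 'updated': 1, 'unchanged': 1}
```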
