question

JeffCopeland-7284 avatar image
0 Votes"
JeffCopeland-7284 asked HimanshuSinha-MSFT edited

Azure Data Factory Cosmos db updating changed documents.

I am using Azure Data Factory...I have two Cosmos dbs. One is our staging container and the other is our production container.
I want to create a pipeline that will look at our staging documents and compare them to our production documents.
1) If document (matid) is in staging and not in production, then we add the document to production (updating our LastUpdate date field to current UTC())
2) If document (matid) is in staging and IS in production, then we want to update that document in production and set the LastUpdate date field to current UTC().

So far I am trying this and the UPSERT seems to touch all documents. The issue is that since we are using json documents, the structure of the documents may change so I can't setup the mapping as defined, it has to be able to handle this.

azure-data-factory
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

1 Answer

MarkKromer-MSFT avatar image
0 Votes"
MarkKromer-MSFT answered HimanshuSinha-MSFT edited

You should be able to accomplish this with data flows in ADF

https://www.youtube.com/watch?v=plp1etT2ftY
https://www.youtube.com/watch?v=wAFLppJNHZc

You can define the shape of your documents with Derived Columns and Flatten transformations

https://www.youtube.com/watch?v=oAEh21NFgWQ

· 11
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thanks will check those out.
Another thing I am noticing...when I do a derived column. In the Derived Column activity, the field looks fine in the Data Preview but when I update the Cosmos db document it converts it to a number...
I am trying to update a LastUpdated field. It is currently "2021-01-04T17:26:07.313"
My derived column is this:
lastUpdated = toDate(currentUTC())...
when it write the new derived value it turns it into 1610064000000.

54749-image.png


0 Votes 0 ·
image.png (10.1 KiB)

You need to format it to your desired date/time format, i.e.:

toString(currentUTC(),'yyyy-MM-dd\'T\'HH:mm:ss.SSS')

0 Votes 0 ·

Thanks! I will try that.

0 Votes 0 ·

That worked. Thanks
Now here is the big issue for me.
with my two containers (Staging and Production). When documents get put into Staging, I don't know if they are changes or additions that will need to be moved to Production. How do I compare two json documents to see if there are differences? Obviously I can do a does exists and if not, I do an insert into Production, but what if there are potential differences? I think (honestly have tried so many options) the Upsert ended up touching all documents on their _ts date, etc..
THanks

0 Votes 0 ·

Were you able to solve this problem @JeffCopeland-7284 ?

0 Votes 0 ·

were you able to solve it?

0 Votes 0 ·

I want to:
Compare Staging json doc (MatId is key) against Production json doc (MatId is key)
IF Staging.MatId = Production.MatId
AND the json is different, update Production (along with setting the LastUpdate) field with current date.
Otherwise, Insert the Staging document into Production.

0 Votes 0 ·

also, since these are json documents, I can't 'hard-code' the mappings, the format of the document may change.

0 Votes 0 ·
Show more comments