Question

EvansMatt-0925 asked EvansMatt-0925 edited

Azure Data Factory - Copy Activity - Handling failed rows

I am involved in a data migration project where we are moving data from source SQL table(s) into Dynamics CRM.

We are currently using the KingswaySoft toolkit within SSIS packages to migrate the data using the Dynamics CRM connectors, but this requires us to hand-build separate pipelines for each of the Dynamics entities we wish to migrate data into.

Ideally, I would like to use ADF and reusable pipelines to achieve this. I have been building a POC reusable pipeline to copy the data between SQL tables and Dynamics CRM entities, but I've hit a bit of a stumbling block with regard to using the Copy activity to migrate the data.

The issue I have is how to robustly handle records that fail to load into Dynamics for one reason or another (e.g. key violations, invalid data types/lengths, etc.). Obviously this is a fairly basic requirement for any data migration.

I have logging enabled on the Copy activity, which logs failed rows into randomly named log files, but it's not terribly usable: the entire failed record is shoved into a single field, surrounded by a strange number of quotes, with no way of identifying attribute names. That makes it pretty much useless from a pipeline point of view.

[Screenshot: capture.jpg — failed-row log output]

What we'd like to do is similar to what we do currently with KWS: off the back of the copy task, we simply update the records in the SQL table as having loaded successfully or errored. This allows us to restage the data (fixing bad data at source where possible) and re-run the pipeline, which will only pick up records that haven't yet been successfully loaded or have changed since the previous successful load.
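To illustrate the pattern, something along these lines (the table and column names here are purely illustrative, not our real schema):

-- Hypothetical staging table: track the outcome of each load attempt
ALTER TABLE dbo.StagingContact
ADD LoadStatus VARCHAR(20) NULL,  -- NULL = not yet attempted, else 'Succeeded'/'Failed'
    LoadedOn   DATETIME2   NULL;  -- timestamp of the last successful load

-- A re-run only picks up rows that never loaded, failed,
-- or changed since the last successful load
-- (ModifiedOn is assumed to already exist on the staging table)
SELECT *
FROM dbo.StagingContact
WHERE LoadStatus IS NULL
   OR LoadStatus = 'Failed'
   OR ModifiedOn > LoadedOn;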

This is a fairly simple requirement but I can't fathom a way to do this within ADF.

So I guess the question is: if you have 1 million records in a table and 900k load successfully while 100k fail, how can you design the pipeline to mark those 100k records as failed (or, conversely, mark the 900k that succeeded as such)?


The Copy activity has a redirectRowPath output value which I thought used to log failed records to that path (keeping the data structure with attribute names), but it appears that's no longer in use? Not for the Dynamics CRM sink, anyway.

[Screenshot: capture2.jpg — Copy activity output]

Many Thanks

azure-data-factory

@EvansMatt-0925 We are currently looking into this and will get back to you on the same.

HimanshuSinha-MSFT answered

Hello @EvansMatt-0925 ,
Thanks for the ask and for using the Microsoft Q&A platform.
I did dig into this a bit more, and I think you can use the dynamic expression @activity('Copy data1').output.logFilePath to get the log file path. This is what I tried: I created a destination table with a column length smaller than that of the source; the idea was to trigger a failure.


-- Source table: NAME allows up to 10 characters
CREATE TABLE T1
(
    ID   INT,
    NAME VARCHAR(10)
)

INSERT INTO T1 VALUES (1, 'Evan')
INSERT INTO T1 VALUES (2, 'John')
INSERT INTO T1 VALUES (3, 'James')

Sink:

-- Sink table: NAME only allows 2 characters, so the rows
-- copied from T1 will fail with truncation errors
CREATE TABLE T2
(
    ID   INT,
    NAME VARCHAR(2)
)

I added a Copy activity to copy data from source to sink and enabled fault tolerance. I also created a variable to capture the log file path (this is not required, but I kept it for clarity).

Then I added one more Copy activity; the intent here is to copy the data from the blob log file into a SQL logging table. I used a dataset parameter and passed in the value of the log file path. The animation below should make that clearer.
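For reference, the dbo.logging table I created looks roughly like this; the columns mirror the fields in the copy activity session log file (please treat this as a sketch rather than an exact schema):

-- Sketch of a table to receive the copy activity session log
CREATE TABLE dbo.logging
(
    [Timestamp]   VARCHAR(50)  NULL,  -- when the row was skipped
    [Level]       VARCHAR(20)  NULL,  -- e.g. Warning
    OperationName VARCHAR(100) NULL,  -- e.g. TabularRowSkip for a skipped row
    OperationItem VARCHAR(MAX) NULL,  -- the failed row itself, as one quoted string
    Message       VARCHAR(MAX) NULL   -- the error returned by the sink
)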

I agree that the data in SQL is not in a perfect format, but it's something I was able to work around:

SELECT *
FROM dbo.logging
WHERE OperationItem IS NOT NULL

Animation : https://imgur.com/a/EFCoii6

Please do let me know how it goes.
Thanks
Himanshu
Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.





EvansMatt-0925 answered EvansMatt-0925 edited

Thank you for the reply,

I had got to a similar position to what you've just demonstrated. The issue is that there's no way to accurately interrogate the data within the "operationItem" field to retrieve the record ID I'd need in order to mark the original staged record as failed to load: there are no headings, and the ID field may not be returned in the same position for each table, i.e. I can't always just go and get the first field, for example.

In a reusable pipeline using a ForEach loop, where the Copy activity copies data between many different source/target tables by passing in the different table information on each iteration, there is no way to interrogate that "operationItem" field for each of the tables, as there's no way to be certain what the fields are.

What I think would be more useful is for the log/error file to contain just the failed row(s) (with column headings in the first row), with additional columns appended on the end such as "level", "error message", etc.

In KingswaySoft, on the sink connection in the data flow task, any error rows effectively flow out in the same format they went in, with these additional columns on the end indicating what the failure was. These can be logged to a file/table with the correct structure (with attributes identifiable), so a follow-on activity to query that table and mark the failed records is really simple to do.
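If the error output had that structure, the follow-on step would just be a simple join and update, something along these lines (all names here are hypothetical):

-- Hypothetical: failed rows landed in dbo.ErrorRows with their
-- real column names intact, including the key column
UPDATE s
SET s.LoadStatus = 'Failed'
FROM dbo.StagingContact AS s
INNER JOIN dbo.ErrorRows AS e
    ON e.Id = s.Id;

-- Anything staged for this run and not marked failed succeeded
-- (assumes LoadStatus was reset to NULL for the rows in this run)
UPDATE dbo.StagingContact
SET LoadStatus = 'Succeeded',
    LoadedOn = SYSUTCDATETIME()
WHERE LoadStatus IS NULL;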

I don't think this is a particularly niche requirement; it's a very common requirement for any data movement activity. You need a robust way to deal with failed records, and at the minute that seems to be a major shortcoming of the ADF Copy activity.


Thanks @EvansMatt-0925,

Apologies for replying late on this. I agree that while the workaround solution worked fine when I tested it with one table, things will be a bit more difficult with multiple tables. Can I request you to please log the ask here.

The product group does monitor these requests and can plan for implementation in future. Once you log the feature request, you will also be notified of its status. Please do share the URL of the logged issue here so that other community members are aware of it. We hope you keep using this forum and encourage others to do the same; you can always help other community members by answering their queries.



Thanks
Himanshu

EvansMatt-0925 (in reply to HimanshuSinha-MSFT):

Also, the format of the data recorded in the OperationItem field, with its weird double-double quoting, means it can't even be imported into a logging table as you demonstrated, as you can't define the CSV dataset to successfully read the files.

example logging file below

[Screenshot: capture3.jpg — example logging file]

[Attachment: example-copy-activity-log.txt]

