Azure Data Factory - Copy Activity - Handling failed rows

Evans, Matt 1 Reputation point
2021-03-11T12:17:23.497+00:00

I am involved in a data migration project where we are moving data from SQL table sources into Dynamics CRM.

We are currently using the KingswaySoft toolkit within SSIS packages to migrate the data using the Dynamics CRM connectors, but this requires us to hand-build separate pipelines for each of the Dynamics entities we wish to migrate data into.

Ideally I would like to use ADF and reusable pipelines to achieve this. I have been building a POC reusable pipeline to copy the data between SQL tables and Dynamics CRM entities, but I've hit a bit of a stumbling block with the Copy activity's functionality for migrating the data.

The issue is how to robustly handle records that fail to load into Dynamics for one reason or another (e.g. they violate a key, or have an invalid data type/length). Obviously this is a fairly basic requirement for any data migration.

I have logging enabled on the Copy activity, which logs failed rows into randomly named log files, but it's not terribly usable: the entire failed record is shoved into a single field, surrounded by a strange number of quotes, with no way of identifying attribute names. From a pipeline point of view this is pretty much impossible to do anything with.

[Screenshot: 76763-capture.jpg]

What we'd like to do is similar to what we do currently with KWS: off the back of the copy task, we simply update the records in the SQL table as having been loaded successfully or errored. This allows us to restage the data (fixing bad data at source where possible) and re-run the pipeline, which only looks for records that haven't yet been successfully loaded or have changed since the previous successful load.

This is a fairly simple requirement, but I can't fathom a way to do it within ADF.

So I guess the question is: if you have 1 million records in a table and 900k load successfully while 100k fail, how can you design the pipeline to mark those 100k records as having failed (or, conversely, mark the 900k successful ones as such)?
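
To make the goal concrete, the staging-side pattern we use with KWS today looks roughly like this (table and column names here are placeholders, not our actual schema):

-- Hypothetical staging table: each row carries its load status
ALTER TABLE dbo.StagedRecords
    ADD LoadStatus VARCHAR(10) NOT NULL DEFAULT 'PENDING',
        LastError  NVARCHAR(MAX) NULL

-- After a run, mark the failures (however they were captured)
UPDATE s
SET    s.LoadStatus = 'ERROR',
       s.LastError  = e.ErrorMessage
FROM   dbo.StagedRecords s
JOIN   dbo.LoadErrors    e ON e.RecordId = s.RecordId

-- A re-run then only picks up rows that haven't loaded successfully
SELECT * FROM dbo.StagedRecords WHERE LoadStatus <> 'SUCCESS'

It's the middle step I can't reproduce in ADF: getting from "the Copy activity skipped these rows" back to "mark these specific staged rows as errored".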

The Copy activity has a redirectRowPath output value which I thought used to log failed records to that path (preserving the data structure with attribute names), but it appears that's no longer in use? Not for the Dynamics CRM sink, anyway.

[Screenshot: 76796-capture2.jpg]

Many Thanks


3 answers

  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2021-03-19T00:27:08.357+00:00

    Hello @Evans, Matt,
    Thanks for the ask and for using the Microsoft Q&A platform.
    I dug into this a bit more, and I think you can use the dynamic expression @activity('Copy data1').output.logFilePath to get the log file path. Here is what I tried: I created a destination table with a column length smaller than that of the source, the idea being to trigger a failure.

    Source:

    CREATE TABLE T1
    (
        ID   INT,
        NAME VARCHAR(10)
    )

    INSERT INTO T1 VALUES (1, 'Evan')
    INSERT INTO T1 VALUES (2, 'John')
    INSERT INTO T1 VALUES (3, 'James')

    Sink:

    CREATE TABLE T2
    (
        ID   INT,
        NAME VARCHAR(2)
    )

    I added a Copy activity to copy data from the source to the sink and enabled fault tolerance. I also created a variable to capture the log file path (this is not required, but I kept it for clarity).
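
    The variable's value is just the dynamic expression mentioned above (this assumes your Copy activity is named 'Copy data1'; substitute your own activity name):

    @activity('Copy data1').output.logFilePath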

    I then added one more Copy activity; the intent here is to copy the log data from the blob to a SQL logging table. I used a dataset parameter and passed in the value of the log file path. The animation below should make that clearer.
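
    For reference, here is a minimal sketch of the logging table I landed the file in. The columns follow the session log's CSV layout (Timestamp, Level, OperationName, OperationItem, Message); adjust the types as you see fit:

    CREATE TABLE dbo.logging
    (
        [Timestamp]   NVARCHAR(100) NULL,
        [Level]       NVARCHAR(20)  NULL,
        OperationName NVARCHAR(100) NULL,
        OperationItem NVARCHAR(MAX) NULL, -- the skipped row, as one quoted string
        [Message]     NVARCHAR(MAX) NULL  -- the error detail
    )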

    I agree that the data in SQL is not in a perfect format, but it's something I was able to work around:

    SELECT * FROM dbo.logging
    WHERE operationItem IS NOT NULL
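
    If the extra quoting gets in the way, one rough workaround (just a sketch: it assumes the value is wrapped in outer quotes with inner quotes doubled, so check it against your actual log data) is to unescape it in SQL:

    SELECT REPLACE(SUBSTRING(operationItem, 2, LEN(operationItem) - 2), '""', '"') AS cleanedRow
    FROM dbo.logging
    WHERE operationItem IS NOT NULL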

    Animation : https://imgur.com/a/EFCoii6

    Please do let me know how it goes.
    Thanks,
    Himanshu
    Please consider clicking "Accept Answer" and "Up-Vote" on the post that helps you, as it can be beneficial to other community members.


  2. Evans, Matt 1 Reputation point
    2021-03-19T12:17:43.523+00:00

    Thank you for the reply,

    I had got to a similar position to the one you've just demonstrated. The issue is that there's no way to accurately interrogate the data within the "operationItem" field to retrieve the record Id I'd need in order to mark the original staged record as failed: there are no headings, and the Id field may not be returned in the same position for every table, i.e. I can't always just take the first field, for example.

    In a reusable pipeline using a ForEach loop, where the Copy activity copies data between many different source/target tables by passing the table information in through the loop, there is no way to interrogate that "operationItem" field for each of the tables, as there's no way to be certain what the fields are.

    What I think would be more useful is for the log/error file to contain just the failed row(s), with column headings in the first row and additional columns such as "level" and "error message" appended on the end.

    In KingswaySoft, on the sink connection in the Data Flow Task, any error rows effectively flow out in the same format they went in, with these additional columns on the end indicating what the failure was. They can therefore be logged to a file/table with the correct structure (attributes identifiable), and a follow-on activity that queries this table and marks the failed records is really simple to build.

    I don't think this is a particularly niche requirement; it's a very common one for any data movement activity. You need a robust way to deal with failed records, and at the minute that seems to be a major shortcoming of the ADF Copy activity.


  3. Karthiga Murugan 1 Reputation point
    2021-11-03T06:29:36.25+00:00

    Hi @Evans, Matt,

    I am also looking for a solution to the same problem. Did you find one? If so, can you let me know?

    Thanks in advance,
    Karthiga
