
AzureHero asked:

Azure Data Factory (ADF) Escape Character Issue When Quoting

I've got a pipeline built in Azure Data Factory that pulls data from SQL and stores it as CSV. My dataset for the CSV has the Escape Character set to a backslash (\) and the Quote Character set to a double quote ("). The problem occurs whenever a string value is just a backslash, or ends with a backslash, which happens throughout my data across dozens of tables and columns.

When this happens, the value written to the file ends with a backslash immediately before the closing quote (example: "this is my test\"), and other systems can't interpret it when loading. The quote after the backslash is treated as escaped, which throws off the columns. I have tried other escape characters such as "^", but "^" also appears in many values across many table columns, so the same issue can occur.

It seems this is an issue with ADF: whenever the escape character appears at the end of a value (or is the only character in the value), the output breaks. How can I handle this without having to address it on the source side? Why doesn't Data Factory escape the backslash when it appears in the data? I would expect a backslash in a string value to be written as "\\" even when quoting is enabled, but that only happens when quoting is not used.
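To make the issue concrete (illustration added in editing, using the example values from this thread): with backslash as the escape character, a row whose last column is a single backslash comes out of the copy activity like the first line below, while a reader that honours backslash escaping needs the second line in order to recover the original values:

    What the sink file contains today:
    "this is my test\","sometest","\"

    What a backslash-escaping reader expects:
    "this is my test\\","sometest","\\"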

Thanks for the help!






Tags: azure-data-factory, azure-data-lake-storage

HimanshuSinha-MSFT answered:

Hello,
Thanks for the question and for using this forum.

I tried to repro the issue on my side. This is what I did: I created a source table, inserted the dummy data you suggested, and created a pipeline with a copy activity that uses a CSV file as the sink. I attached one more copy activity, the idea being to copy the records from the sink file into a destination SQL table. I was expecting it to fail as you mentioned; unfortunately, I was unable to repro the problem and it succeeded just fine.

    DROP TABLE TEST10052020
    CREATE TABLE TEST10052020
    (
        COL VARCHAR(MAX), COL1 VARCHAR(MAX), COL2 VARCHAR(MAX)
    )

    DROP TABLE TESTDestination
    CREATE TABLE TESTDestination
    (
        COL VARCHAR(MAX), COL1 VARCHAR(MAX), COL2 VARCHAR(MAX)
    )

    INSERT INTO TEST10052020 VALUES ('"this is my test\"', 'sometest', '"\"')

    SELECT COL, COL1, COL2 FROM TEST10052020
    SELECT COL, COL1, COL2 FROM TESTDestination

(attached GIF: 30281-escapecharaterissue.gif)

When you say "can't be interpreted by other systems to load", what kind of system are you referring to? At least with this test, I think Data Factory is handling this pretty well.

One more suggestion: you could explore the power of T-SQL (since I am not aware of the data size and other parameters, it may not work out for you). When you configure the source, you can opt for a T-SQL query and clean the data, something like:

    SELECT REPLACE(COL, '\', ' ')
          ,COL1
          ,REPLACE(COL2, '\', ' ')
      FROM TEST10052020
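A variation on the same idea (a sketch added in editing, not part of the original answer): instead of blanking the backslash out, double it so that a downstream reader that treats backslash as the escape character can still round-trip the value. T-SQL does not treat the backslash specially in string literals, so the literals below are exactly one and two backslashes:

    -- Sketch: escape the backslash instead of removing it,
    -- so a "\" in the data is written out as "\\".
    SELECT REPLACE(COL, '\', '\\')  AS COL
          ,COL1
          ,REPLACE(COL2, '\', '\\') AS COL2
      FROM TEST10052020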


Thanks & stay safe
Himanshu
Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.






AzureHero answered:

Thank you for the response, @HimanshuSinha-MSFT. Your example works because you have wrapped the values in quotes within the SQL table columns. If you strip the quotes and instead do:

  insert into TEST10052020 values ('this is my test\','sometest','\')

you will see the output causes issues for third-party systems trying to read the data. The value '\' is written to the file as "\". Because backslash is the escape character, a third-party reader treats the closing quote as escaped and fails. I tried the exact same data with JSON as the output format, and there the value was correctly written as "\\" within the field.
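For reference (example rows added in editing, consistent with the values discussed in this thread), this is the contrast being described: the DelimitedText sink leaves the trailing backslash unescaped, while the JSON sink escapes it:

    CSV sink output (the backslash before the closing quote is not escaped):
    "this is my test\","sometest","\"

    JSON sink output for the same row (the backslash is correctly escaped):
    {"COL":"this is my test\\","COL1":"sometest","COL2":"\\"}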

I also tested converting the output DelimitedText file to JSON, and it doesn't work correctly. I have two copy activities: the first with a source of SQL and a sink of DelimitedText, and the second with a source of DelimitedText (the same file created by the first activity) and a sink of JSON. The second output file, in JSON, has incorrect data because the DelimitedText file is not read correctly.


Hello,
Thanks @AzureHero for the details.
Yes, I am able to see the issue now. I set up the pipeline as you did, SQL to CSV and then CSV to JSON.
With the default escape character setup, the JSON is broken, something like below:

 {"COL":"this is my test\",sometest\"","COL1":"\"\r\n","COL2":null}

I tweaked the escape character to $ and it worked:

 {"COL":"this is my test\\","COL1":"sometest","COL2":"\\"}

I am not sure whether you can use $, so I also tried something more complex, '#$%', and that works as well:

 {"COL":"this is my test\\","COL1":"sometest","COL2":"\\"}

Please let me know if this helps.


Thanks Himanshu
Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.


Hello @AzureHero,

We have not heard back from you on this and are just following up. In case you have a resolution, please share it here so that other community members can benefit from it.

Thanks,
Himanshu
