question

PriyaJha-3992 avatar image
0 Votes"
PriyaJha-3992 asked MartinJaffer-MSFT commented

How to read a CSV file with Line break and Quotes through PolyBase in Synapse

Hi All,

I have an automated source that generates Excel file with data that can contain line breaks or double quotes or some delimiter values.

I am converting this Excel file to CSV using ADF Copy activity and to handle column delimiter in data i have used Quote char property in copy activity.

Now i am reading this CSV file through Polybase and i have mentioned String_Delimiter as Double quotes in my External file format definition.

But i am getting error for 2 scenarios:

1) If the file contains line breaks, this is the error: No Closing String Delimiter
2) If the data also has double quotes: Could not find a delimiter after String Delimiter

How to handle these two scenarios in Polybase.

azure-synapse-analytics
· 4
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello @PriyaJha-3992 and thank you for your question.

Could you please share some dummy data to ensure I under stand the situation correctly? I think you are saying your data looks like:

Col1,Col2,Col3<NEWLINE>
abc,"Hello <NEWLINE> World",123<NEWLINE>
"Second Row",4.56,789<NEWLINE>


and then it is being parsed/read as the following, and failing:

Col1 | Col2 | Col3
abc   | Hello
World",123
Second Row | 4.56 | 789
0 Votes 0 ·

Hi @MartinJaffer-MSFT,

Sorry for late reply.

My CSV file looks like this:

Col1 | Col2 | Col3
"abc" | "Hello
World"|"123"
"Second Row" | "4.56" | "789"

In the External file format, i have mentioned Field terminator as Pipe (|) and String Delimiter as Double Quotes (") and it is failing because of line break present after Hello.

0 Votes 0 ·

Sorry for the delay, @PriyaJha-3992.
I forgot to ask, is this the SQL Data Warehouse, or the Synapse workspace?
I am currently checking whether the Data Warehoues Polybase will recognize whether a linebreak is inside a quoted value or not, or if it splits regardless. You may need to escape the newline.

0 Votes 0 ·
Show more comments

1 Answer

MartinJaffer-MSFT avatar image
0 Votes"
MartinJaffer-MSFT answered MartinJaffer-MSFT commented

I have tried, and it seems not to be possible. Polybase does not appear to be compatible with having the row terminator inside of a quoted field.

You can either change the data, such as escaping, or use something else.

Using DataFactory , Polybase works when staging is enabled to accomplish this. Data Factory uses the staging to make the format acceptable to Polybase. I tested, and the above sample came out correct.



@PriyaJha-3992 @NandanHegde-7720

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

If you found other solutions, please share them here with the community.

0 Votes 0 ·