0 Votes
AlexWong-9440 asked GeorgeMaxson-8540 answered

Azure Data Factory error 2200 writing to parquet file

Hi there, while copying a few SQL tables from a SQL database to a data lake using an ADF pipeline, I got the following error:

{
"errorCode": "2200",
"message": "Failure happened on 'Sink' side. ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.IllegalArgumentException:field ended by ';': expected ';' but got 'State' at line 0: message adms_schema { optional int32 SystemID; optional binary System (UTF8); optional binary Description (UTF8); optional binary Defined State\ntotal entry:10\r\norg.apache.parquet.schema.MessageTypeParser.check(MessageTypeParser.java:215)\r\norg.apache.parquet.schema.MessageTypeParser.addPrimitiveType(MessageTypeParser.java:188)\r\norg.apache.parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:112)\r\norg.apache.parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:100)\r\norg.apache.parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:93)\r\norg.apache.parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:83)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.getSchema(ParquetWriterBuilderBridge.java:187)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.build(ParquetWriterBuilderBridge.java:159)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBridge.open(ParquetWriterBridge.java:13)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetFileBridge.createWriter(ParquetFileBridge.java:27)\r\n.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
"failureType": "UserError",
"target": "Copy table",
"details": []
}
Any idea?
Thanks,

azure-data-factory

2 Votes
AlexWong-9440 answered

It turns out that one column name in the source SQL table contains whitespace, and the Parquet parser does not seem to accept it. Oddly, the original sink dataset, which used CSV, did not have the same issue.


0 Votes
DaveLePrevost-1006 answered

Are there more details on this issue? I seem to be getting the same error. I am trying to do the same thing, copying from a SQL Server into our Azure data lake, and would like to continue ingesting as Parquet.


0 Votes
dwpvitro answered dwpvitro edited

The Parquet writer does not allow whitespace in column names. If you're using Data Factory to write Parquet, you need to remove the whitespace from the column names somehow. One option is to use the column mappings in a copy activity to map the source columns that have whitespace to sink column names without whitespace.

The CSV format has no such column name restrictions. That's why the CSV write succeeded and the Parquet write failed.
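
To avoid renaming columns by hand, the mapping could be generated. The following is a minimal Python sketch, assuming the copy activity accepts a TabularTranslator-style mapping with source and sink names. The helper names `safe_name` and `build_translator`, the underscore replacement, and the sample column "Size (MB)" are illustrative assumptions. The set of replaced characters covers only those reported in this thread and is not an official list.

```python
import json
import re
from typing import Dict, List

# Characters reported in this thread as breaking the Parquet sink:
# whitespace and parentheses. Assumption: this is not an exhaustive list.
_UNSAFE = re.compile(r"[\s()]+")


def safe_name(column: str) -> str:
    """Replace each run of unsafe characters with one underscore, then trim underscores."""
    return _UNSAFE.sub("_", column).strip("_")


def build_translator(columns: List[str]) -> Dict:
    """Build a TabularTranslator-style mapping from source names to sanitized sink names."""
    sink_names = [safe_name(c) for c in columns]
    if len(set(sink_names)) != len(sink_names):
        # Two different source names collapsed to the same sink name.
        raise ValueError("sanitized column names collide; rename them manually")
    return {
        "type": "TabularTranslator",
        "mappings": [
            {"source": {"name": src}, "sink": {"name": dst}}
            for src, dst in zip(columns, sink_names)
        ],
    }


if __name__ == "__main__":
    # Column names taken from the error above, plus one hypothetical name with parentheses.
    cols = ["SystemID", "System", "Description", "Defined State", "Size (MB)"]
    print(json.dumps(build_translator(cols), indent=2))
```

The printed JSON is meant to be pasted into the copy activity's mapping definition. It is worth checking the generated sink names against the pipeline before relying on them.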


0 Votes
GeorgeMaxson-8540 answered

Whitespace is not the only offender here; parentheses "(" and ")" cause the same failure. As noted above, updating the names in the Mapping was how I resolved the issue. My error was:

"ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.IllegalArgumentException:No enum constant org.apache.parquet.schema.OriginalType.MB\ntotal

The characters after OriginalType. and before \ntotal (in this case "MB") are the characters that were between the parentheses. It is not a very helpful clue unless you know what to look for, so I hope this helps someone.
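
The clue described above can be pulled out of the error text automatically. This is a rough Python sketch based only on the two messages quoted in this thread. The function name `offending_token` and both patterns are assumptions for illustration, and other Parquet errors may be worded differently.

```python
import re
from typing import Optional

# Pattern for the whitespace case: "... expected ';' but got 'State' ..."
_BUT_GOT = re.compile(r"expected ';' but got '([^']*)'")

# Pattern for the parentheses case: "... OriginalType.MB\ntotal ..."
# The terminator may be a literal backslash-n (as in the JSON text) or a real newline.
_NO_ENUM = re.compile(
    r"No enum constant org\.apache\.parquet\.schema\.OriginalType\.(.+?)(?:\\n|\n|$)"
)


def offending_token(message: str) -> Optional[str]:
    """Return the fragment of the column name that the parser choked on, if found."""
    for pattern in (_BUT_GOT, _NO_ENUM):
        match = pattern.search(message)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    print(offending_token("expected ';' but got 'State' at line 0"))
    print(offending_token(r"No enum constant org.apache.parquet.schema.OriginalType.MB\ntotal"))
```

The returned fragment is only part of the column name (the word after the whitespace, or the text inside the parentheses), so it still has to be matched against the source table's columns by eye.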
