Azure Data Factory error 2200 writing to parquet file

Alex Wong 111 Reputation points
2020-09-15T17:46:12.167+00:00

Hi there, while copying a few SQL tables from a SQL database to the data lake using an ADF pipeline, I got the following error:

{
"errorCode": "2200",
"message": "Failure happened on 'Sink' side. ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.IllegalArgumentException:field ended by ';': expected ';' but got 'State' at line 0: message adms_schema { optional int32 SystemID; optional binary System (UTF8); optional binary Description (UTF8); optional binary Defined State\ntotal entry:10\r\norg.apache.parquet.schema.MessageTypeParser.check(MessageTypeParser.java:215)\r\norg.apache.parquet.schema.MessageTypeParser.addPrimitiveType(MessageTypeParser.java:188)\r\norg.apache.parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:112)\r\norg.apache.parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:100)\r\norg.apache.parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:93)\r\norg.apache.parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:83)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.getSchema(ParquetWriterBuilderBridge.java:187)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.build(ParquetWriterBuilderBridge.java:159)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBridge.open(ParquetWriterBridge.java:13)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetFileBridge.createWriter(ParquetFileBridge.java:27)\r\n.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
"failureType": "UserError",
"target": "Copy table",
"details": []
}
Any idea?
Thanks,

Azure Data Factory

Accepted answer
  1. Alex Wong 111 Reputation points
    2020-09-15T19:21:32.24+00:00

    It turns out that one column name in the source SQL table contains whitespace, which the Parquet parser apparently doesn't accept. Funny that the original sink dataset using CSV doesn't have the same issue.

    3 people found this answer helpful.

3 additional answers

Sort by: Most helpful
  1. George Maxson 6 Reputation points
    2022-05-03T13:20:59.237+00:00

    Whitespace is not the only offender here; so are the parenthesis characters "(" and ")". As noted above, updating the names in the Mapping was how I resolved the issue. My error was:

    "ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.IllegalArgumentException:No enum constant org.apache.parquet.schema.OriginalType.MB\ntotal

    The characters after "OriginalType." and before "\ntotal" (in this case "MB") are the characters that were between the parentheses. That's not a very helpful clue unless you know what to look for, so I hope this helps someone.

    1 person found this answer helpful.
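    If you prefer to fix the names outside ADF (for example, in a pre-processing step), a small sanitizer covering both offending character classes mentioned in this thread — whitespace and parentheses — could look like the sketch below. The function name and replacement rule are illustrative, not anything ADF provides:

    ```python
    import re

    def sanitize_column(name: str) -> str:
        """Replace runs of whitespace and parentheses (characters the
        ADF Parquet writer rejects) with a single underscore, then
        trim any leading/trailing underscores."""
        cleaned = re.sub(r"[\s()]+", "_", name)
        return cleaned.strip("_")

    # Hypothetical source columns, modeled on the errors in this thread
    print(sanitize_column("Defined State"))  # -> Defined_State
    print(sanitize_column("Size (MB)"))      # -> Size_MB
    ```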

  2. Dave Le Prevost 1 Reputation point
    2020-10-05T12:59:47.957+00:00

    Are there more details on this issue? I seem to be getting the same error. I am trying to do the same from a SQL Server into our Azure data lake, but would like to continue to ingest using Parquet.


  3. dwpvitro 1 Reputation point
    2021-06-30T13:45:06.087+00:00

    The Parquet writer does not allow whitespace in column names. If you're using Data Factory to write Parquet, you need to remove the whitespace from the column names somehow. One option is to use the column mappings in a copy activity to map the source columns that have whitespace to sink column names without whitespace.

    The CSV format has no such column-name restrictions, which is why the CSV sink succeeded while the Parquet write failed.
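    In the copy activity, that mapping lives in the translator section of the pipeline JSON. A sketch of what an explicit mapping might look like (the column names here are examples from this thread, not your schema):

    ```json
    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            {
                "source": { "name": "Defined State" },
                "sink":   { "name": "Defined_State" }
            }
        ]
    }
    ```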
