Azure Data Factory error 2200 writing to parquet file

Alex Wong 111 Reputation points
2020-09-15T17:46:12.167+00:00

Hi there, while copying a few SQL tables from a SQL database to the data lake using an ADF pipeline, I got the following error:

{
"errorCode": "2200",
"message": "Failure happened on 'Sink' side. ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.IllegalArgumentException:field ended by ';': expected ';' but got 'State' at line 0: message adms_schema { optional int32 SystemID; optional binary System (UTF8); optional binary Description (UTF8); optional binary Defined State\ntotal entry:10\r\norg.apache.parquet.schema.MessageTypeParser.check(MessageTypeParser.java:215)\r\norg.apache.parquet.schema.MessageTypeParser.addPrimitiveType(MessageTypeParser.java:188)\r\norg.apache.parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:112)\r\norg.apache.parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:100)\r\norg.apache.parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:93)\r\norg.apache.parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:83)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.getSchema(ParquetWriterBuilderBridge.java:187)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.build(ParquetWriterBuilderBridge.java:159)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBridge.open(ParquetWriterBridge.java:13)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetFileBridge.createWriter(ParquetFileBridge.java:27)\r\n.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
"failureType": "UserError",
"target": "Copy table",
"details": []
}
Any idea?
Thanks,

Azure Data Factory

Accepted answer
  1. Alex Wong 111 Reputation points
    2020-09-15T19:21:32.24+00:00

    It turns out that one column name in the source SQL table contains whitespace, which the Parquet parser apparently doesn't accept. Funny that the original sink dataset using CSV doesn't have the same issue.

    3 people found this answer helpful.

3 additional answers

Sort by: Most helpful
  1. George Maxson 6 Reputation points
    2022-05-03T13:20:59.237+00:00

    Whitespace is not the only offender here; so are the parenthesis characters "(" and ")". As noted above, updating the names in the Mapping was how I resolved the issue. My error was:

    "ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.IllegalArgumentException:No enum constant org.apache.parquet.schema.OriginalType.MB\ntotal

    The characters after "OriginalType." and before "\ntotal" (in this case "MB") are the characters that were between the parentheses. That's not a very helpful clue unless you know what to look for, so I hope this helps someone.

    1 person found this answer helpful.
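    If you prefer to fix the names outside ADF (for example, in a pre-processing step), a small sanitizer covering both offending character classes mentioned in this thread — whitespace and parentheses — could look like the sketch below. The function name and replacement rule are illustrative, not anything ADF provides:

    ```python
    import re

    def sanitize_column(name: str) -> str:
        """Replace runs of whitespace and parentheses (characters the
        ADF Parquet writer rejects) with a single underscore, then
        trim any leading/trailing underscores."""
        cleaned = re.sub(r"[\s()]+", "_", name)
        return cleaned.strip("_")

    # Hypothetical source columns, modeled on the errors in this thread
    print(sanitize_column("Defined State"))  # -> Defined_State
    print(sanitize_column("Size (MB)"))      # -> Size_MB
    ```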

  2. Dave Le Prevost 1 Reputation point
    2020-10-05T12:59:47.957+00:00

    Are there more details on this issue? I seem to be getting the same error. I am trying to do the same from a SQL Server into our Azure data lake, but would like to continue to ingest using Parquet.


  3. dwpvitro 1 Reputation point
    2021-06-30T13:45:06.087+00:00

    The Parquet writer does not allow whitespace in column names. If you're using Data Factory to write Parquet, you need to remove the whitespace from the column names somehow. One option is to use the column mappings in a copy activity to map the source columns that have whitespace to sink column names without whitespace.

    The CSV format has no such column-name restrictions, which is why the CSV sink succeeded while the Parquet write failed.
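    In the copy activity, that mapping lives in the translator section of the pipeline JSON. A sketch of what an explicit mapping might look like (the column names here are examples from this thread, not your schema):

    ```json
    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            {
                "source": { "name": "Defined State" },
                "sink":   { "name": "Defined_State" }
            }
        ]
    }
    ```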
