
UtsavChanda-0290 asked · SaurabhSharma-msft commented

Data type issue for decimal fields in parquet files between data flow and hive

I am generating parquet files using Data Flow in ADF. The files have some fields with decimal data types.
When I try to create a Hive external table on top of those parquet files, I get the below error while reading the data through Hive.

org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block.

I think this issue is caused by the different Parquet conventions used for decimal fields in Hive and Spark (Data Flow essentially runs on Spark).
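For context on why the conventions clash: newer Spark writers store a small-precision decimal as a plain INT64 of the unscaled value, while older Hive readers expect the legacy layout, a FIXED_LEN_BYTE_ARRAY holding the big-endian two's-complement of the unscaled value. A minimal stdlib sketch of the two byte layouts (not actual Parquet code, just an illustration assuming a decimal(10,2) column):

```python
import math
from decimal import Decimal

def unscaled(value: Decimal, scale: int) -> int:
    # Decimal("12.34") at scale 2 -> 1234
    return int(value.scaleb(scale))

def encode_int64(value: Decimal, scale: int) -> bytes:
    # Newer Spark layout for small precisions: INT64 of the unscaled value
    # (shown in Parquet's little-endian on-disk byte order).
    return unscaled(value, scale).to_bytes(8, "little", signed=True)

def min_bytes(precision: int) -> int:
    # Smallest signed byte width that can hold `precision` decimal digits.
    return math.ceil((precision * math.log2(10) + 1) / 8)

def encode_fixed_len(value: Decimal, precision: int, scale: int) -> bytes:
    # Legacy (Hive-friendly) layout: FIXED_LEN_BYTE_ARRAY containing the
    # big-endian two's-complement of the unscaled value.
    return unscaled(value, scale).to_bytes(min_bytes(precision), "big", signed=True)

v = Decimal("12.34")
print(encode_int64(v, 2).hex())          # d204000000000000
print(encode_fixed_len(v, 10, 2).hex())  # 00000004d2
```

The same logical value produces two entirely different physical encodings, which is why a Hive reader expecting one layout throws ParquetDecodingException when it meets the other.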

I think the issue gets resolved if you generate the Parquet files with Spark in Databricks, because there you can set spark.sql.parquet.writeLegacyFormat=true. But how can the same thing be handled in Azure Data Factory Data Flows?
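For reference, the Databricks-side fix alluded to above looks roughly like this. This is a hedged sketch rather than runnable code: it needs a live Spark runtime, and the app name and abfss paths are hypothetical placeholders.

```python
# Sketch only: requires a Spark cluster; paths/app name are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-compatible-decimals")
    # Write decimals in the legacy FIXED_LEN_BYTE_ARRAY layout that
    # older Hive readers expect.
    .config("spark.sql.parquet.writeLegacyFormat", "true")
    .getOrCreate()
)

df = spark.read.parquet("abfss://data@account.dfs.core.windows.net/input/")
df.write.mode("overwrite").parquet("abfss://data@account.dfs.core.windows.net/output/")
```

The same property can also be set cluster-wide in the Databricks Spark config instead of per-session.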

azure-data-factory · azure-databricks

Hi @utsavchanda-0290,

Thanks for using Microsoft Q&A!
I do not see a way to set this property in the Data Flow source options, but I am checking internally with the product team whether this is possible. I will get back to you on this.

Thanks
Saurabh


Hi @utsavchanda-0290,

I have received confirmation internally that explicit settings for Parquet writes are not allowed within Data Flow. One workaround is to read the Parquet files back and process them again into the right format for Hive with another data flow. Could you please provide a detailed use case so that we can look for other possible solutions?

Thanks
Saurabh


Hi,

We have regular requirements to write out Parquet files to Data Lake after some data transformations in Data Flows. The Data Lake is attached to an HDInsight cluster, where we also have Hive external tables created on top of the Data Lake containers. The Hive tables make it easier to query the Data Lake.

Now this issue comes up whenever we have any decimal fields in the file structure.


0 Answers