CaristinaJose-2736 asked IanPosner answered

Importing parquet files with SSIS

Hi all, I need to import a Parquet file with SSIS. I have read several forums and it seems this is not possible. I don't need Azure-related answers, since I know it is possible with ADF.
Does anyone know how to do this with SSIS? I am even interested in programmatic ways (C# or .NET) through SSIS.
Any ideas will be much appreciated. Thank you all!

sql-server-integration-services

IgorGelin-0063 answered

You can use an SSIS Script Task to process a Parquet file.
The link below has an example of C# code that converts a Parquet file to CSV.

https://stackoverflow.com/questions/62094616/how-to-convert-parquet-file-to-csv-using-net-core


ZoeHui-MSFT answered

Hi anonymous user,

I did not find a good way to import Parquet files in SSIS without ADF.

You may refer to the link IgorGelin-0063 provided to see if it is useful.

Regards,

Zoe



CaristinaJose-2736 answered

Thank you very much @IgorGelin-0063 and @Zoehui-MSFT for your kind replies. I will try to follow @IgorGelin-0063's guidance and get back to this conversation with an answer. If anyone else has insights on this matter, please continue providing answers. Thank you both again!


CaristinaJose-2736 answered

Hi @IgorGelin-0063, I went through the suggested link and it is about converting Parquet files into CSV files using the Cinchoo ETL library. I read Cinchoo ETL's documentation and it doesn't seem to work with SQL; it converts JSON to CSV or Parquet to CSV. I need a way of loading Parquet files into SQL Server tables through SSIS. I apologize if I am misunderstanding how to use the Cinchoo ETL framework.
Thanks again!


IanPosner answered

ADF supports Parquet because its data flow engine is based on Spark, which uses Parquet as its intermediate storage format. Spark does so because Parquet supports partitioning and is designed for use on HDFS, which distributes 256 MB blocks of data to different processing nodes for parallel processing. Since these 256 MB blocks hold compressed data, the underlying raw size of the data is likely to be 1-2.5 GB per block.

Therefore you should ask yourself whether the raw data you hold in Parquet files is large enough to justify the Parquet format.

If the Parquet files are not several multiples of 256 MB in size, then the file format is likely inappropriate for the volume of data. In this case, consider converting the data to a supported format before using SSIS. As a rule, SSIS can usually process 50,000-100,000 rows per second for a single non-blocking data flow, with a startup time of 2-3 seconds. So you should be able to estimate how long an SSIS package will take to process the number of rows you have per file.
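
To make that estimate concrete, here is a rough sketch in Python. It only turns the rule-of-thumb figures quoted above (50,000-100,000 rows per second, 2-3 seconds of startup) into a best-case and worst-case duration. The function names and the 10-million-row example are hypothetical, and real throughput depends on row width, transformations, and the destination, so treat the result as a ballpark rather than a measurement.

```python
def estimate_ssis_seconds(row_count, rows_per_second, startup_seconds):
    """Estimated wall-clock seconds: fixed startup plus rows / throughput."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return startup_seconds + row_count / rows_per_second


def estimate_range(row_count):
    """Return (best_case, worst_case) seconds for one file.

    Uses the rule-of-thumb figures from the answer above, not measured values:
    best case  = 100,000 rows/s with 2 s startup
    worst case =  50,000 rows/s with 3 s startup
    """
    best = estimate_ssis_seconds(row_count, 100_000, 2.0)
    worst = estimate_ssis_seconds(row_count, 50_000, 3.0)
    return best, worst


if __name__ == "__main__":
    # Hypothetical file with 10 million rows.
    best, worst = estimate_range(10_000_000)
    print(f"Estimated duration: {best:.0f}-{worst:.0f} seconds")
```

If the estimate for your row counts comes out at a few minutes or less per file, converting to a supported format and loading with SSIS is probably the simpler route.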

Another option is either to write a custom SSIS source component or to purchase a third-party Parquet file source.

You should also compare SSIS with ADF, which may take between 30 and 60 seconds to start up and is really suited to files of 1 GB or more, where it can process large Parquet files in parallel.
