UtsavChanda-0290 asked ShaikMaheer-MSFT commented

Issue copying Parquet files from a partitioned ADLS Gen1 folder structure with an ADF Data Flow

This is an ADLS Gen1 to ADLS Gen1 copy.
The source folder structure is main/device/supplies/year=2021/month=202109/day=20210903/abcd.parquet
There are multiple Parquet files in different year, month and day folders; the data is partitioned by year, month and day.
I need to copy all the Parquet files to the sink and keep them in the same partitioned folder layout.
In the source Parquet dataset, I've put only the top-level directory, i.e. main, in the directory box and left the file box empty.
129072-dataset.png
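
To make the intended result concrete, the source-to-sink mapping described above can be sketched in plain Python. This only illustrates the folder layout and is not ADF code; the helper names `partition_values` and `sink_path` and the sink root `sink` are hypothetical.

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict[str, str]:
    """Collect key=value folder segments (year=..., month=..., day=...) from a file path."""
    values = {}
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep:  # only folder names containing "=" are treated as partition folders
            values[key] = value
    return values


def sink_path(source_path: str, source_root: str, sink_root: str) -> str:
    """Re-root a source file under sink_root while keeping the partition folders."""
    relative = PurePosixPath(source_path).relative_to(source_root)
    return str(PurePosixPath(sink_root) / relative)


src = "main/device/supplies/year=2021/month=202109/day=20210903/abcd.parquet"
print(partition_values(src))
print(sink_path(src, "main", "sink"))  # "sink" is a made-up sink root
```

Every file should land under the same year/month/day folders in the sink that it had in the source.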


I've kept the Data Flow "Source Options" as below:

128995-image.png


But Data Preview is not able to detect the files.

129092-image.png


Could you please help?



My use case is similar to the one explained here: https://www.youtube.com/watch?v=7Q-db4Qgc4M

azure-data-factory

Hi @UtsavChanda-0290 ,

I remember having a similar issue that was caused by spaces in the name of the data flow and/or in the name of the data flow activity in the pipeline. Can you make sure that there are no spaces in either name?

If this doesn't help, can you provide more information about your data source?


@ThomasBoersma thank you for your attention to this. I did have a space in the name of the activity; I removed it, but that did not resolve the issue. I also checked that there are no spaces in the name of the data flow.
The problem seems to be with the wildcard path, which is not able to resolve the folder structure.

My source data is in ADLS Gen1 as Parquet files organized in the folder structure main/device/supplies/year=2021/month=202109/day=20210903/abcd.parquet
There are multiple Parquet files in different year, month and day folders; the data is partitioned by year, month and day.

There is no problem with the linked service because if I hardcode a particular file in the source dataset, the data preview is able to read it fine. That makes me suspect even more that the issue is with the wildcard path.


1 Answer

ShaikMaheer-MSFT answered ShaikMaheer-MSFT commented

Hi @UtsavChanda-0290 ,

Thank you for posting your query on the Microsoft Q&A platform.

I tried the same setup and was able to reproduce your issue. I suggest also including your root directory name "main" in the wildcard path to avoid the issue.

Please check the screenshot below. I tried this configuration and it works fine.

129520-image.png
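
The effect of including the root folder can be illustrated with a small Python sketch. It rests on an assumption that is consistent with the fix above: the wildcard path is evaluated against the full path from the root rather than relative to the dataset directory (as far as I understand the Data Flow documentation, a wildcard path overrides the folder and file path set in the dataset). Python's `fnmatch` is only a stand-in for ADF's own matcher, and both patterns are hypothetical because the exact wildcard from the screenshots is not reproduced here.

```python
from fnmatch import fnmatchcase

FILE_PATH = "main/device/supplies/year=2021/month=202109/day=20210903/abcd.parquet"

# Hypothetical wildcard patterns, used for illustration only.
WITHOUT_ROOT = "device/supplies/*/*/*/*.parquet"
WITH_ROOT = "main/device/supplies/*/*/*/*.parquet"


def matches(pattern: str, path: str = FILE_PATH) -> bool:
    """Return True if the whole path matches the glob-style pattern."""
    return fnmatchcase(path, pattern)


print(matches(WITHOUT_ROOT))  # False: the path starts with "main/", the pattern does not
print(matches(WITH_ROOT))     # True: the pattern covers the path from the root folder
```

Under that assumption, a pattern that omits "main" can never match a path that starts with "main/", which would explain why Data Preview finds no files.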

Hope this helps. Please let us know if you have any further queries. Thank you.



@ShaikMaheer-MSFT I had already figured out the solution, and it is exactly what you described. Thank you for your attention to this and for providing the correct solution.


Hi @UtsavChanda-0290 ,

Glad to know that your issue is resolved. Thank you for accepting the answer.
