question

Samyak-3746 asked · SumanthMarigowda-MSFT edited

Data Factory taking suspiciously long time to run a simple python code

Hi,

I have been following this tutorial:
https://docs.microsoft.com/en-us/azure/batch/tutorial-run-python-batch-azure-data-factory

I was able to run this tutorial successfully earlier, but I recently added ~12 files (each at least 1 GB) to the input folder. Since then the pipeline no longer succeeds, and even starting a run takes very long.
I also received this error: "FileIOException: FileIOException('Failed to allocate 635101071 bytes for file D:\\batch\\tasks\\workitems\\adfv2-Analytics_pool\\job-1\\9915a460-5efe-445d-9684-8d72b57b83a3\\wd\\Highmark_p1.parquet.gzip, out of disk space')"
The pool shows an "unusable" state.

PS: My Batch pool uses a "standard_d2s_v3" VM size.

Does Data Factory load all the files in a blob storage folder even when the Python script targets only a single file?
I ask because earlier, when my input folder contained only "iris.csv" and "main.py", the job succeeded in mere seconds. Now that I have added multiple large files, it takes very long, and even after I deleted all the files again and kept only those two, it still seems to run indefinitely.

What is the solution to this? How can I find out the limit on the size/number of files that can be placed in the storage folder for the pipeline to run efficiently?


azure-data-factory azure-batch

1 Answer

ShaikMaheer-MSFT answered · Samyak-3746 edited

Hi @Samyak-3746 ,

Thank you for posting your query on the Microsoft Q&A platform.

Is your Python code processing the files one at a time? You could structure the Python logic so that it first checks whether a file has been fully uploaded to blob storage, and only then moves on to processing the next file.
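One simple way to implement such a check, sketched here with only the standard library (the path, polling interval, and timeout are placeholder assumptions, not values from the tutorial), is to poll the file's size until it stops changing, which suggests the upload or copy has finished:

```python
import os
import time


def wait_until_stable(path, poll_seconds=1.0, checks=3, timeout=300):
    """Return True once the file at `path` has kept the same size for
    `checks` consecutive polls (i.e. the upload has likely completed),
    or False if `timeout` seconds pass first."""
    stable = 0
    last_size = -1
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file does not exist yet
        if size >= 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0
        last_size = size
        time.sleep(poll_seconds)
    return False
```

You would call `wait_until_stable` on each input file before processing it, and skip (or retry later) any file that is still growing. Note this is a heuristic sketch; a more robust approach for blob storage is to upload a small marker file after each large file completes.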

Also, please check here for the possible reasons a node can be in an unusable state and see if that helps.
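Since the error above explicitly reports "out of disk space", it may also help to verify free space on the node before writing large files. A minimal standard-library sketch (the path and the required size here are assumptions for illustration; on a Batch node you would point it at the task's working directory):

```python
import shutil


def has_free_space(path, required_bytes):
    """Check whether the disk holding `path` has at least
    `required_bytes` free before starting a large write."""
    usage = shutil.disk_usage(path)
    return usage.free >= required_bytes


# The failed allocation above was ~635 MB; check for that much
# (the current directory "." stands in for the task's wd).
print("enough space:", has_free_space(".", 635_101_071))
```

If the check fails, the script can stop early with a clear message instead of the pool node filling up and becoming unusable mid-task.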

Please let us know how it goes. Thank you.


Hi @Samyak-3746 - Just checking in to see whether you had a chance to review the above and can share an update. If you found another resolution, please feel free to share it so the community can benefit. Thank you.


Hi @ShaikMaheer-MSFT

I have multiple files in the blob but I am accessing only one through the python script.

But as you can see, multiple files appear in the batch job, and the pool also goes into an unusable state because of the large files. (Screenshot: 198538-screenshot-2022-04-25-at-34645-pm.png)

