question

0 Votes
VictorSeifert-1141 asked · PRADEEPCHEEKATLA-MSFT commented

Speeding up deployment of whl files to Synapse Spark Pools

I was wondering if there is a way to speed up the process of uploading and applying new whl files (custom-built Python packages) to Synapse Spark pools.

Uploading whl files to the Synapse Workspace is a matter of seconds (as expected).

But once I select "Packages" on a Spark pool, choose the new whl file under "Workspace packages", and click "Apply", it takes between 10 and 20 minutes before the package is installed and available on the pool.

The pool is not running and no applications are active on it before or during the process. The "applying settings" message says it may take a "couple of minutes" to apply, but 20 minutes for a single package? This is nuts... what is Synapse doing in the backend that installing the whl file takes so long? Is there any way to speed it up?

Spark pool configuration:
Node size family: Memory Optimized
Node size: Medium (8 vCores / 64 GB)
Nodes: 3-10
Autoscale: enabled
Automatic pausing: enabled (number of minutes idle: 10)
Spark version: 3.1
Python version: 3.8

azure-synapse-analytics

Hi @VictorSeifert-1141,

Welcome to the Microsoft Q&A platform, and thank you for posting your query.

I have reached out to our internal product team for insight into the cause of the slowness in deploying whl files to Synapse Spark pools. I will keep you posted once I hear back from them. Thank you for your patience!

1 Vote

1 Answer

1 Vote
AnnuKumari-MSFT answered · PRADEEPCHEEKATLA-MSFT commented

Hi @VictorSeifert-1141,
We got a response from the product team. Please have a look:

Regarding the waiting time of "Apply changes":
The wait you experienced is the time the Library Management module needs to get all packages ready for the cluster to use. The package-preparation process is itself a Spark job, so it takes time for that job to queue and execute. The job downloads all required libraries, checks for conflicts, and so on. The key takeaway is that the library update does not work incrementally: it is not just dropping an extra package into a directory, but a full process of submitting an update Spark job, downloading everything, and checking the dependencies to make sure the packages are ready to use on the cluster.

Regarding the specific ask about the pool-level library update waiting time: we are working on it. A pool-level update is a heavy process, as described above, and it affects every session attached to the Spark pool. Our question back is: what is the purpose of attaching the wheel file to the Spark pool? Do you want the wheel file to be usable across the entire pool, or do you want to iterate quickly and use the functions in that wheel? If it is the latter, we can continue the discussion.
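For the quick-iteration case the product team mentions, Synapse notebooks support session-scoped Python packages, which install only for the current session and skip the pool-level update job entirely. A minimal sketch of that workflow, assuming the wheel has been uploaded to an ADLS Gen2 container the session can read (the storage path and package name below are placeholders, not from this thread):

```
# Cell 1: copy the wheel from storage to the driver's local disk.
# (mssparkutils is available in Synapse notebooks; the abfss path
# and file name are placeholders.)
mssparkutils.fs.cp(
    "abfss://<container>@<account>.dfs.core.windows.net/packages/mypkg-0.1.0-py3-none-any.whl",
    "file:/tmp/mypkg-0.1.0-py3-none-any.whl",
)

# Cell 2: install it for this session only -- no pool-level update job,
# and other sessions on the pool are unaffected.
%pip install /tmp/mypkg-0.1.0-py3-none-any.whl
```

Note that a session-scoped install disappears when the session ends, so it suits development iteration rather than the multi-pipeline production case discussed below.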


Hi @VictorSeifert-1141,

Following up to see whether the suggestion above was helpful. If you have any further queries, do let us know.

0 Votes
VictorSeifert-1141 replied to PRADEEPCHEEKATLA-MSFT:

Hi, thanks for the explanation!

I was on holiday the last two weeks, hence the late answer.

So, as far as I understand it, there is no way to speed it up. A pity, but we will have to work with it.

Yes, we need the wheel file to be usable for the entire Spark pool, as the package is used in multiple pipelines that all need its newest version.

Best regards,
Victor

0 Votes
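Since the pipelines all depend on the newest version, one way to confirm that a pool-level update has actually landed in a given session is to check the installed version of the package at the start of a notebook. A small sketch, assuming nothing beyond the standard library (the package name "mypkg" is a placeholder, not from this thread):

```python
# Check which version of a package the current session actually sees.
# Works on Python 3.8+ (importlib.metadata is in the stdlib from 3.8).
from importlib.metadata import version, PackageNotFoundError
from typing import Optional


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


# "mypkg" is a placeholder; substitute the real wheel's package name.
print(installed_version("mypkg"))  # None while the pool update has not landed
```

A pipeline could fail fast on a stale pool by asserting that `installed_version(...)` matches the expected version string.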

Hi @VictorSeifert-1141,

If the above response resolved your query, please accept it as the answer; this can benefit other community members. Thank you.

0 Votes