question

Loky-5050 asked

Azure Databricks | Scala on High Concurrency Cluster Mode

We are using Scala notebooks with high concurrency clusters.
From what we have read so far, Scala notebooks do not benefit from high concurrency clusters. Is that true?

These clusters are supposed to be shared with various other teams.





azure-databricks dotnet-ml-big-data

1 Answer

HimanshuSinha-MSFT answered

Hello @Loky-5050,
Thanks for the question and for using the MS Q&A platform.

As we understand it, the question is whether it is true that Scala notebooks do not benefit from high concurrency clusters. Please let us know if that is not accurate.
Your understanding is correct: Scala does not benefit from high concurrency clusters. This is called out here: https://docs.microsoft.com/en-us/azure/databricks/clusters/configure#--high-concurrency-clusters

The reason is that in high concurrency mode the cluster has to run the workloads of different users, but Scala code is executed inside the Spark JVM (one per machine), which is shared between all users, so Scala code can access everything inside that JVM.
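
To make the shared-JVM point concrete, here is a minimal sketch in plain Scala. It is not Databricks-specific: `userA`, `userB`, and the property key `teamA.secret` are hypothetical stand-ins for two users' notebook code, and it only assumes that both pieces of code run in the same JVM.

```scala
// Hypothetical stand-ins for two users' notebook code running in ONE JVM.
object SharedJvmSketch {
  // User A stores a value in JVM-wide state (system properties are per-JVM).
  def userA(): Unit =
    System.setProperty("teamA.secret", "s3cr3t") // hypothetical key and value

  // User B, running in the same JVM, can read it without any special access.
  def userB(): Option[String] =
    Option(System.getProperty("teamA.secret"))

  def main(args: Array[String]): Unit = {
    userA()
    println(userB()) // prints Some(s3cr3t)
  }
}
```

Anything reachable from JVM-wide state (system properties, singleton objects, static fields) behaves the same way, which is why code running inside a shared JVM cannot be isolated per user.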

Please do let me know if you have any queries.
Thanks
Himanshu





Thanks, Himanshu.
Could you help me unpack this?

If the same notebook is executed by different teams/users, one of them has to wait while the other is executing. Is my understanding correct?


Hi @Loky-5050,

To answer your last question: sorry, that's not correct. One of the features high concurrency clusters provide is process isolation, so when a notebook is executed by different users, their processes are isolated from each other. Unfortunately, Scala code is executed inside the Spark JVM, which is shared with the other users of the cluster, so process isolation is not possible with Scala, and that's why it's not supported. I don't think runs of the same notebook started in parallel by multiple users are executed sequentially; they use the same JVM, but this shouldn't be a problem.
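
As a rough illustration of "same JVM, but not sequential", the sketch below runs two tasks on separate threads of one JVM. It is plain Scala rather than Databricks code, and the names `ConcurrentRunsSketch`, `runBoth`, and `run` are hypothetical. Each task waits until the other has started, so the tasks can only finish if they actually overlap in time.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ConcurrentRunsSketch {
  // Returns true only if both tasks were running at the same time.
  def runBoth(): Boolean = {
    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val bothStarted = new CountDownLatch(2)

    // Each "user run" signals that it started, then waits for the other one.
    // await returns false if the other run has not started within 5 seconds.
    def run(): Future[Boolean] = Future {
      bothStarted.countDown()
      bothStarted.await(5, TimeUnit.SECONDS)
    }

    try {
      val a = run() // stand-in for user A's notebook run
      val b = run() // stand-in for user B's notebook run
      Await.result(a.zip(b), 10.seconds) == ((true, true))
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = println(runBoth())
}
```

If the two runs were forced to execute one after the other, the first would time out waiting for the second, and `runBoth()` would return `false`.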

Mark as accepted answer if it helps!


Thanks, but this all still looks fuzzy to me.

What I am trying to understand is: if we use Scala notebooks on high concurrency clusters, what are we losing out on, and how (since it is documented)?

If you are saying "shouldn't be a problem", does that mean there is no impact from executing Scala notebooks on high concurrency clusters?
