We are using Scala notebooks on high concurrency clusters.
From what we have read so far, Scala notebooks do not benefit from high concurrency clusters. Is that true?
These clusters are supposed to be shared with various other teams.
Hello @Loky-5050,
Thanks for the question and using MS Q&A platform.
As we understand it, the ask here is whether Scala notebooks benefit from high concurrency clusters. Please let us know if that is not accurate.
Your understanding is correct: Scala does not benefit from high concurrency clusters. This is called out here: https://docs.microsoft.com/en-us/azure/databricks/clusters/configure#--high-concurrency-clusters
The reason is that in high concurrency mode the cluster runs the workloads of many different users, but Scala code executes inside the Spark JVM (one per machine), which is shared by all users. Code from one user can therefore access everything inside that JVM.
Please do let me know if you have any queries.
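To make "you can get access to everything that is inside JVM" concrete, here is a minimal sketch in plain Java (Scala code runs on the same JVM, so the behaviour is identical). It assumes, as a simplification, that two users' cells end up as two threads in one shared JVM; the class, method names and the `teamA.token` property are hypothetical. User A stores a fake credential in a JVM system property and user B, running separately, reads it back. Nothing in the JVM stops this, which is why process isolation cannot be offered for Scala.

```java
// Hypothetical sketch: two "users" whose code runs in one shared JVM.
class SharedJvmDemo {
    // User A's cell stores a (fake) credential in JVM-wide state.
    // System properties are shared by all code in the JVM, whatever
    // thread or class loader it runs on.
    static void userACell() {
        System.setProperty("teamA.token", "secret-123");
    }

    // User B's cell, submitted by a different user, simply reads it back.
    static String userBCell() {
        return System.getProperty("teamA.token");
    }

    // Runs A, then B, on separate threads and returns what B saw.
    static String runDemo() {
        String[] seen = new String[1];
        Thread a = new Thread(SharedJvmDemo::userACell, "user-a");
        Thread b = new Thread(() -> seen[0] = userBCell(), "user-b");
        try {
            a.start();
            a.join(); // A's write happens-before B starts
            b.start();
            b.join(); // B's result is visible here after join
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("user B sees: " + runDemo());
    }
}
```

Running `main` prints `user B sees: secret-123`. With Python, SQL or R on a high concurrency cluster, each user's code runs in a separate, isolated process, so the equivalent read would not reach another user's state.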
Thanks
Himanshu
Thanks, Himanshu,
I'm just trying to unpack this.
If the same notebook is executed by different teams/users, does one of them have to wait while the other is executing? Is my understanding correct?
Hi @Loky-5050,
To answer your last question: sorry, that's not correct. One of the features high concurrency clusters provide is process isolation, so when notebooks are executed by different users, their processes are isolated from each other. Unfortunately, Scala code is executed inside the Spark JVM, which is shared with the other users of the cluster, so process isolation is not possible for Scala, and that's why it's not supported. I don't think a notebook run in parallel by multiple users will be executed sequentially; the runs use the same JVM, but that shouldn't be a problem.
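The point that sharing one JVM does not force users to take turns can be illustrated with a small sketch in plain Java (Scala runs on the same JVM). It assumes, as a simplification and not as a description of Databricks' actual scheduler, that each user's cell runs on its own thread in the shared JVM; `runCells` and the one-second timeout are hypothetical. Each "cell" waits for the other to start: with two threads both cells overlap in time, while a single thread forces them to queue and the first one times out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: do two users' cells in one JVM run at the same time?
class ConcurrentUsersDemo {
    // Runs two "cells" on a pool with the given number of threads and
    // returns true only if both cells were running at the same moment.
    static boolean runCells(int threads) {
        CountDownLatch bothStarted = new CountDownLatch(2);
        // Each cell signals that it started, then waits up to 1 s for the
        // other cell to start too; await returns false on timeout.
        Callable<Boolean> cell = () -> {
            bothStarted.countDown();
            return bothStarted.await(1, TimeUnit.SECONDS);
        };
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Future<Boolean> a = pool.submit(cell);
            Future<Boolean> b = pool.submit(cell);
            return a.get() && b.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // queued cells still finish, then threads exit
        }
    }

    public static void main(String[] args) {
        // Two threads: both cells overlap in time.
        System.out.println("parallel:   " + runCells(2));
        // One thread: cells queue, so the first one times out.
        System.out.println("sequential: " + runCells(1));
    }
}
```

Running `main` prints `parallel:   true` and then, after the one-second timeout, `sequential: false`. Concurrency inside a shared JVM is the normal case; what is lost is isolation between the users, not parallelism.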
Thanks, but this all still looks fuzzy to me.
What I am trying to understand is: if we are using Scala notebooks on high concurrency clusters, what are we losing out on, and how (since it is documented)?
If you are saying it "shouldn't be a problem", then it appears there will be no impact from executing Scala notebooks on high concurrency clusters?