Cluster size distribution

Anshal 1,906 Reputation points
2021-09-18T08:27:52.453+00:00

Dear friends, I want to understand how memory is distributed to executors based on cluster size. For example, with 7 nodes, 4 cores each, and 250 GB of memory, how does the distribution happen across each executor and node, how much is reserved, and how does Databricks decide on the distribution? Help please.

Azure Databricks

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 77,901 Reputation points Microsoft Employee
    2021-09-20T07:43:38.913+00:00

    Hello @Anshal ,

    Welcome to the Microsoft Q&A platform.

    Azure Databricks worker nodes run the Spark executors and other services required for the proper functioning of the clusters. When you distribute your workload with Spark, all of the distributed processing happens on worker nodes. Azure Databricks runs one executor per worker node; therefore the terms executor and worker are used interchangeably in the context of the Azure Databricks architecture.
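
    A quick way to see this from a notebook (a minimal sketch, assuming the `spark` session that Databricks notebooks provide): `defaultParallelism` reports the total task slots across all executors, which on Databricks is workers × cores per worker.

    ```python
    # Sanity check in a Databricks notebook (uses the provided `spark` session).
    # defaultParallelism = total cores across all executors; for the cluster in
    # the question this would be 7 workers x 4 cores = 28.
    print(spark.sparkContext.defaultParallelism)
    ```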

    By default, the amount of memory available for each executor is allocated within the Java Virtual Machine (JVM) memory heap. This is controlled by the spark.executor.memory property.
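
    As an illustration (a sketch for a plain Spark application, not Databricks-specific guidance; on Azure Databricks you would set this in the cluster's Spark config instead, since it must be fixed before the cluster starts), the property looks like this:

    ```python
    from pyspark.sql import SparkSession

    # Example values only, not a recommendation for any particular cluster.
    spark = (
        SparkSession.builder
        .appName("executor-memory-demo")        # hypothetical app name
        .config("spark.executor.memory", "8g")  # JVM heap available to each executor
        .getOrCreate()
    )

    # On a running cluster you can read back the effective setting:
    print(spark.conf.get("spark.executor.memory"))
    ```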


    However, some unexpected behaviors were observed on instances with a large amount of memory allocated. As JVMs scale up in memory size, issues with the garbage collector become apparent. These issues can be resolved by limiting the amount of memory under garbage collector management.

    Selected Azure Databricks cluster types enable the off-heap mode, which limits the amount of memory under garbage collector management. This is why certain Spark clusters have the spark.executor.memory value set to a fraction of the overall cluster memory.
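
    For reference, the generic open-source Spark properties behind off-heap mode look like the sketch below (illustration only; on the Databricks cluster types that enable off-heap mode, these settings are managed for you):

    ```python
    from pyspark.sql import SparkSession

    # Example values only; off-heap memory sits outside the JVM heap and is
    # therefore not subject to garbage collection.
    spark = (
        SparkSession.builder
        .appName("off-heap-demo")                        # hypothetical app name
        .config("spark.memory.offHeap.enabled", "true")  # enable off-heap storage
        .config("spark.memory.offHeap.size", "16g")      # amount outside GC management
        .getOrCreate()
    )
    ```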


    For more details, refer to Apache Spark executor memory allocation and Azure Databricks - configure clusters.

    Resource allocation is an important aspect of executing any Spark job.

    This third-party blog helps explain the basic flow of a Spark application and how to configure the number of executors, the memory settings for each executor, and the number of cores for a Spark job (a rough worked example for your cluster follows below).

    Disclaimer: This response contains a reference to a third-party World Wide Web site. Microsoft is providing this information as a convenience to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. There are inherent dangers in the use of any software found on the Internet, and Microsoft cautions you to make sure that you completely understand the risk before retrieving any software from the Internet.
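
    To make this concrete for the cluster in your question (7 workers, 4 cores each, and reading the 250 GB as the cluster total), here is a back-of-the-envelope sketch. The reserved fraction below is an assumption for illustration, not the exact value Databricks uses; the real reservation depends on the instance type and Databricks Runtime.

    ```python
    # Rough arithmetic for a 7-node, 4-core, 250 GB (total) cluster.
    TOTAL_NODES = 7
    CORES_PER_NODE = 4
    TOTAL_MEMORY_GB = 250

    memory_per_node_gb = TOTAL_MEMORY_GB / TOTAL_NODES  # ~35.7 GB per worker

    # Assume part of each node's memory is reserved for the OS, Databricks
    # services, and (where enabled) off-heap allocations. 25% is a
    # hypothetical figure for illustration only.
    ASSUMED_RESERVED_FRACTION = 0.25
    executor_heap_gb = memory_per_node_gb * (1 - ASSUMED_RESERVED_FRACTION)

    # Databricks runs one executor per worker node:
    print(f"executors: {TOTAL_NODES}")
    print(f"cores per executor: {CORES_PER_NODE}")
    print(f"approx. executor heap: {executor_heap_gb:.1f} GB (illustrative)")
    ```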

    Hope this helps. Please let us know if you have any further queries.

    ------------------------------

    • Please don't forget to click on "Accept Answer" and upvote whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and helping shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
