How to reduce unnecessarily high memory usage in a Databricks cluster?

Senad Hadzikic 20 Reputation points
2024-05-08T08:58:46.4433333+00:00

We are seeing unnecessarily high memory usage even when nothing is running on the cluster. When the cluster first starts, memory usage is fine, but after I run a script and it finishes executing, usage never returns to the idle (initial) level, even hours later with nothing else having been executed.

Screenshot 2024-05-08 at 10.53.08

Cluster config:
Screenshot 2024-05-08 at 10.56.09

Some settings I tried:
Screenshot 2024-05-08 at 10.56.41

Spark Config:
spark.executor.extraJavaOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:G1HeapRegionSize=8M
spark.memory.storageFraction 0.5
spark.dynamicAllocation.maxExecutors 10
spark.driver.extraJavaOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10M -Xloggc:/databricks/driver/logs/gc.log -XX:G1HeapRegionSize=8M -XX:+ExplicitGCInvokesConcurrent
spark.dynamicAllocation.enabled true
spark.memory.fraction 0.6
spark.dynamicAllocation.minExecutors 1
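
For reference, here is a rough sketch of how spark.memory.fraction and spark.memory.storageFraction divide an executor's heap, assuming Spark's unified memory model with its fixed 300 MiB reserved region. The 8 GiB heap size and the function name are made up for illustration; they are not taken from the cluster above.

```python
# Sketch of Spark's unified memory arithmetic (not a measurement of this cluster).
RESERVED_MIB = 300  # Spark sets aside a fixed 300 MiB of heap before splitting the rest


def unified_memory_split(heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Return (unified, storage, execution, user) region sizes in MiB."""
    usable = heap_mib - RESERVED_MIB
    # Region shared by execution and storage (spark.memory.fraction).
    unified = usable * memory_fraction
    # Part of the unified region where cached blocks are protected from eviction
    # (spark.memory.storageFraction); execution can borrow beyond this boundary.
    storage = unified * storage_fraction
    execution = unified - storage
    # Remainder is left for user data structures and internal metadata.
    user = usable - unified
    return unified, storage, execution, user


if __name__ == "__main__":
    heap_mib = 8 * 1024  # hypothetical 8 GiB executor heap (assumption)
    unified, storage, execution, user = unified_memory_split(heap_mib, 0.6, 0.5)
    print(f"unified:   {unified:.1f} MiB")
    print(f"storage:   {storage:.1f} MiB")
    print(f"execution: {execution:.1f} MiB")
    print(f"user:      {user:.1f} MiB")
```

The values 0.6 and 0.5 match Spark's documented defaults, so these two settings on their own would not be expected to change how the heap is divided.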

Azure Databricks