Problem: Apache Spark Job Fails With a maxResultSize Error
A Spark job fails with the following error:
org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of XXXX tasks (X.0 GB) is bigger than spark.driver.maxResultSize (X.0 GB)
This error occurs because the configured size limit was exceeded. The size limit applies to the total serialized results for Spark actions across all partitions. Such actions include collect() to the driver node, toPandas(), and saving a large file to the driver local file system.
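A minimal sketch of actions that run into this limit, assuming a hypothetical session and dataset (the names and sizes below are illustrative only):

```python
from pyspark.sql import SparkSession

# Hypothetical session and dataset; names and sizes are illustrative only.
spark = SparkSession.builder.appName("maxResultSize-demo").getOrCreate()
df = spark.range(0, 500_000_000)  # large enough to exceed a small result-size limit

# Each of these actions serializes results from every partition and returns
# them to the driver, so the combined size is checked against
# spark.driver.maxResultSize:
rows = df.collect()   # full result as a list of Row objects on the driver
pdf = df.toPandas()   # full result as a single pandas DataFrame on the driver
```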
In some situations, you might have to refactor the code to prevent the driver node from collecting a large amount of data. You can change the code so that the driver node collects only a limited amount of data, or you can increase the driver instance memory size. For example, you can call toPandas() with Arrow enabled, or write the data to files and then read those files back instead of collecting large amounts of data to the driver.
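A minimal sketch of both approaches, assuming a hypothetical DataFrame df and an illustrative output path; the Arrow setting shown, spark.sql.execution.arrow.pyspark.enabled, applies to Spark 3.x:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avoid-large-collect").getOrCreate()
df = spark.range(0, 500_000_000)  # hypothetical large DataFrame

# Enable Arrow so toPandas() transfers columnar batches more efficiently.
# The result still has to fit on the driver, so collect a bounded subset.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
pdf = df.limit(1_000_000).toPandas()

# Alternatively, keep the data distributed: write it out with the executors
# and read it back as a DataFrame instead of pulling rows to the driver.
output_path = "/tmp/example-output"  # illustrative path; use shared storage on a real cluster
df.write.mode("overwrite").parquet(output_path)
df_back = spark.read.parquet(output_path)
```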
If it is absolutely necessary, you can set the spark.driver.maxResultSize property in the cluster Spark configuration to a value <X>g that is higher than the value reported in the exception message:
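A minimal sketch for an application that creates its own session, assuming an illustrative value of 8g (choose a size above the one reported in the exception). On an existing cluster, set the same property in the cluster Spark configuration instead, because it takes effect when the SparkContext is created:

```python
from pyspark.sql import SparkSession

# Illustrative value only: choose a size above the one reported in the
# exception message and within the driver's available memory.
spark = (
    SparkSession.builder
    .appName("larger-result-size")
    .config("spark.driver.maxResultSize", "8g")
    .getOrCreate()
)
```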
The default value is 4g. For details, see Application Properties in the Spark documentation.
If you set a high limit, out-of-memory errors can occur in the driver (depending on
spark.driver.memory and the memory overhead of objects in the JVM). Set an appropriate limit to prevent out-of-memory errors.