Single Node clusters

A Single Node cluster is a cluster consisting of an Apache Spark driver and no Spark workers. A Single Node cluster supports Spark jobs and all Spark data sources, including Delta Lake. In contrast, a Standard cluster requires at least one Spark worker to run Spark jobs.

Single Node clusters are helpful for:

  • Single-node machine learning workloads that use Spark to load and save data
  • Lightweight exploratory data analysis

Create a Single Node cluster

To create a Single Node cluster, set Cluster Mode to Single Node when you configure a cluster.


Single Node cluster properties

A Single Node cluster has the following properties:

  • Runs Spark locally.
  • The driver acts as both master and worker, with no worker nodes.
  • Spawns one executor thread per logical core in the cluster, minus 1 core for the driver (you can verify this with the quick check after this list).
  • All stderr, stdout, and log4j log output is saved in the driver log.
  • A Single Node cluster can’t be converted to a Standard cluster. To use Standard mode, create a cluster in Standard mode and attach your notebook to it.
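
To confirm how Spark runs on a Single Node cluster, you can inspect the Spark context from an attached notebook. This is a minimal sketch, assuming the spark session that Databricks notebooks provide:

    # Quick check from a notebook attached to a Single Node cluster
    # (assumes the spark session predefined in Databricks notebooks).
    print(spark.sparkContext.master)              # local[*]: Spark runs in local mode on the driver
    print(spark.sparkContext.defaultParallelism)  # number of executor threads available for tasks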

Limitations

  • Large-scale data processing will exhaust the resources on a Single Node cluster. For these workloads, Databricks recommends using a Standard mode cluster.

  • Single Node clusters are not designed to be shared. To avoid resource conflicts, Databricks recommends using a Standard mode cluster when the cluster must be shared.

  • A Standard mode cluster can’t be scaled to 0 workers. Use a Single Node cluster instead.

  • Single Node clusters are not compatible with process isolation.

  • GPU scheduling is not enabled on Single Node clusters.

  • On Single Node clusters, Spark cannot read Parquet files with a UDT column. The following error message results:

    The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
    

    To work around this problem, disable the native Parquet reader:

    # Disable the native Parquet reader for the current Spark session
    spark.conf.set("spark.databricks.io.parquet.nativeReader.enabled", False)
    

REST API

You can use the Clusters API to create a Single Node cluster.
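
For example, a cluster create request body might look like the following sketch. The cluster name, node type ID, and Databricks Runtime version are placeholders to replace with values valid for your workspace; the settings that configure Single Node behavior are num_workers set to 0, the singleNode cluster profile, the local Spark master, and the SingleNode resource class tag:

    {
      "cluster_name": "single-node-cluster",
      "spark_version": "7.3.x-cpu-ml-scala2.12",
      "node_type_id": "Standard_DS3_v2",
      "num_workers": 0,
      "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]"
      },
      "custom_tags": {
        "ResourceClass": "SingleNode"
      }
    }

The num_workers and spark_conf values match the fixed values enforced by the cluster policy in the next section.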

Single Node cluster policy

Cluster policies simplify cluster configuration for Single Node clusters.

Consider the example of a data science team whose members do not have permission to create clusters. An admin can use a pool and a cluster policy to let team members create Single Node clusters for themselves, up to the maximum capacity of the pool:

  1. Create a pool:

    1. Set Max capacity to 10.
    2. In Autopilot options, enable autoscaling for local storage.
    3. Set Instance type to Single Node cluster.
    4. Select an Azure Databricks version. Databricks recommends using the latest version if possible.
    5. Click Create.

    The pool’s properties page appears. Make a note of the pool ID and instance type ID for the newly created pool.

  2. Create a cluster policy:

    • Set the pool ID and instance type ID from the pool properties page.
    • Specify constraints as needed.
  3. Grant the cluster policy to the team members. You can use Manage users and groups to simplify user management.

    The resulting Single Node cluster policy looks like this:

    {
      "spark_conf.spark.databricks.cluster.profile": {
        "type": "fixed",
        "value": "singleNode",
        "hidden": true
      },
      "instance_pool_id": {
        "type": "fixed",
        "value": "singleNodePoolId1",
        "hidden": true
      },
      "spark_version": {
        "type": "fixed",
        "value": "7.3.x-cpu-ml-scala2.12",
        "hidden": true
      },
      "autotermination_minutes": {
        "type": "fixed",
        "value": 120,
        "hidden": true
      },
      "num_workers": {
        "type": "fixed",
        "value": 0,
        "hidden": true
      },
      "docker_image.url": {
        "type": "forbidden",
        "hidden": true
      }
    }
    

Single Node job cluster policy

To set up a Single Node cluster policy for job clusters, define a similar policy. Set cluster_type.type to fixed and cluster_type.value to job, and remove all references to autotermination_minutes.

{
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "instance_pool_id": {
    "type": "fixed",
    "value": "singleNodePoolId1",
    "hidden": true
  },
  "num_workers": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  },
  "spark_version": {
    "type": "fixed",
    "value": "7.3.x-cpu-ml-scala2.12",
    "hidden": true
  },
  "docker_image.url": {
    "type": "forbidden",
    "hidden": true
  }
}