Cluster Node Initialization Scripts

An init script is a shell script that runs during startup of each cluster node before the Spark driver or worker JVM starts.

Some examples of tasks performed by init scripts include:

  • Install packages and libraries not included in the Databricks runtime. To install Python packages, use the Azure Databricks pip binary located at /databricks/python/bin/pip to ensure that Python packages install into the Databricks Python virtual environment rather than the system Python environment. For example, /databricks/python/bin/pip install <packagename>.
  • Modify the JVM system classpath in special cases.
  • Set system properties and environment variables used by the JVM.
  • Modify Spark configuration parameters.
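The package-installation task above amounts to invoking the cluster's pip binary with a package name. A minimal Python sketch (pip_install_command is a hypothetical helper; the actual call is shown as a comment because the pip path exists only on a Databricks node):

```python
def pip_install_command(package):
    """Build the command that installs a Python package into the
    Databricks Python virtual environment rather than the system one."""
    return ["/databricks/python/bin/pip", "install", package]

cmd = pip_install_command("simplejson")
print(cmd)
# On a Databricks node you would then run, e.g.:
# subprocess.check_call(cmd)
```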

Init script types

Azure Databricks supports three kinds of init scripts: cluster-scoped, global, and cluster-named.

  • Cluster-scoped: run on every cluster configured with the script. This is the recommended way to run an init script. If a cluster-scoped init script returns a non-zero exit code, the cluster launch fails. You can troubleshoot cluster-scoped init scripts by configuring cluster log delivery and examining the init script log.
  • Global: run on every cluster. Use these carefully because they can cause unanticipated impacts.
  • Cluster-named: run on the cluster whose name matches the script's directory. Cluster-named init scripts are best-effort: failures are silently ignored and the cluster launch process continues. This script type is deprecated and not recommended; cluster-scoped init scripts are a complete replacement and should be used instead.

Whenever you change any type of init script you must restart all clusters affected by the script.

Init script execution order

The order of execution of init scripts is:

  1. Global
  2. Cluster-named
  3. Cluster-scoped

Init script locations

You can put init scripts in a DBFS directory accessible by a cluster. Init scripts in DBFS must be stored in the DBFS root. Azure Databricks does not support storing init scripts in a DBFS directory created by mounting object storage.

Cluster-scoped init scripts

Cluster-scoped init scripts are init scripts defined in a cluster configuration. Cluster-scoped init scripts apply to both clusters you create and those created to run jobs. Since the scripts are part of the cluster configuration, cluster access control lets you control who can change the scripts.

You can configure cluster-scoped init scripts using the UI, the CLI, or by invoking the Clusters API. This section focuses on performing these tasks using the UI. For the other methods, see Databricks CLI and Clusters API.

You can add any number of scripts, and the scripts are executed sequentially in the order provided.

Environment variables

Cluster-scoped init scripts support the following environment variables:

  • DB_CLUSTER_ID: the ID of the cluster on which the script is running. See Clusters API.
  • DB_CONTAINER_IP: the private IP address of the container in which Spark runs. The init script is run inside this container. See SparkNode.
  • DB_IS_DRIVER: whether the script is running on a driver node.
  • DB_DRIVER_IP: the IP address of the driver node.
  • DB_INSTANCE_TYPE: the instance type of the host VM.
  • DB_PYTHON_VERSION: the version of Python used on the cluster. See Python version.
  • DB_IS_JOB_CLUSTER: whether the cluster was created to run a job. See Create a job.
  • SPARKPASSWORD: a path to a secret.

For example, if you want to run part of a script only on a driver node, you could write a script like:

if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  <run this part only on driver>
else
  <run this part only on workers>
fi
<run this part on both driver and workers>
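These variables are ordinary environment variables, so any helper program launched by an init script can read them too. A minimal Python sketch (is_driver and node_role are hypothetical helpers; DB_IS_DRIVER is the string "TRUE" on the driver, as in the shell example above):

```python
import os

def is_driver():
    """True when running on the driver node; Databricks sets
    DB_IS_DRIVER to the string "TRUE" there."""
    return os.environ.get("DB_IS_DRIVER", "").upper() == "TRUE"

def node_role():
    """Label the current node, e.g. for logging or branching."""
    return "driver" if is_driver() else "worker"
```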

Cluster-scoped init script events

Init scripts report start and finish events in the cluster event log. Therefore, you can compute how long an init script takes to run by subtracting the start event timestamp from the finish event timestamp.
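For example, given events fetched from the cluster event log, the duration can be computed like this (a sketch; the event dict shape with a millisecond timestamp field and the INIT_SCRIPTS_STARTED/INIT_SCRIPTS_FINISHED type names are assumptions — check the event log in your workspace for the exact values):

```python
def init_script_duration_ms(events):
    """Subtract the init-script start timestamp from the finish
    timestamp, given cluster events as dicts with a millisecond
    `timestamp` and a `type` field."""
    start = next(e["timestamp"] for e in events
                 if e["type"] == "INIT_SCRIPTS_STARTED")
    finish = next(e["timestamp"] for e in events
                  if e["type"] == "INIT_SCRIPTS_FINISHED")
    return finish - start

events = [
    {"type": "INIT_SCRIPTS_STARTED", "timestamp": 1_600_000_000_000},
    {"type": "INIT_SCRIPTS_FINISHED", "timestamp": 1_600_000_042_000},
]
print(init_script_duration_ms(events))  # 42000
```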

Cluster-scoped init script logs

By default, init script logs are stored in /databricks/init_scripts.

If cluster log delivery is configured, init script logs are delivered to that location instead. For each container, they appear in a subdirectory named init_scripts/<cluster_id>_<container_ip> under the cluster's log directory. For example, if cluster logs are delivered to dbfs:/cluster-logs, you can list a cluster's init script log directories with the DBFS CLI:

dbfs ls dbfs:/cluster-logs/1001-234039-abcde739/init_scripts

If the logs are delivered to DBFS, you can view them using File system utilities. Otherwise, you can run the following shell command in a notebook to view the logs on a node:

%sh
ls /databricks/init_scripts/

Every time a cluster launches, it writes a log to the init script log folder.
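The log subdirectory name encodes both identifiers, so a small helper can recover them when scanning delivered logs (a sketch; parse_init_script_log_dir is a hypothetical helper that splits on the last underscore, since cluster IDs contain dashes rather than underscores):

```python
def parse_init_script_log_dir(name):
    """Split an init script log subdirectory name of the form
    <cluster_id>_<container_ip> into its two parts."""
    cluster_id, _, container_ip = name.rpartition("_")
    return cluster_id, container_ip

print(parse_init_script_log_dir("1001-234039-abcde739_10.0.0.1"))
# ('1001-234039-abcde739', '10.0.0.1')
```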

Example cluster-scoped init script

This example creates an init script that installs a PostgreSQL JDBC driver on a cluster with ID 1202-211320-brick1.

  1. Create the base directory you want to store the init script in if it does not exist. Here we use dbfs:/databricks/scripts as an example.

    dbutils.fs.mkdirs("dbfs:/databricks/scripts/")

  2. Create the script.

    dbutils.fs.put("dbfs:/databricks/scripts/postgresql-install.sh", """
    #!/bin/bash
    wget --quiet -O /mnt/driver-daemon/jars/postgresql-42.2.2.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.2/postgresql-42.2.2.jar
    wget --quiet -O /mnt/jars/driver-daemon/postgresql-42.2.2.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.2/postgresql-42.2.2.jar""", True)

  3. Check that the script exists.

    display(dbutils.fs.ls("dbfs:/databricks/scripts/postgresql-install.sh"))

Configure a cluster-scoped init script

Configure the cluster to run the script you created in Example cluster-scoped init script using the UI or API.

Configure a cluster-scoped init script using the UI

You can use the cluster configuration page to add the init script to the cluster:

  1. On the cluster configuration page, click the Advanced Options toggle.

  2. At the bottom of the page, click the Init Scripts tab.

    Init Scripts tab

  3. In the Destination drop-down, select a destination type.

  4. Specify a path to the init script.

  5. Click Add.

  6. Upload your script to the specified location.

If the script pointed to by the configuration doesn’t exist, the cluster fails to start or to scale up.

To remove a script from the cluster configuration, click the Delete icon at the right of the script. When you confirm the deletion, you will be prompted to restart the cluster. Optionally, you can delete the script file from the location you uploaded it to.

Configure a cluster-scoped init script using the API

curl -n -X POST -H 'Content-Type: application/json' -d '{
  "cluster_id": "1202-211320-brick1",
  "num_workers": 1,
  "spark_version": "2.4.x-scala2.11",
  "node_type_id": "Standard_D3_v2",
  "cluster_log_conf": {
    "dbfs" : {
      "destination": "dbfs:/cluster-logs"
    }
  },
  "init_scripts": [ {
    "dbfs": {
      "destination": "dbfs:/databricks/scripts/postgresql-install.sh"
    }
  } ]
}' https://<databricks-instance>/api/2.0/clusters/edit
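The same request body can be built programmatically before posting it to the clusters/edit endpoint, which makes the nesting of cluster_log_conf and init_scripts easier to see (a sketch; cluster_edit_payload is a hypothetical helper, and the values are the placeholders from the curl example above):

```python
import json

def cluster_edit_payload(cluster_id, script_destination, log_destination):
    """Build a clusters/edit request body mirroring the curl example:
    one init script on DBFS plus cluster log delivery to DBFS."""
    return {
        "cluster_id": cluster_id,
        "num_workers": 1,
        "spark_version": "2.4.x-scala2.11",
        "node_type_id": "Standard_D3_v2",
        "cluster_log_conf": {"dbfs": {"destination": log_destination}},
        "init_scripts": [{"dbfs": {"destination": script_destination}}],
    }

body = cluster_edit_payload("1202-211320-brick1",
                            "dbfs:/databricks/scripts/postgresql-install.sh",
                            "dbfs:/cluster-logs")
print(json.dumps(body, indent=2))
```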

Global init scripts

A global init script runs on every cluster created in your workspace. Global init scripts are useful when you want to enforce organization-wide library configurations or security screens. A global init script must be stored in dbfs:/databricks/init/.


  • Use global init scripts carefully. It is easy to add libraries or make other modifications that cause unanticipated impacts. Whenever possible, use cluster-scoped init scripts instead.
  • If there is more than one global init script, the order of execution is undetermined and depends on the order that the DBFS client returns the scripts.

To delete a global init script, delete the init script file, for example in a notebook using dbutils.fs.rm, with the DBFS API, or with the DBFS CLI.


If you have created a global init script that is preventing new clusters from starting up, use the API or CLI to move or delete the script.

Example global init script

  1. Create dbfs:/databricks/init/ if it doesn’t exist.

    dbutils.fs.mkdirs("dbfs:/databricks/init/")

  2. Display the list of existing global init scripts.

    display(dbutils.fs.ls("dbfs:/databricks/init/"))

  3. Create a script that simply appends to a file, using my-echo.sh as an example file name.

    dbutils.fs.put("dbfs:/databricks/init/my-echo.sh", """
    echo "hello" >> /hello.txt
    """, True)

  4. Check that the script exists.

    display(dbutils.fs.ls("dbfs:/databricks/init/my-echo.sh"))


Cluster-named init scripts (deprecated)

Cluster-named scripts scope to a single cluster, specified by the cluster’s name. Cluster-named init scripts must be stored in the directory dbfs:/databricks/init/<cluster-name>. For example, to specify init scripts for the cluster named PostgreSQL, create the directory dbfs:/databricks/init/PostgreSQL, and put all scripts that should run on cluster PostgreSQL in that directory.


  • Cluster-named init scripts are deprecated. Azure Databricks recommends that you use cluster-scoped init scripts.
  • You cannot use cluster-named init scripts for clusters that run jobs because automated cluster names are generated on the fly. However, you can use cluster-scoped init scripts for automated clusters.
  • Avoid spaces in cluster names since they’re used in the script and output paths.
  • If there is more than one cluster-named init script, the order of execution is undetermined and depends on the order that the DBFS client returns the scripts.

To delete a cluster-named init script, delete the init script file, for example in a notebook using dbutils.fs.rm, with the DBFS API, or with the DBFS CLI.


Example cluster-named init script

This example creates an init script for a cluster named PostgreSQL that installs the PostgreSQL JDBC driver on that cluster. You can create a customizable command if you create a variable clusterName that holds the cluster name.

  1. Create dbfs:/databricks/init/ if it doesn’t exist.

    dbutils.fs.mkdirs("dbfs:/databricks/init/")

  2. Display the list of existing global init scripts.

    display(dbutils.fs.ls("dbfs:/databricks/init/"))

  3. Configure a cluster name variable.

    clusterName = "PostgreSQL"

  4. Create a directory named PostgreSQL using Databricks File System (DBFS).

    dbutils.fs.mkdirs("dbfs:/databricks/init/%s/" % clusterName)

  5. Create the script.

    dbutils.fs.put("dbfs:/databricks/init/%s/postgresql-install.sh" % clusterName, """
    #!/bin/bash
    wget --quiet -O /mnt/driver-daemon/jars/postgresql-42.2.2.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.2/postgresql-42.2.2.jar
    wget --quiet -O /mnt/jars/driver-daemon/postgresql-42.2.2.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.2/postgresql-42.2.2.jar""", True)

  6. Check that the cluster-specific init script exists.

    display(dbutils.fs.ls("dbfs:/databricks/init/%s/postgresql-install.sh" % clusterName))


Global and cluster-named init script logs

Databricks saves all init script output for global and cluster-named init scripts to a file in DBFS named as follows: dbfs:/databricks/init/output/<cluster-name>/<date-timestamp>/<script-name>_<node-ip>.log. For example, if a cluster named PostgreSQL has two Spark nodes with IPs 10.0.0.1 and 10.0.0.2, and the init script directory has a script called installpostgres.sh, there will be two output files at the following paths:

  • dbfs:/databricks/init/output/PostgreSQL/2016-01-01_12-00-00/installpostgres.sh_10.0.0.1.log
  • dbfs:/databricks/init/output/PostgreSQL/2016-01-01_12-00-00/installpostgres.sh_10.0.0.2.log
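The log path above can be assembled from its four components (a sketch; init_script_log_path is a hypothetical helper, and the values mirror the PostgreSQL example):

```python
def init_script_log_path(cluster_name, timestamp, script_name, node_ip):
    """Build the DBFS output path described above:
    dbfs:/databricks/init/output/<cluster-name>/<date-timestamp>/<script-name>_<node-ip>.log"""
    return ("dbfs:/databricks/init/output/"
            f"{cluster_name}/{timestamp}/{script_name}_{node_ip}.log")

print(init_script_log_path("PostgreSQL", "2016-01-01_12-00-00",
                           "installpostgres.sh", "10.0.0.1"))
# dbfs:/databricks/init/output/PostgreSQL/2016-01-01_12-00-00/installpostgres.sh_10.0.0.1.log
```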