Databricks Terraform provider

HashiCorp Terraform is a popular open source tool for creating safe and predictable cloud infrastructure across several cloud providers. You can use the Databricks Terraform provider to manage your Azure Databricks workspaces and the associated cloud infrastructure using a flexible, powerful tool. The goal of the Databricks Terraform provider is to support all Databricks REST APIs, automating the most complicated aspects of deploying and managing your data platforms. Databricks customers use the Databricks Terraform provider to deploy and manage clusters and jobs, provision Databricks workspaces, and configure data access.

(Figure: Terraform resource relationship)

Important

The Databricks Terraform provider is not formally supported by Databricks or Microsoft. It is maintained by Databricks field engineering teams and provided as is. There is no service level agreement (SLA). Databricks and Microsoft make no guarantees of any kind. If you discover an issue with the provider, file a GitHub Issue, and it will be reviewed by project maintainers as time permits.

Getting started

Complete the following steps to install and configure the command line tools that Terraform needs to operate. These tools include the Terraform CLI and the Azure CLI. After setting up these tools, complete the steps to create a base Terraform configuration that you can use later to manage your Azure Databricks workspaces and the associated Azure cloud infrastructure.

Note

This procedure assumes that you have access to a deployed Azure Databricks workspace as a Databricks admin, access to the corresponding Azure subscription, and the appropriate permissions for the actions you want Terraform to perform in that Azure subscription.

  1. Install the Terraform CLI. For details, see Download Terraform on the Terraform website.

  2. Install the Azure CLI, and then use the Azure CLI to log in to Azure by running the az login command. For details, see Install the Azure CLI on the Microsoft Azure website and Azure Provider: Authenticating using the Azure CLI on the Terraform website.

    az login
    

    Tip

    To have Terraform run within the context of a different login, run the az login command again. To have Terraform use an Azure subscription other than the one listed as "isDefault": true in the output of az login, run az account set --subscription="<subscription ID>", replacing <subscription ID> with the value of the id property of the desired subscription in that output.
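
    For example:

    az account set --subscription="<subscription ID>"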

    This procedure uses the Azure CLI, along with the default subscription, to authenticate. For alternative authentication options, see Authenticating to Azure on the Terraform website.

  3. In your terminal, create an empty directory and then switch to it. (Each separate set of Terraform configuration files must be in its own directory.) For example:

    mkdir terraform_demo && cd terraform_demo
    
  4. In this empty directory, create a file named main.tf. Add the following content to this file, and then save the file:

    // Declare the Terraform providers that this configuration requires,
    // pinning their versions so that runs are repeatable.
    terraform {
      required_providers {
        azurerm = {
          source  = "hashicorp/azurerm"
          version = ">= 2.26"
        }

        databricks = {
          source  = "databrickslabs/databricks"
          version = "0.3.2"
        }
      }
    }

    // The azurerm provider requires a (possibly empty) features block.
    provider "azurerm" {
      features {}
    }

    // The databricks provider needs no arguments here. The Sample
    // configuration section later sets its host.
    provider "databricks" {}
    
  5. Initialize the working directory containing the main.tf file by running the terraform init command. For more information, see Command: init on the Terraform website.

    terraform init
    

    Terraform downloads the azurerm and databricks providers and installs them in a hidden subdirectory of your current working directory, named .terraform. The terraform init command prints out which versions of the providers were installed. Terraform also creates a lock file named .terraform.lock.hcl, which records the exact provider versions used, so that you can control when you want to update the providers used for your project.
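
    To move to newer provider versions that still satisfy the version constraints in main.tf, re-run initialization with the -upgrade flag, which also updates the lock file:

    terraform init -upgrade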

  6. Apply the changes required to reach the desired state of the configuration by running the terraform apply command. For more information, see Command: apply on the Terraform website.

    terraform apply
    

    Because no resources have yet been specified in the main.tf file, the output is Apply complete! Resources: 0 added, 0 changed, 0 destroyed. Terraform also writes data into a file called terraform.tfstate, where it stores the IDs and properties of the resources it manages, so that it can update or destroy those resources going forward. To create resources, continue with Sample configuration, Next steps, or both to specify the desired resources to create, and then run the terraform apply command again.
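
    Tip

    Before you apply a changed configuration, you can preview the actions that Terraform would take, without changing any resources, by running the terraform plan command:

    terraform plan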

Sample configuration

Complete the following procedure to create a sample Terraform configuration that creates a notebook, and a job to run that notebook, in an existing Azure Databricks workspace.

  1. In the main.tf file that you created in Getting started, change the databricks provider to reference an existing Azure Databricks workspace:

    provider "databricks" {
      host = var.databricks_workspace_url
    }
    
  2. At the end of the main.tf file, add the following code:

    variable "databricks_workspace_url" {
      description = "The URL to the Azure Databricks workspace (must start with https://)"
      type = string
      default = "<Azure Databricks workspace URL>"
    }
    
    variable "resource_prefix" {
      description = "The prefix to use when naming the notebook and job"
      type = string
      default = "terraform-demo"
    }
    
    variable "email_notifier" {
      description = "The email address to send job status to"
      type = list(string)
      default = ["<Your email address>"]
    }
    
    // Get information about the Databricks user that is calling
    // the Databricks API (the one authenticated through the
    // "databricks" provider configured above).
    data "databricks_current_user" "me" {}
    
    // Create a simple, sample notebook. Store it in a subfolder within
    // the Databricks current user's folder. The notebook contains the
    // following basic Spark code in Python.
    resource "databricks_notebook" "this" {
      path     = "${data.databricks_current_user.me.home}/Terraform/${var.resource_prefix}-notebook.ipynb"
      language = "PYTHON"
      content_base64 = base64encode(<<-EOT
        # created from ${abspath(path.module)}
        display(spark.range(10))
        EOT
      )
    }
    
    // Create a job to run the sample notebook. The job will create
    // a cluster to run on. The cluster will use the smallest available
    // node type and run the latest version of Spark.
    
    // Get the smallest available node type to use for the cluster. Choose
    // only from among available node types with local storage.
    data "databricks_node_type" "smallest" {
      local_disk = true
    }
    
    // Get the latest Spark version to use for the cluster.
    data "databricks_spark_version" "latest" {}
    
    // Create the job, emailing notifiers about job success or failure.
    resource "databricks_job" "this" {
      name = "${var.resource_prefix}-job-${data.databricks_current_user.me.alphanumeric}"
      new_cluster {
        num_workers   = 1
        spark_version = data.databricks_spark_version.latest.id
        node_type_id  = data.databricks_node_type.smallest.id
      }
      notebook_task {
        notebook_path = databricks_notebook.this.path
      }
      email_notifications {
        on_success = var.email_notifier
        on_failure = var.email_notifier
      }
    }
    
    // Print the URL to the notebook.
    output "notebook_url" {
      value = databricks_notebook.this.url
    }
    
    // Print the URL to the job.
    output "job_url" {
      value = databricks_job.this.url
    }
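
    After you run terraform apply in step 4, you can print these output values again at any time by running terraform output:

    terraform output notebook_url
    terraform output job_url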
    
  3. Replace the following values, and then save the file:

    • Replace <Azure Databricks workspace URL> with the URL to the Azure Databricks workspace.
    • Replace <Your email address> with your email address.
  4. Run terraform apply.
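
    Alternatively, instead of editing the default values in the previous step, you can pass the values on the command line. This sketch uses the same placeholders; substitute your own workspace URL and email address:

    terraform apply \
      -var="databricks_workspace_url=<Azure Databricks workspace URL>" \
      -var='email_notifier=["<Your email address>"]'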

  5. Verify that the notebook and job were created: in the output of the terraform apply command, find the URLs that were output as notebook_url and job_url, and go to them.

  6. Run the job: on the Jobs page, click Run Now. After the job finishes, check your email inbox.

  7. When you are done with this sample, delete the notebook and job from the Azure Databricks workspace by running terraform destroy.
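
    Terraform prompts you to confirm before it deletes the resources that it tracks in terraform.tfstate:

    terraform destroy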

  8. Verify that the notebook and job were deleted: refresh the notebook and Jobs pages to display a message that the resources cannot be found.

Next steps

  1. Create an Azure Databricks workspace.
  2. Manage workspace resources for an Azure Databricks workspace.

Troubleshooting

For Terraform-specific support, see the Latest Terraform topics on the HashiCorp Discuss website. For issues specific to the Databricks Terraform Provider, see Issues in the databrickslabs/terraform-provider-databricks GitHub repository.
