Enable firewall support for your workspace storage account

When you create a new Azure Databricks workspace, an Azure storage account, known as the workspace storage account, is created in a managed resource group. The workspace storage account includes workspace system data (job output, system settings, and logs), DBFS root, and in some cases a Unity Catalog workspace catalog. This article describes how to use an ARM template to limit access to your workspace storage account to only authorized resources and networks.

What is firewall support for your workspace storage account?

By default, your workspace storage account accepts authenticated connections from all networks. You can limit this access by enabling firewall support for your workspace storage account. This ensures that public network access is disallowed and the workspace storage account is not accessible from unauthorized networks. You might want to configure this if your organization has Azure policies that require storage accounts to be private.

When firewall support for your workspace storage account is enabled, all access from services outside Azure Databricks must use approved private endpoints with Private Link. Azure Databricks creates an access connector to connect to the storage using an Azure managed identity. Access from Azure Databricks serverless SQL warehouses must use either service endpoints or private endpoints.

Note

Model Serving is not supported with firewall support for the workspace storage account.

Requirements

  • Your workspace must enable VNet injection.

    If you are creating a new workspace, create a virtual network and two subnets following the instructions at Virtual network requirements.

  • Your workspace must enable secure cluster connectivity (No Public IP / NPIP).

  • Your workspace must be on the Premium plan.

  • You must have a separate subnet for the private endpoints for the storage account. This is in addition to the main two subnets for basic Azure Databricks functionality.

    The subnet must be in the same VNet as the workspace or in a separate VNet that the workspace can access. The subnet must be at least /28 in CIDR notation.

  • If you are using Cloud Fetch with the Microsoft Fabric Power BI service, you must always use a gateway for private access to the workspace storage account or disable Cloud Fetch. See Step 3 (Recommended): Configure private endpoints for Cloud Fetch client VNets.
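
The subnet requirement above can be sanity-checked before deployment. The following is a minimal sketch using only the Python standard library. The function name and the sample address ranges are hypothetical, and the overlap check only matters when the private endpoint subnet is in the same VNet as the workspace subnets.

```python
import ipaddress


def check_private_endpoint_subnet(pe_subnet: str, workspace_subnets: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    subnet = ipaddress.ip_network(pe_subnet, strict=True)

    # /28 is the minimum size, so the prefix length must be 28 or shorter.
    if subnet.prefixlen > 28:
        problems.append(f"{subnet} is smaller than /28")

    # The private endpoint subnet must be separate from the two workspace subnets.
    for other in workspace_subnets:
        other_net = ipaddress.ip_network(other, strict=True)
        if subnet.overlaps(other_net):
            problems.append(f"{subnet} overlaps workspace subnet {other_net}")

    return problems


# Hypothetical address ranges, for illustration only.
print(check_private_endpoint_subnet("10.0.2.0/28", ["10.0.0.0/24", "10.0.1.0/24"]))
```

An empty list means the candidate subnet is large enough and does not collide with the workspace subnets. It does not confirm that the VNet is reachable from the workspace.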

Step 1: Deploy the required ARM template

  1. If you are using an existing workspace, shut down any compute resources in your workspace.
  2. In the Azure portal, search for and select Deploy a custom template.
  3. Click Build your own template in the editor.
  4. Copy the ARM template from ARM template for firewall support for your workspace storage account and paste it in the editor.
  5. Click Save.
  6. Review and edit the fields. For a description of the fields, see ARM template fields.
  7. Click Review and Create, then Create.

Your workspace is temporarily unable to run notebooks or jobs until you create your private endpoints.

Step 2: Create private endpoints to the storage account

From the VNet that you used for VNet injection, create two private endpoints to your workspace storage account, one for each of the Target sub-resource values: dfs and blob.

  1. In the Azure portal, navigate to your workspace.

  2. Under Essentials, click the name of the Managed Resource Group.

  3. Under Resources, click the resource of type Storage account that has a name that begins with dbstorage.

  4. In the sidebar, click Networking.

  5. Click Private endpoint connections.

  6. Click + Private endpoint.

  7. In the Resource Group name field, set your resource group. This must not be the same as the managed resource group that your workspace storage account is in.

  8. In the Name field, type a unique name for this private endpoint:

    • For the first private endpoint you create for each source network, create a DFS endpoint. Databricks recommends that you add the suffix -dfs-pe to the name.
    • For the second private endpoint you create for each source network, create a Blob endpoint. Databricks recommends that you add the suffix -blob-pe to the name.

    The Network Interface Name field auto-populates.

  9. Set the Region field to the region of your workspace.

  10. Click Next.

  11. In Target sub-resource, click the target resource type.

    • For the first private endpoint you create for each source network, set this to dfs.
    • For the second private endpoint you create for each source network, set this to blob.

  12. In the Virtual network field, select a VNet.

  13. In the Subnet field, select the separate subnet that you have for the private endpoints for the storage account.

    This field might auto-populate with the subnet for your private endpoints, but you may have to set it explicitly. You cannot use one of the two workspace subnets that are used for basic Azure Databricks workspace functionality, which are typically called private-subnet and public-subnet.

  14. Click Next. The DNS tab auto-populates with the subscription and resource group that you previously selected. Change them if needed.

  15. Click Next and add tags if desired.

  16. Click Next and review the fields.

  17. Click Create.
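
For readers who script this step instead of using the portal, the portal fields above map roughly onto a private endpoint resource definition. The following is a sketch only: it assumes the general shape of a Microsoft.Network/privateEndpoints resource (subnet, privateLinkServiceConnections, groupIds), and the function name, name prefix, and resource IDs are hypothetical placeholders. Verify the schema against the ARM reference before relying on it.

```python
def private_endpoint_body(name_prefix, sub_resource, region, subnet_id, storage_account_id):
    """Build a dict describing one private endpoint to the workspace storage account."""
    if sub_resource not in ("dfs", "blob"):
        raise ValueError("Target sub-resource must be 'dfs' or 'blob'")

    # Suffix follows the naming recommendation in the steps above.
    name = f"{name_prefix}-{sub_resource}-pe"
    return {
        "name": name,
        "location": region,  # region of the workspace
        "properties": {
            # The separate subnet for private endpoints, not a workspace subnet.
            "subnet": {"id": subnet_id},
            "privateLinkServiceConnections": [
                {
                    "name": name,
                    "properties": {
                        "privateLinkServiceId": storage_account_id,
                        "groupIds": [sub_resource],  # the Target sub-resource
                    },
                }
            ],
        },
    }


# Placeholder IDs, for illustration only.
bodies = [
    private_endpoint_body("myws", sub, "westus2", "<pe-subnet-resource-id>", "<storage-account-resource-id>")
    for sub in ("dfs", "blob")
]
print([b["name"] for b in bodies])
```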

To disable firewall support for your workspace storage account, use the same process as above, but set the parameter Storage Account Firewall (storageAccountFirewall in the template) to Disabled and set the Workspace Catalog Enabled field to true or false based on whether your workspace uses a Unity Catalog workspace catalog. See Catalogs.
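
If you deploy the template from a script, the two values described above can be captured in an ARM deployment parameters file. The following sketch generates only that fragment. storageAccountFirewall is the parameter name given in this article. The name workspaceCatalogEnabled and the value Enabled are assumptions made for illustration, so check them against ARM template fields. The template likely requires other parameters that are not shown here.

```python
import json


def firewall_parameters(firewall_enabled: bool, workspace_catalog_enabled: bool) -> str:
    """Return a partial ARM deployment parameters file as a JSON string."""
    params = {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            # "Disabled" comes from this article; "Enabled" is the assumed counterpart.
            "storageAccountFirewall": {"value": "Enabled" if firewall_enabled else "Disabled"},
            # Assumed parameter name for the Workspace Catalog Enabled field.
            "workspaceCatalogEnabled": {"value": workspace_catalog_enabled},
        },
    }
    return json.dumps(params, indent=2)


# Example: disable firewall support for a workspace that uses a workspace catalog.
print(firewall_parameters(firewall_enabled=False, workspace_catalog_enabled=True))
```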

Step 3 (Recommended): Configure private endpoints for Cloud Fetch client VNets

Cloud Fetch is a mechanism in ODBC and JDBC for fetching data in parallel through cloud storage to bring the data faster to BI tools. If you are fetching query results larger than 1 MB from BI tools, you are likely using Cloud Fetch.

Note

If you are using the Microsoft Fabric Power BI service with Azure Databricks, you must disable Cloud Fetch, because firewall support blocks direct access to the workspace storage account from Fabric Power BI. Alternatively, you can configure a virtual network data gateway or on-premises data gateway to allow private access to the workspace storage account. This does not apply to Power BI Desktop. To disable Cloud Fetch, use the configuration EnableQueryResultDownload=0.
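
As an illustration of the EnableQueryResultDownload=0 setting, the following sketch appends it to a driver connection string. It assumes the common semicolon-delimited key=value format used by ODBC and JDBC connection strings. The helper name and the sample connection string are hypothetical, and where the setting is actually entered depends on your client tool.

```python
def disable_cloud_fetch(connection_string: str) -> str:
    """Return the connection string with EnableQueryResultDownload set to 0."""
    key = "EnableQueryResultDownload"
    parts = [p for p in connection_string.split(";") if p]

    # Drop any existing value for the setting so that it is not specified twice.
    kept = [p for p in parts if p.split("=", 1)[0].strip().lower() != key.lower()]
    kept.append(f"{key}=0")
    return ";".join(kept)


# Hypothetical connection string, for illustration only.
print(disable_cloud_fetch("Host=example.azuredatabricks.net;HTTPPath=/sql/1.0/warehouses/abc"))
```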

If you use Cloud Fetch, create private endpoints to the workspace storage account from any VNets of your Cloud Fetch clients.

For each source network for Cloud Fetch clients, create two private endpoints that use two different Target sub-resource values: dfs and blob. Refer to Step 2: Create private endpoints to the storage account for detailed steps. In those steps, for the Virtual network field when creating the private endpoint, be sure that you specify your source VNet for each Cloud Fetch client.
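
To see how many private endpoints this adds up to, the following sketch enumerates the endpoints needed for a set of source networks: two per network, one for dfs and one for blob. The VNet names are hypothetical, and the suggested endpoint names simply apply the -dfs-pe and -blob-pe suffixes recommended in Step 2.

```python
SUB_RESOURCES = ("dfs", "blob")


def plan_private_endpoints(source_vnets):
    """List the private endpoints to create: one per (source VNet, sub-resource) pair."""
    plan = []
    for vnet in source_vnets:
        for sub in SUB_RESOURCES:
            plan.append(
                {
                    "vnet": vnet,
                    "target_sub_resource": sub,
                    "name": f"{vnet}-{sub}-pe",
                }
            )
    return plan


# Hypothetical VNet names: the workspace VNet plus one Cloud Fetch client VNet.
for endpoint in plan_private_endpoints(["workspace-vnet", "bi-client-vnet"]):
    print(endpoint["name"], "->", endpoint["target_sub_resource"])
```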

Step 4: Confirm endpoint approvals

After you create all your private endpoints to the storage account, check if they are approved. They might auto-approve or you might need to approve them on the storage account.

  1. Navigate to your workspace in the Azure portal.
  2. Under Essentials, click the name of the Managed Resource Group.
  3. Under Resources, click the resource of type Storage account that has a name that begins with dbstorage.
  4. In the sidebar, click Networking.
  5. Click Private endpoint connections.
  6. Check the Connection state of each private endpoint to confirm that it says Approved. If an endpoint is not approved, select it and click Approve.
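
If you export the storage account's private endpoint connections to JSON, the same check can be scripted. The following sketch assumes each entry carries a privateLinkServiceConnectionState object with a status field, either at the top level or under properties. The function name and sample data are hypothetical.

```python
def unapproved_connections(connections):
    """Return (name, status) pairs for connections whose state is not Approved."""
    pending = []
    for conn in connections:
        # Accept both the nested (under "properties") and the flattened shape.
        props = conn.get("properties", conn)
        state = props.get("privateLinkServiceConnectionState", {})
        status = state.get("status", "Unknown")
        if status != "Approved":
            pending.append((conn.get("name", "<unnamed>"), status))
    return pending


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "myws-dfs-pe", "properties": {"privateLinkServiceConnectionState": {"status": "Approved"}}},
    {"name": "myws-blob-pe", "properties": {"privateLinkServiceConnectionState": {"status": "Pending"}}},
]
print(unapproved_connections(sample))
```

Any connection that the function returns still needs approval on the storage account.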

Step 5: Authorize serverless SQL warehouse connections

You must authorize serverless SQL warehouses to connect to your workspace storage account by attaching a network connectivity configuration (NCC) to your workspace. When an NCC is attached to a workspace, the network rules are automatically added to the workspace storage account. For instructions, see Serverless compute plane networking.

If you want to enable access from Azure Databricks serverless SQL warehouses using private endpoints, contact your Azure Databricks account team.