Configure a firewall for serverless compute access

Note

If you configured storage firewalls using subnet IDs from Azure Databricks documentation before October 31, 2023, Databricks recommends that you update the workspaces by following the steps in this article or by using a private endpoint. If you choose not to update existing workspaces, they continue to work without changes.

This article describes how to configure an Azure storage firewall for serverless compute using the Azure Databricks account console UI. You can also use the Network Connectivity Configurations API.

To configure a private endpoint for serverless compute access, see Configure private connectivity from serverless compute.

Note

There are currently no networking charges for serverless features. In a later release, you might be charged. Azure Databricks will provide advance notice for networking pricing changes.

Overview of firewall enablement for serverless compute

Serverless network connectivity is managed with network connectivity configurations (NCCs). Account admins create NCCs in the account console, and an NCC can be attached to one or more workspaces.

An NCC contains a list of network identities for an Azure resource type as default rules. When an NCC is attached to a workspace, serverless compute in that workspace uses one of those networks to connect to the Azure resource. You can allowlist those networks on your Azure resource firewall. If you have non-storage Azure resource firewalls, contact your account team for information on how to use Azure Databricks stable NAT IPs.

NCC firewall enablement is supported from serverless SQL warehouses, workflows, notebooks, Delta Live Tables pipelines, and model serving CPU endpoints.

You can optionally configure network access to your workspace storage account from only authorized networks, including serverless compute. See Enable firewall support for your workspace storage account. When an NCC is attached to a workspace, the network rules are automatically added to the workspace storage account.

For more information on NCCs, see What is a network connectivity configuration (NCC)?.

Cost implications of cross-region storage access

For cross-region traffic from Azure Databricks serverless compute (for example, when the workspace is in the East US region and the ADLS storage is in West Europe), Azure Databricks routes the traffic through an Azure NAT Gateway service.

Important

There are currently no charges to use this feature. In a later release, you might be charged for usage. To avoid cross-region charges, Databricks recommends you create a workspace in the same region as your storage.

Requirements

  • Your workspace must be on the Premium plan.

  • You must be an Azure Databricks account admin.

  • Each NCC can be attached to up to 50 workspaces.

  • Each Azure Databricks account can have up to 10 NCCs per region.

  • You must have WRITE access to your Azure storage account’s network rules.

Step 1: Create a network connectivity configuration and copy subnet IDs

Databricks recommends sharing NCCs among workspaces in the same business unit and among workspaces that share the same region and connectivity properties. For example, if some workspaces use a storage firewall and other workspaces use the alternative approach of Private Link, use separate NCCs for those use cases.

  1. As an account admin, go to the account console.
  2. In the sidebar, click Cloud Resources.
  3. Click Network Connectivity Configuration.
  4. Click Add Network Connectivity Configurations.
  5. Type a name for the NCC.
  6. Choose the region. This must match your workspace region.
  7. Click Add.
  8. In the list of NCCs, click on your new NCC.
  9. In Default Rules under Network identities, click View all.
  10. In the dialog, click the Copy subnets button.
  11. Click Close.
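The intro mentions that the Network Connectivity Configurations API can be used instead of the account console. The following is a minimal sketch of what the create call might look like with curl. The endpoint path, the `name` and `region` body fields, and the variables `ACCOUNTS_HOST`, `ACCOUNT_ID`, and `TOKEN` are assumptions for illustration, not taken from this article; check the API reference before relying on them.

```shell
#!/usr/bin/env bash
# Sketch only: create an NCC through the Network Connectivity Configurations API.
# ASSUMPTIONS: endpoint path, body fields, and all variable names are illustrative.

ACCOUNTS_HOST="${ACCOUNTS_HOST:-https://accounts.azuredatabricks.net}"

# Build the collection URL for a given account ID.
ncc_collection_url() {
  printf '%s/api/2.0/accounts/%s/network-connectivity-configs' "$ACCOUNTS_HOST" "$1"
}

# Build the JSON request body. Values containing double quotes would need escaping.
ncc_create_body() {
  local name="$1" region="$2"
  printf '{"name": "%s", "region": "%s"}' "$name" "$region"
}

# Send the create request. Expects ACCOUNT_ID and TOKEN to be set by the caller.
ncc_create() {
  local name="$1" region="$2"
  curl --fail --silent --show-error \
    --request POST \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    --data "$(ncc_create_body "$name" "$region")" \
    "$(ncc_collection_url "$ACCOUNT_ID")"
}
```

A call such as `ncc_create my-ncc eastus` would then correspond to steps 4 through 7 above. The region must still match the workspace region, and the response is expected to carry the default rules from which the subnet IDs can be read.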

Step 2: Attach an NCC to workspaces

You can attach an NCC to up to 50 workspaces in the same region as the NCC.

To use the API to attach an NCC to a workspace, see the Account Workspaces API.

  1. In the account console sidebar, click Workspaces.
  2. Click your workspace’s name.
  3. Click Update workspace.
  4. In the Network Connectivity Config field, select your NCC. If it’s not visible, confirm that you’ve selected the same region for both the workspace and the NCC.
  5. Click Update.
  6. Wait 10 minutes for the change to take effect.
  7. Restart any running serverless compute resources in the workspace.
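For reference, attaching an NCC through the Account Workspaces API might look like the sketch below. It assumes that the workspace update endpoint accepts a `network_connectivity_config_id` field in a PATCH body; that field name, the endpoint path, and the variables `ACCOUNTS_HOST`, `ACCOUNT_ID`, and `TOKEN` are assumptions, so confirm them against the API reference.

```shell
#!/usr/bin/env bash
# Sketch only: attach an NCC to a workspace via the Account Workspaces API.
# ASSUMPTIONS: endpoint path, PATCH body field, and variable names are illustrative.

ACCOUNTS_HOST="${ACCOUNTS_HOST:-https://accounts.azuredatabricks.net}"

# Build the URL of a single workspace under an account.
workspace_url() {
  local account_id="$1" workspace_id="$2"
  printf '%s/api/2.0/accounts/%s/workspaces/%s' "$ACCOUNTS_HOST" "$account_id" "$workspace_id"
}

# Build the JSON body that references the NCC to attach.
workspace_attach_body() {
  printf '{"network_connectivity_config_id": "%s"}' "$1"
}

# Send the update. Expects ACCOUNT_ID and TOKEN to be set by the caller.
workspace_attach_ncc() {
  local workspace_id="$1" ncc_id="$2"
  curl --fail --silent --show-error \
    --request PATCH \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    --data "$(workspace_attach_body "$ncc_id")" \
    "$(workspace_url "$ACCOUNT_ID" "$workspace_id")"
}
```

As with the UI flow, allow about 10 minutes after the call and restart any running serverless compute resources in the workspace.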

If you are using this feature to connect to the workspace storage account, your configuration is complete. The network rules are automatically added to the workspace storage account. For additional storage accounts, continue to the next step.

Step 3: Lock down your storage account

If you haven’t already limited access to the Azure storage account to only allow-listed networks, do so now. You do not need to do this step for the workspace storage account.

Creating a storage firewall also affects connectivity from the classic compute plane to your resources. You must also add network rules to connect to your storage accounts from classic compute resources.

  1. Go to the Azure portal.
  2. Navigate to your storage account for the data source.
  3. In the left nav, click Networking.
  4. In the field Public network access, check the value. By default, the value is Enabled from all networks. Change this to Enabled from selected virtual networks and IP addresses.
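The same change can likely be scripted with the Azure CLI by setting the storage account's default network action to Deny. The sketch below assumes that this setting is what the portal option Enabled from selected virtual networks and IP addresses corresponds to; the function names and the `DRY_RUN` switch are hypothetical helpers added for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: restrict a storage account to selected networks.
# ASSUMPTION: "--default-action Deny" matches the portal setting
# "Enabled from selected virtual networks and IP addresses".

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Deny traffic by default; only allowlisted networks can then reach the account.
lock_down_storage() {
  local rg="$1" account="$2"
  run az storage account update \
    --resource-group "$rg" --name "$account" \
    --default-action Deny
}
```

Running `DRY_RUN=1 lock_down_storage <res> <account>` prints the command without applying it, which is a cheap way to review the change first. Because the firewall also affects classic compute, add the network rules that classic compute resources need before or immediately after applying it.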

Step 4: Add Azure storage account network rules

You do not need to do this step for the workspace storage account.

  1. Add one Azure storage account network rule for each subnet. You can do this using the Azure CLI, PowerShell, Terraform, or other automation tools. This step cannot be done in the Azure portal user interface.

    The following example uses the Azure CLI:

    az storage account network-rule add --subscription "<sub>" \
        --resource-group "<res>" --account-name "<account>" --subnet "<subnet>"
    
    • Replace <sub> with the name of your Azure subscription for the storage account.
    • Replace <res> with the resource group of your storage account.
    • Replace <account> with the name of your storage account.
    • Replace <subnet> with the ARM resource ID (resourceId) of the serverless compute subnet.

    After running all the commands, you can use the Azure portal to view your storage account and confirm that there is an entry in the Virtual Networks table for each new subnet. However, you cannot make the network rule changes themselves in the Azure portal.

    Tip

    • Ensure that you are using the latest information from the NCC to configure the correct set of network resources.
    • Avoid storing NCC information locally.
    • Ignore the mention of “Insufficient permissions” in the endpoint status column or the warning below the network list. These indicate only that you do not have permission to read the Azure Databricks subnets. They do not interfere with the ability of the Azure Databricks serverless subnets to contact your Azure storage.

    Example new entries in Virtual Networks list

  2. Repeat this command once for every subnet.

  3. To confirm that your storage account uses these settings from the Azure portal, navigate to Networking in your storage account.

    Confirm that Public network access is set to Enabled from selected virtual networks and IP addresses and that the allowed networks are listed in the Virtual Networks section.
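Because the add command has to be repeated for every subnet, a small loop can help. The following sketch assumes the subnet IDs copied from the NCC have been arranged one ARM resource ID per line on standard input; the function names, the `DRY_RUN` switch, and the `--query` path used for listing are illustrative assumptions rather than part of the documented procedure.

```shell
#!/usr/bin/env bash
# Sketch only: add one storage network rule per serverless subnet, then list them.
# ASSUMPTION: subnet IDs arrive on stdin, one ARM resource ID per line.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@" </dev/null   # keep the command from consuming the loop's stdin
  fi
}

# Run the network-rule add command once for every non-empty input line.
add_subnet_rules() {
  local sub="$1" rg="$2" account="$3" subnet
  while IFS= read -r subnet; do
    [ -n "$subnet" ] || continue   # skip blank lines
    run az storage account network-rule add \
      --subscription "$sub" --resource-group "$rg" \
      --account-name "$account" --subnet "$subnet"
  done
}

# List the virtual network rules now present on the account.
# ASSUMPTION: the JMESPath query below matches the CLI's output shape.
list_subnet_rules() {
  local sub="$1" rg="$2" account="$3"
  run az storage account network-rule list \
    --subscription "$sub" --resource-group "$rg" \
    --account-name "$account" \
    --query "virtualNetworkRules[].virtualNetworkResourceId" --output tsv
}
```

For example, `DRY_RUN=1 add_subnet_rules "<sub>" "<res>" "<account>" < subnets.txt` prints the commands for review, where `subnets.txt` is a hypothetical file of freshly copied subnet IDs. In line with the tip above, regenerate that list from the NCC each time rather than keeping a local copy.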