Secure cluster connectivity (No Public IP / NPIP)

With secure cluster connectivity enabled, customer virtual networks have no open ports and Databricks Runtime cluster nodes have no public IP addresses. Secure cluster connectivity is also known as No Public IP (NPIP).

  • At a network level, each cluster initiates a connection to the control plane secure cluster connectivity relay (proxy) during cluster creation. The cluster establishes this connection over port 443 (HTTPS), using a different IP address from the one used for the web application and REST API.
  • Actions that the control plane logically initiates, such as starting new Databricks Runtime jobs or performing cluster administration, are sent as requests to the cluster over that cluster-initiated connection, which acts as a reverse tunnel.
  • The data plane (the VNet) does not have open ports and Databricks Runtime cluster nodes do not have public IP addresses.
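
The reverse-tunnel pattern described in the list above can be illustrated with a toy example. This is only a sketch of the general idea, not the Azure Databricks relay protocol: the function names, the line-based message format, and the use of a local ephemeral port in place of the relay on port 443 are all assumptions made for illustration. The point it shows is that the cluster side only dials out and never listens, while the control-plane side sends its requests back over the connection the cluster opened.

```python
import socket
import threading


def relay(server_sock, commands, replies):
    """Control-plane side (hypothetical): wait for the cluster to dial in,
    then send requests back over that same connection."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for command in commands:
            conn.sendall(f"{command}\n".encode("utf-8"))
            replies.append(reader.readline().strip())


def cluster_node(relay_address, request_count):
    """Data-plane side (hypothetical): open one outbound connection.
    This side never binds or listens, so it exposes no open port."""
    handled = []
    with socket.create_connection(relay_address, timeout=5) as sock, \
            sock.makefile("r", encoding="utf-8") as reader:
        for _ in range(request_count):
            command = reader.readline().strip()
            handled.append(command)
            sock.sendall(f"ok: {command}\n".encode("utf-8"))
    return handled


def run_demo(commands):
    """Run both sides locally; an ephemeral port stands in for the relay."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.settimeout(5)
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        replies = []
        thread = threading.Thread(
            target=relay, args=(server, commands, replies), daemon=True
        )
        thread.start()
        handled = cluster_node(server.getsockname(), len(commands))
        thread.join(timeout=5)
    return handled, replies


if __name__ == "__main__":
    print(run_demo(["start job", "resize cluster"]))
```

Only the relay side accepts inbound connections here, which mirrors why the data plane needs no open ports or public IP addresses.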

Note

Regardless of whether secure cluster connectivity is enabled, all Azure Databricks network traffic between the data plane VNet and the Azure Databricks control plane goes across the Microsoft network backbone, not the public internet.

Benefits:

  • Easy network administration: There is less complexity because you do not need to configure ports on security groups or configure network peering.
  • Easier approval: Because of better security and simpler network administration, it is easier for information security teams to approve Databricks as a PaaS provider.

Using secure cluster connectivity

To use secure cluster connectivity with an Azure Databricks workspace, use any of the following options:

  • Azure portal: When you provision the workspace, go to the Networking tab and set the option Deploy Azure Databricks workspace with Secure Cluster Connectivity (No Public IP) to Yes.
  • ARM Templates: For the Microsoft.Databricks/workspaces resource that creates your new workspace, set the enableNoPublicIp Boolean parameter to true.

Important

In both cases, you must register the Azure Resource Provider Microsoft.ManagedIdentity in the Azure subscription that you are going to use to launch a workspace with secure cluster connectivity. This is a one-time operation per subscription. For instructions, see Azure resource providers and types.
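
One way to perform the registration is with the Azure CLI. The sketch below assumes the az CLI is installed and signed in; the helper names are hypothetical, and the optional subscription argument is only needed when the target subscription is not the CLI default. It builds the `az provider register` and `az provider show` commands and, in `register_provider`, runs them.

```python
import subprocess

# Resource provider named in this article.
NAMESPACE = "Microsoft.ManagedIdentity"


def register_command(namespace=NAMESPACE, subscription=None):
    """Build the Azure CLI command that registers a resource provider."""
    command = ["az", "provider", "register", "--namespace", namespace]
    if subscription:
        command += ["--subscription", subscription]
    return command


def state_command(namespace=NAMESPACE, subscription=None):
    """Build the Azure CLI command that reports the registration state."""
    command = [
        "az", "provider", "show",
        "--namespace", namespace,
        "--query", "registrationState",
        "--output", "tsv",
    ]
    if subscription:
        command += ["--subscription", subscription]
    return command


def register_provider(subscription=None):
    """Register the provider and return the state the CLI reports.
    Requires an installed, authenticated az CLI (an assumption)."""
    subprocess.run(register_command(subscription=subscription), check=True)
    result = subprocess.run(
        state_command(subscription=subscription),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

Registration can take a few minutes to complete, so the state returned immediately after the register call may not yet show the provider as registered.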

Note

Secure cluster connectivity is available only for new workspaces. If you have workspaces with public IPs that you would like to migrate, you should create new workspaces enabled for secure cluster connectivity and migrate your resources to the new workspaces. Contact your Microsoft or Databricks account team for details.

If you’re using ARM templates, add the parameter to the template that matches your network setup: either one in which Azure Databricks creates a default (managed) virtual network for the workspace, or one that uses your own virtual network, also known as VNet injection. VNet injection is an optional feature that allows you to provide your own VNet to host new Azure Databricks clusters.
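
As a rough illustration of where the parameter sits, the following sketch builds the workspace resource as a Python dictionary and prints it as JSON. Only the resource type and the enableNoPublicIp name come from this article. The surrounding layout (the `properties.parameters` object with `value` wrappers), the API version, the SKU, and the VNet injection parameter names (customVirtualNetworkId, customPublicSubnetName, customPrivateSubnetName) are assumptions based on commonly published templates, and all resource names and IDs are placeholders. Verify them against the template you actually deploy.

```python
import json


def workspace_resource(name, location, managed_resource_group_id,
                       custom_vnet_id=None, public_subnet=None,
                       private_subnet=None, sku="premium",
                       api_version="2018-04-01"):
    """Return a Microsoft.Databricks/workspaces resource with secure
    cluster connectivity enabled. Layout is an assumption, see lead-in."""
    parameters = {"enableNoPublicIp": {"value": True}}
    if custom_vnet_id is not None:
        # VNet injection: both workspace subnets must be named.
        if not (public_subnet and private_subnet):
            raise ValueError("VNet injection needs both workspace subnet names")
        parameters["customVirtualNetworkId"] = {"value": custom_vnet_id}
        parameters["customPublicSubnetName"] = {"value": public_subnet}
        parameters["customPrivateSubnetName"] = {"value": private_subnet}
    return {
        "type": "Microsoft.Databricks/workspaces",
        "apiVersion": api_version,
        "name": name,
        "location": location,
        "sku": {"name": sku},
        "properties": {
            "managedResourceGroupId": managed_resource_group_id,
            "parameters": parameters,
        },
    }


if __name__ == "__main__":
    # Placeholder values for a workspace with the default (managed) VNet.
    resource = workspace_resource(
        name="example-workspace",
        location="westeurope",
        managed_resource_group_id="<managed-resource-group-id>",
    )
    print(json.dumps(resource, indent=2))
```

Passing `custom_vnet_id` together with both subnet names produces the VNet injection variant; omitting them leaves the workspace on the default (managed) VNet.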

Egress from workspace subnets

When you enable secure cluster connectivity, both of your workspace subnets are private subnets, since cluster nodes do not have public IP addresses.

The implementation details of network egress vary based on whether you use the default (managed) VNet or whether you use the optional VNet injection feature to provide your own VNet in which to deploy your workspace. See the following sections for details.

Important

There could be additional costs associated with managing egress traffic to support secure cluster connectivity. A smaller organization that needs a cost-optimized solution can choose to deploy its workspace with secure cluster connectivity disabled. However, for the most secure deployment, Microsoft and Databricks strongly recommend enabling secure cluster connectivity for new workspaces.

Egress with default (managed) VNet

If you use the secure cluster connectivity (No Public IP / NPIP) feature with the default VNet that Azure Databricks creates, Azure Databricks automatically creates a NAT gateway for outbound traffic from your workspace’s subnets to the Azure backbone and public network. The NAT gateway is created within the managed resource group that Azure Databricks creates and manages. You cannot modify this resource group or any resources provisioned in it.

There is an additional cost associated with the automatically created NAT gateway.

Egress with VNet injection

If you use the secure cluster connectivity (No Public IP / NPIP) feature with optional VNet injection to provide your own VNet, you have several options for controlling egress.

Choose one of the following solutions to control egress traffic and ensure that your workspace has a stable egress public IP:

  • Egress load balancer: The recommended solution for simpler deployments is an egress load balancer, also called an outbound load balancer, whose configuration is managed by Azure Databricks. It provides a stable egress public IP for your workspace clusters, but you cannot modify the configuration for custom egress needs. This is an Azure template-only solution with the following requirements:
    • Azure Databricks expects additional fields in the ARM template that creates the workspace: loadBalancerName (load balancer name), loadBalancerBackendPoolName (load balancer backend pool name), loadBalancerFrontendConfigName (load balancer frontend configuration name), and loadBalancerPublicIpName (load balancer public IP name).
    • Azure Databricks expects the Microsoft.Databricks/workspaces resource to have parameters loadBalancerId (load balancer ID) and loadBalancerBackendPoolName (load balancer backend pool name).
    • Azure Databricks does not support changing the configuration of the load balancer.
  • NAT gateway: Configure an Azure NAT gateway on both of the workspace’s subnets to ensure that all outbound traffic to the Azure backbone and public network transits through it. This also provides a stable egress public IP for your workspace’s clusters and, unlike the egress load balancer, lets you modify the configuration for custom egress needs, within what Azure Networking supports. You can implement this solution using either an Azure template or the Azure portal.
  • Egress firewall or custom appliance: If you use VNet injection with an egress firewall like Azure Firewall or other custom networking architectures, you can use custom routes, which are also known as user-defined routes (UDRs). UDRs ensure that network traffic is routed correctly for your workspace, either directly to the required endpoints or through an egress firewall. If you use such a solution, you must add direct routes or allowed firewall rules for the Azure Databricks secure cluster connectivity relay and other required endpoints listed at User-defined route settings for Azure Databricks.
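
For the egress load balancer option, the field names listed above can be checked and applied with a small helper. This is a minimal sketch: the helper names are hypothetical, and the placement of the two workspace parameters under `properties.parameters` with `value` wrappers is an assumption about the template layout rather than something this article specifies.

```python
import copy

# Template-level fields named in this article for the egress load balancer.
TEMPLATE_FIELDS = (
    "loadBalancerName",
    "loadBalancerBackendPoolName",
    "loadBalancerFrontendConfigName",
    "loadBalancerPublicIpName",
)


def missing_template_fields(template_parameters):
    """Return the load balancer fields absent from a template's parameters."""
    return [field for field in TEMPLATE_FIELDS if field not in template_parameters]


def with_egress_load_balancer(workspace, load_balancer_id, backend_pool_name):
    """Return a copy of a workspace resource with the two load balancer
    parameters set. The nesting used here is an assumption."""
    updated = copy.deepcopy(workspace)
    parameters = updated.setdefault("properties", {}).setdefault("parameters", {})
    parameters["loadBalancerId"] = {"value": load_balancer_id}
    parameters["loadBalancerBackendPoolName"] = {"value": backend_pool_name}
    return updated


if __name__ == "__main__":
    # Placeholder values only.
    workspace = {"type": "Microsoft.Databricks/workspaces", "properties": {}}
    print(missing_template_fields({"loadBalancerName": {"type": "string"}}))
    print(with_egress_load_balancer(workspace, "<load-balancer-id>", "<pool-name>"))
```

The copy keeps the original resource dictionary unchanged, which makes it easier to compare the template with and without the load balancer settings.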