Azure Databricks and security

Azure Databricks is a data analytics platform optimized for Azure cloud services. It offers three environments for developing data-intensive applications.

To learn more about how Azure Databricks improves the security of big data analytics, see Azure Databricks concepts.

The following sections include design considerations, a configuration checklist, and recommended configuration options specific to Azure Databricks.

Design considerations

All users' notebooks and notebook results are encrypted at rest by default. If your requirements go beyond the default encryption, consider using customer-managed keys for notebooks.

Checklist

Have you configured Azure Databricks with security in mind?


  • Use Microsoft Entra ID credential passthrough to avoid the need for service principals when communicating with Azure Data Lake Storage.
  • Isolate your workspaces, compute, and data from public access. Make sure that only the right people have access and only through secure channels.
  • Ensure that the cloud workspaces for your analytics are only accessible by properly managed users.
  • Implement Azure Private Link.
  • Restrict and monitor your virtual machines.
  • Use dynamic IP access lists to allow admins to access workspaces only from their corporate networks.
  • Use the VNet injection functionality to enable more secure scenarios.
  • Use diagnostic logs to audit workspace access and permissions.
  • Consider using the secure cluster connectivity feature and a hub/spoke architecture to avoid opening ports and assigning public IP addresses on cluster nodes.
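
The IP access list item in the checklist can be illustrated with a minimal sketch. It assumes a hypothetical set of corporate CIDR ranges and uses only Python's standard `ipaddress` module to decide whether a client address would be admitted. The function name `is_allowed` and the address ranges are illustrative assumptions; in practice the Azure Databricks workspace enforces the list, not code like this.

```python
import ipaddress

# Hypothetical corporate egress ranges (documentation-reserved addresses).
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.17/32"]


def is_allowed(client_ip, allowed_ranges=ALLOWED_RANGES):
    """Return True if client_ip falls inside any allowed CIDR range."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_ranges
    )


print(is_allowed("203.0.113.42"))  # address inside the /24 range
print(is_allowed("192.0.2.10"))    # address outside every listed range
```

A check like this can help when reviewing a proposed allow list before applying it, for example to confirm that an admin's VPN egress address is covered.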

Configuration recommendations

Explore the following table of recommendations to optimize your Azure Databricks configuration for security:

Recommendation | Description
--- | ---
Ensure that the cloud workspaces for your analytics are only accessible by properly managed users. | Microsoft Entra ID can handle single sign-on for remote access. For extra security, see Conditional Access.
Implement Azure Private Link. | Ensure all traffic between users of your platform, the notebooks, and the compute clusters that process queries is encrypted and transmitted over the cloud provider's network backbone, inaccessible to the outside world.
Restrict and monitor your virtual machines. | Clusters, which execute queries, should have SSH and network access restricted to prevent installation of arbitrary packages. Clusters should use only images that are periodically scanned for vulnerabilities.
Use the VNet injection functionality to enable more secure scenarios. | VNet injection enables scenarios such as: connecting to other Azure services using service endpoints; connecting to on-premises data sources, taking advantage of user-defined routes; connecting to a network virtual appliance to inspect all outbound traffic and take actions according to allow and deny rules; using custom DNS; deploying Azure Databricks clusters in existing virtual networks.
Use diagnostic logs to audit workspace access and permissions. | Use audit logs to see privileged activity in a workspace, cluster resizing, and files and folders shared on the cluster.
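
As a rough illustration of the audit log recommendation, the following sketch filters a handful of records for privileged activity and counts the matches per user. The record layout (`serviceName`, `actionName`, `user`), the sample values, and the set of actions treated as privileged are all assumptions made for this example; they are simplified and don't reproduce the exact diagnostic log schema, so check the actual log format before adapting it.

```python
from collections import Counter

# Hypothetical, simplified audit records. Field names and values are
# illustrative assumptions, not the exact diagnostic log schema.
SAMPLE_RECORDS = [
    {"serviceName": "clusters", "actionName": "resize", "user": "admin@example.com"},
    {"serviceName": "notebook", "actionName": "runCommand", "user": "dev@example.com"},
    {"serviceName": "accounts", "actionName": "setAdmin", "user": "admin@example.com"},
    {"serviceName": "clusters", "actionName": "resize", "user": "ops@example.com"},
]

# Actions treated as privileged in this sketch (an assumption; adjust as needed).
PRIVILEGED_ACTIONS = {("clusters", "resize"), ("accounts", "setAdmin")}


def privileged_events(records, privileged=PRIVILEGED_ACTIONS):
    """Keep only records whose (service, action) pair is marked privileged."""
    return [
        record
        for record in records
        if (record.get("serviceName"), record.get("actionName")) in privileged
    ]


def count_by_user(records):
    """Count records per user, using 'unknown' when the field is missing."""
    return Counter(record.get("user", "unknown") for record in records)


events = privileged_events(SAMPLE_RECORDS)
for user, count in count_by_user(events).most_common():
    print(user, count)
```

Keeping the privileged set as explicit (service, action) pairs makes the review criteria easy to audit and extend.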

Source artifacts

Azure Databricks source artifacts include the Databricks blog: Best practices to secure an enterprise-scale data platform.
