Access control requires the Azure Databricks Premium Plan.
Table access control lets you programmatically grant and revoke access to your data using the Azure Databricks view-based access control model.
This article describes how to enable and enforce table access control.
For information about how to set permissions on a data object once table access control is enabled, see Set Permissions on a Data Object.
Step 1: Enable table access control for a cluster
Table access control is available in two versions:
- SQL-only table access control, which:
  - Is generally available.
  - Restricts cluster users to SQL commands. Users are restricted to the Apache Spark SQL API, and therefore cannot use Python, Scala, R, RDD APIs, or clients that directly read the data from cloud storage, such as DBUtils.
  - Requires that clusters run Databricks Runtime 3.1 or above.
- Python and SQL table access control (Beta), which:
  - Is in Beta.
  - Allows users to run SQL, Python, and PySpark commands. Users are restricted to the Spark SQL API and DataFrame API, and therefore cannot use Scala, R, RDD APIs, or clients that directly read the data from cloud storage, such as DBUtils.
  - Requires that clusters run Databricks Runtime 3.5 or above.
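The runtime requirements above can be sketched as a small lookup. This is an illustrative helper only: the names `MIN_RUNTIME`, `parse_runtime`, and `available_modes` are hypothetical and not part of any Azure Databricks API, and the sketch assumes runtime versions are given as plain `major.minor` strings.

```python
# Minimum Databricks Runtime version for each table access control version,
# taken from the list above. The mode names are hypothetical labels.
MIN_RUNTIME = {
    "sql_only": (3, 1),
    "python_and_sql": (3, 5),
}


def parse_runtime(version: str) -> tuple:
    """Parse a 'major.minor' string such as '3.5' into the tuple (3, 5)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def available_modes(runtime_version: str) -> list:
    """Return the table access control versions that the given runtime supports."""
    parsed = parse_runtime(runtime_version)
    # Tuples compare element by element, so (3, 10) correctly ranks above (3, 5).
    return [mode for mode, minimum in MIN_RUNTIME.items() if parsed >= minimum]
```

For example, `available_modes("3.1")` yields only the SQL-only version, while any runtime at 3.5 or above yields both.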
SQL-only table access control
This version of table access control restricts users on the cluster to SQL commands only.
To enable SQL-only table access control on a cluster and restrict that cluster to use only SQL commands, set the following flag in the cluster’s Spark conf:
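As a minimal sketch, the flag can be merged into a cluster specification's `spark_conf` map as shown below. The key `spark.databricks.acl.sqlOnly` is an assumption drawn from the Azure Databricks documentation rather than from the text of this article, so verify it against the documentation for your workspace. The helper name `with_sql_only` and the sample specification are hypothetical.

```python
# Assumed Spark conf entry for SQL-only table access control.
# Verify the key name against the Azure Databricks documentation.
SQL_ONLY_CONF = {"spark.databricks.acl.sqlOnly": "true"}


def with_sql_only(cluster_spec: dict) -> dict:
    """Return a copy of cluster_spec with the SQL-only flag merged into spark_conf."""
    merged = dict(cluster_spec)
    # Keep any Spark conf entries already present and add the flag on top.
    merged["spark_conf"] = {**cluster_spec.get("spark_conf", {}), **SQL_ONLY_CONF}
    return merged
```

The function leaves the input specification unmodified, so the same base specification can be reused for clusters that do not need the flag.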
Python and SQL table access control
This version of table access control lets users run Python commands that use the DataFrame API as well as SQL. When it is enabled on a cluster, users on that cluster or pool:
- Can access Spark only via the Spark SQL API or DataFrame API. In both cases, access to tables and views is restricted by administrators according to the Azure Databricks View-based access control model.
- Cannot acquire direct access to data in the cloud via DBFS or by reading credentials from the cloud provider’s metadata service.
- Must run their commands on cluster nodes as a low-privilege user forbidden from accessing sensitive parts of the filesystem or creating network connections to ports other than 80 and 443.
Attempts to get around these restrictions fail with an exception. These restrictions ensure that your users can never use the cluster to access data that they have not been granted access to.
There are two steps to enabling a cluster for Python and SQL table access control: enable table access control at the account level and create a cluster enabled for table access control.
Enable table access control at the account level
- Log in to the Admin Console.
- Go to the Access Control tab.
- Ensure that Cluster Access Control is enabled. You cannot enable table access control without having cluster access control already enabled.
- Next to Table Access Control, click the Enable button.
Create a cluster enabled for table access control
When you create a cluster, click the Enable table access control and only allow Python and SQL commands option.
This checkbox is available only for high concurrency clusters.
To create the cluster using the REST API, see Enable table access control example.
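A request of that kind might look like the following sketch, which only builds the request object and does not send it. It assumes the Clusters API `POST /api/2.0/clusters/create` endpoint and the Spark conf keys `spark.databricks.acl.dfAclsEnabled` and `spark.databricks.repl.allowedLanguages`; none of these appear in this article, so confirm them against the referenced example. The function name and all argument values are hypothetical placeholders.

```python
import json
import urllib.request


def build_create_cluster_request(host: str, token: str, cluster_name: str,
                                 spark_version: str, node_type_id: str,
                                 num_workers: int = 1) -> urllib.request.Request:
    """Build (but do not send) a cluster-create request with table access control enabled."""
    payload = {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "spark_conf": {
            # Assumed keys: enable table access control and allow only Python and SQL.
            "spark.databricks.acl.dfAclsEnabled": "true",
            "spark.databricks.repl.allowedLanguages": "python,sql",
        },
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. Any additional settings that a high concurrency cluster requires are not covered by this sketch.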
Step 2: Enforce table access control
To ensure that your users access only the data that you want them to, you must restrict your users to clusters with table access control enabled. In particular, you should ensure that:
- Users do not have permission to create clusters. If they create a cluster without table access control, they can access any data from that cluster.
- Users do not have Can Attach To permission for any cluster that is not enabled for table access control.
See Cluster Access Control for more information.
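The two enforcement conditions can be expressed as a simple audit over an inventory of clusters and permissions. This is a sketch under stated assumptions: the input shapes (a list of dicts with the hypothetical keys `name`, `table_acl_enabled`, and `can_attach_to`) and the function name are invented for illustration, and gathering that inventory from a workspace is out of scope here.

```python
from typing import Dict, List


def find_enforcement_gaps(clusters: List[Dict],
                          users_who_can_create: List[str]) -> List[str]:
    """List the ways non-admin users could bypass table access control.

    clusters: hypothetical inventory, one dict per cluster with keys
        'name', 'table_acl_enabled' (bool), and 'can_attach_to' (list of users).
    users_who_can_create: non-admin users allowed to create clusters.
    """
    gaps = []
    # Condition 1: users must not be able to create their own clusters.
    for user in users_who_can_create:
        gaps.append(f"{user} can create clusters")
    # Condition 2: users must not attach to clusters lacking table access control.
    for cluster in clusters:
        if cluster["table_acl_enabled"]:
            continue
        for user in cluster["can_attach_to"]:
            gaps.append(
                f"{user} can attach to {cluster['name']}, "
                "which lacks table access control"
            )
    return gaps
```

An empty result means both conditions hold for the inventory that was supplied.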
Now you can Set Permissions on a Data Object.