Data object privileges
This feature is in Public Preview.
The Azure Databricks data governance model lets you programmatically grant, deny, and revoke access to your data from Spark SQL. This model lets you control access to securable objects like catalogs, databases, tables, views, and functions. It also allows for fine-grained access control (to a particular subset of a table, for example) by setting privileges on derived views created from arbitrary queries. The Azure Databricks SQL query analyzer enforces these access control policies at runtime on Azure Databricks clusters with table access control enabled and all SQL endpoints.
This article describes the privileges, objects, and ownership rules that make up the Azure Databricks data governance model. It also describes how to grant, deny, and revoke object privileges.
The requirements for managing object privileges depends on your environment:
Databricks Data Science & Engineering and Databricks Machine Learning
- An administrator must enable and enforce table access control for the workspace.
- The cluster must be enabled for table access control.
Data governance model
This section describes the Azure Databricks data governance model. Access to securable data objects is governed by privileges.
The securable objects are:
CATALOG: controls access to the entire data catalog.
DATABASE: controls access to a database.
TABLE: controls access to a managed or external table.
VIEW: controls access to SQL views.
FUNCTION: controls access to a named function.
ANONYMOUS FUNCTION: controls access to anonymous or temporary functions.
ANONYMOUS FUNCTIONobjects are not supported in Databricks SQL.
ANY FILE: controls access to the underlying filesystem.
Users granted access to
ANY FILEcan bypass the restrictions put on the catalog, databases, tables, and views by reading from the filesystem directly.
SELECT: gives read access to an object.
CREATE: gives ability to create an object (for example, a table in a database).
MODIFY: gives ability to add, delete, and modify data to or from an object.
USAGE: does not give any abilities, but is an additional requirement to perform any action on a database object.
READ_METADATA: gives ability to view an object and its metadata.
CREATE_NAMED_FUNCTION: gives ability to create a named UDF in an existing catalog or database.
MODIFY_CLASSPATH: gives ability to add files to the Spark class path.
ALL PRIVILEGES: gives all privileges (is translated into all the above privileges).
MODIFY_CLASSPATH privileges are not supported in Databricks SQL.
To perform an action on a database object, a user must have the
USAGE privilege on that database in addition to the privilege to perform that action. Any one of the following satisfy the
- Be an admin
- Have the
USAGEprivilege on the database or be in a group that has the
USAGEprivilege on the database
- Have the
USAGEprivilege on the
CATALOGor be in a group that has the
- Be the owner of the database or be in a group that owns the database
Even the owner of an object inside a database must have the
USAGE privilege in order to use it.
As an example, an administrator could define a
finance group and an
accounting database for them to use.
To set up a database that only the finance team can use and share, an admin would do the following:
CREATE DATABASE accounting; GRANT USAGE ON DATABASE accounting TO finance; GRANT CREATE ON DATABASE accounting TO finance;
With these privileges, members of the
finance group can create tables and views in the
but can’t share those tables or views with any principal that does not have
USAGE on the
Databricks Data Science & Engineering and Databricks Runtime version behavior
- Clusters running Databricks Runtime 7.3 LTS and above enforce the
- Clusters running Databricks Runtime 7.2 and below do not enforce the
- To ensure that existing workloads function unchanged, in workspaces that used table access control before
USAGEwas introduced have had the
CATALOGgranted to the
usersgroup. If you want to take advantage of the
USAGEprivilege, you must run
REVOKE USAGE ON CATALOG FROM usersand then
GRANT USAGE ...as needed.
SQL objects in Azure Databricks are hierarchical and privileges are inherited. This means that granting
or denying a privilege on the
CATALOG automatically grants or denies the privilege to all databases
in the catalog. Similarly, privileges granted on a
DATABASE object are inherited by all objects in
When table access control is enabled on a cluster or SQL endpoint, a user who creates a database, table, view, or function becomes its owner. The owner is granted all privileges and can grant privileges to other users.
Groups may own objects, in which case all members of that group are considered owners.
Ownership determines whether or not you can grant privileges on derived objects to other users.
For example, suppose user A owns table T and grants user B
SELECT privilege on table T. Even
though user B can select from table T, user B cannot grant
SELECT privilege on table T to user C,
because user A is still the owner of the underlying table T. Furthermore, user B cannot circumvent
this restriction simply by creating a view V on table T and granting privileges on that view to
user C. When Azure Databricks checks for privileges for user C to access view V, it also checks that
the owner of V and underlying table T are the same. If the owners are not the same, user C must
SELECT privileges on underlying table T.
When table access control is disabled on a cluster, no owner is registered when a database, table, view,
or function is created. To test if an object has an owner, run
SHOW GRANT ON <object-name>.
If you do not see an entry with
ActionType OWN, the object does not have an owner.
Assign owner to object
Either the owner of an object or an administrator can transfer ownership of an object using the
ALTER <object> OWNER TO `<user-name>@<user-domain>.com` command:
ALTER DATABASE <database-name> OWNER TO `<user-name>@<user-domain>.com` ALTER TABLE <table-name> OWNER TO `group_name` ALTER VIEW <view-name> OWNER TO `<user-name>@<user-domain>.com`
Users and groups
Administrators and owners can grant privileges to users and groups.
Each user is uniquely identified by their username in Azure Databricks (which typically maps to their email address).
All users are implicitly a part of the “All Users” group, represented as
users in SQL.
You must enclose user specifications in backticks (
` `), not single quotes (
Operations and privileges
In Azure Databricks, admin users can manage all object privileges, effectively have all privileges granted on all securables, and can change the owner of any object. Owners of an object can perform any action on that object, can grant privileges on that object to other principals, and can transfer ownership of the object to another principal. The only limit to an owner’s privileges is for objects within a database; to interact with an object in a database the user must also have
USAGE on that database.
The following table maps SQL operations to the privileges required to perform that operation.
- Any place where a privilege on a table, view, or function is required,
USAGEis also required on the database it’s in.
- In any place where a table is referenced in a command, a path could also be referenced. In those instances
MODIFYis required on
ANY FILEinstead of
USAGEon the database and another privilege on the table.
- Object ownership is represented here as the
|CREATE BLOOMFILTER INDEX||
|DROP BLOOMFILTER INDEX||
|FSCK REPAIR TABLE||
When you use table access control,
DROP TABLE statements are case sensitive. If a table name is lower case and the
DROP TABLE references the table name using mixed or upper case, the
DROP TABLE statement will fail.
Manage object privileges
You use the
SHOW GRANT operations to manage object privileges.
An owner or an administrator of an object can perform
SHOW GRANToperations. However, an administrator cannot deny privileges to or revoke privileges from an owner.
A principal that’s not an owner or administrator can perform an operation only if the required privilege has been granted.
To grant, deny, or revoke a privilege for all users, specify the keyword
TO. For example,
GRANT SELECT ON ANY FILE TO users
GRANT SELECT ON DATABASE <database-name> TO `<user>@<domain-name>` GRANT SELECT ON ANONYMOUS FUNCTION TO `<user>@<domain-name>` GRANT SELECT ON ANY FILE TO `<user>@<domain-name>` SHOW GRANT `<user>@<domain-name>` ON DATABASE <database-name> DENY SELECT ON <table-name> TO `<user>@<domain-name>` REVOKE ALL PRIVILEGES ON DATABASE default FROM `<user>@<domain-name>` REVOKE SELECT ON <table-name> FROM `<user>@<domain-name>` GRANT SELECT ON ANY FILE TO users
Dynamic view functions
Azure Databricks includes two user functions that allow you to express column- and row-level permissions dynamically in the body of a view definition.
current_user(): return the current user name.
is_member(): determine if the current user is a member of a specific Azure Databricks group.
Available in Databricks Runtime 7.3 LTS and above. However, to use these functions in Databricks Runtime 7.3 LTS, you must set the Spark config
Consider the following example, which combines both functions to determine if a user has the appropriate group membership:
-- Return: true if the user is a member and false if they are not SELECT current_user as user, -- Check to see if the current user is a member of the "Managers" group. is_member("Managers") as admin
Allowing administrators to set fine granularity privileges for multiple users and groups within a single view is both expressive and powerful, while saving on administration overhead.
Through dynamic views it’s easy to limit what columns a specific group or user can see. Consider the following example where only users who belong to the
auditors group are able to see email addresses from the
sales_raw table. At analysis time Spark replaces the
CASE statement with either the literal
'REDACTED' or the column
-- Alias the field 'email' to itself (as 'email') to prevent the -- permission logic from showing up directly in the column name results. CREATE VIEW sales_redacted AS SELECT user_id, CASE WHEN is_member('auditors') THEN email ELSE 'REDACTED' END AS email, country, product, total FROM sales_raw
Using dynamic views you can specify permissions down to the row or field level. Consider the following example, where only users who belong to the
managers group are able to see transaction amounts (
total column) greater than $1,000,000.00:
CREATE VIEW sales_redacted AS SELECT user_id, country, product, total FROM sales_raw WHERE CASE WHEN is_member('managers') THEN TRUE ELSE total <= 1000000 END;
As shown in the preceding examples, you can implement column-level masking to prevent users from seeing specific column data unless they are in the correct group. Because these views are standard Spark SQL, you can do more advanced types of masking with more complex SQL expressions. The following example lets all users perform analysis on email domains, but lets members of the
auditors group see users’ full email addresses.
-- The regexp_extract function takes an email address such as -- firstname.lastname@example.org and extracts 'example', allowing -- analysts to query the domain name CREATE VIEW sales_redacted AS SELECT user_id, region, CASE WHEN is_member('auditors') THEN email ELSE regexp_extract(email, '^.*@(.*)$', 1) END FROM sales_raw
Frequently asked questions (FAQ)
How do I grant, deny, or revoke a privilege for all users?
Specify the keyword
FROM. For example:
GRANT SELECT ON TABLE database.table TO users
I created an object but now I can’t query, drop, or modify it.
This error can occur because you created that object on a cluster or SQL endpoint without table access control enabled. When table access control is disabled on a cluster or SQL endpoint, owners are not registered when a database, table, or view is created. An admin must assign an owner to the object using the following command:
ALTER [DATABASE | TABLE | VIEW] <object-name> OWNER TO `<user-name>@<user-domain>.com`;
How do I grant privileges on global and local temporary views?
Privileges on global and local temporary views are not supported. Local temporary
views are visible only within the same session, and views created in the
global_temp database are
visible to all users sharing a cluster or SQL endpoint. However, privileges on the underlying tables and views
referenced by any temporary views are enforced.
How do I grant a user or group privileges on multiple tables at once?
A grant, deny, or revoke statement can be applied to only one object at a time. The recommended way
to organize and grant privileges on multiple tables to a principal is via databases. Granting a
SELECT privilege on a database implicitly grants that principal
SELECT privileges on
all tables and views in that database. For example, if a database D has tables t1 and t2, and an
admin issues the following
GRANT USAGE, SELECT ON DATABASE D TO `<user>@<domain-name>`
<user>@<domain-name> can select from tables t1 and t2, as well as any tables and views created in database D in the future.
How do I grant a user privileges on all tables except one?
SELECT privilege to the database and then deny
SELECT privilege for the specific table you want to restrict access to.
GRANT USAGE, SELECT ON DATABASE D TO `<user>@<domain-name>` DENY SELECT ON TABLE D.T TO `<user>@<domain-name>`
<user>@<domain-name> can select from all tables in D except D.T.
A user has
SELECT privileges on a view of table T, but when that user tries to
SELECT from that view, they get the error
User does not have privilege SELECT on table.
This common error can occur for one of the following reasons:
- Table T has no registered owner because it was created using a cluster or SQL endpoint for which table access control is disabled.
- The grantor of the
SELECTprivilege on a view of table T is not the owner of table T or the user does not also have select
SELECTprivilege on table T.
Suppose there is a table T owned by A. A owns view V1 on T and B owns view V2 on T.
- A user can select on V1 when A has granted
SELECTprivileges on view V1.
- A user can select on V2 when A has granted
SELECTprivileges on table T and B has granted
SELECTprivileges on V2.
As described in the Object ownership section, these conditions ensure that only the owner of an object can grant other users access to that object.
I tried to run
sc.parallelize on a cluster with table access control enabled and it fails.
On clusters with table access control enabled you can use only the Spark SQL and Python DataFrame APIs. The RDD API is disallowed for security reasons, since Azure Databricks does not have the ability to inspect and authorize code within an RDD.