Access control lists (ACLs) in Azure Data Lake Storage Gen2

Azure Data Lake Storage Gen2 implements an access control model that supports both Azure role-based access control (Azure RBAC) and POSIX-like access control lists (ACLs). This article describes access control lists in Data Lake Storage Gen2. To learn how to combine Azure RBAC with ACLs, and how the system evaluates them to make authorization decisions, see Access control model in Azure Data Lake Storage Gen2.

About ACLs

You can associate a security principal with an access level for files and directories. Each association is captured as an entry in an access control list (ACL). Each file and directory in your storage account has an access control list. When a security principal attempts an operation on a file or directory, an ACL check determines whether that security principal (user, group, service principal, or managed identity) has the correct permission level to perform the operation.

Note

ACLs apply only to security principals in the same tenant, and they don't apply to users who use Shared Key or shared access signature (SAS) token authentication. That's because no identity is associated with the caller and therefore security principal permission-based authorization cannot be performed.

How to set ACLs

To set file and directory level permissions, see any of the following articles:

Environment | Article
Azure Storage Explorer | Use Azure Storage Explorer to manage ACLs in Azure Data Lake Storage Gen2
Azure portal | Use the Azure portal to manage ACLs in Azure Data Lake Storage Gen2
.NET | Use .NET to manage ACLs in Azure Data Lake Storage Gen2
Java | Use Java to manage ACLs in Azure Data Lake Storage Gen2
Python | Use Python to manage ACLs in Azure Data Lake Storage Gen2
JavaScript (Node.js) | Use the JavaScript SDK in Node.js to manage ACLs in Azure Data Lake Storage Gen2
PowerShell | Use PowerShell to manage ACLs in Azure Data Lake Storage Gen2
Azure CLI | Use Azure CLI to manage ACLs in Azure Data Lake Storage Gen2
REST API | Path - Update

Important

If the security principal is a service principal, it's important to use the object ID of the service principal and not the object ID of the related app registration. To get the object ID of the service principal, open the Azure CLI, and then use this command: az ad sp show --id <Your App ID> --query objectId. Make sure to replace the <Your App ID> placeholder with the App ID of your app registration. In newer Azure CLI versions that are based on Microsoft Graph, the property may be named id instead of objectId.

Types of ACLs

There are two kinds of access control lists: access ACLs and default ACLs.

Access ACLs control access to an object. Files and directories both have access ACLs.

Default ACLs are templates of ACLs associated with a directory that determine the access ACLs for any child items that are created under that directory. Files do not have default ACLs.

Both access ACLs and default ACLs have the same structure.

Note

Changing the default ACL on a parent does not affect the access ACL or default ACL of child items that already exist.

Levels of permission

The permissions on directories and files in a container are Read, Write, and Execute. They apply to files and directories as shown in the following table:

Permission | File | Directory
Read (R) | Can read the contents of a file | Requires Read and Execute to list the contents of the directory
Write (W) | Can write or append to a file | Requires Write and Execute to create child items in a directory
Execute (X) | Does not mean anything in the context of Data Lake Storage Gen2 | Required to traverse the child items of a directory

Note

If you are granting permissions by using only ACLs (no Azure RBAC), then to grant a security principal read or write access to a file, you'll need to give the security principal Execute permissions to the root folder of the container, and to each folder in the hierarchy of folders that lead to the file.

Short forms for permissions

RWX is used to indicate Read + Write + Execute. A more condensed numeric form exists in which Read=4, Write=2, and Execute=1, the sum of which represents the permissions. Following are some examples.

Numeric form | Short form | What it means
7 | RWX | Read + Write + Execute
5 | R-X | Read + Execute
4 | R-- | Read
0 | --- | No permissions
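
The conversion between the numeric and short forms is plain bit arithmetic. The following is a minimal Python sketch of it; the function names to_short_form and to_numeric are illustrative and are not part of any Azure SDK.

```python
# Bit values from the text above: Read=4, Write=2, Execute=1.
READ, WRITE, EXECUTE = 4, 2, 1
BITS = (("R", READ), ("W", WRITE), ("X", EXECUTE))


def to_short_form(value: int) -> str:
    """Convert a numeric permission (0-7) to its short form, for example 5 -> 'R-X'."""
    if not 0 <= value <= 7:
        raise ValueError("permission value must be between 0 and 7")
    return "".join(letter if value & bit else "-" for letter, bit in BITS)


def to_numeric(short_form: str) -> int:
    """Convert a short form such as 'R-X' (case-insensitive) to its numeric value."""
    if len(short_form) != 3:
        raise ValueError("short form must have exactly three characters")
    value = 0
    for char, (letter, bit) in zip(short_form.upper(), BITS):
        if char == letter:
            value |= bit
        elif char != "-":
            raise ValueError(f"unexpected character {char!r}")
    return value


print(to_short_form(5))   # R-X
print(to_numeric("RWX"))  # 7
```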

Permissions inheritance

In the POSIX-style model that's used by Data Lake Storage Gen2, permissions for an item are stored on the item itself. In other words, permissions for an item cannot be inherited from the parent items if the permissions are set after the child item has already been created. Permissions are only inherited if default permissions have been set on the parent items before the child items have been created.

The following table shows you the ACL entries required to enable a security principal to perform the operations listed in the Operation column.

This table shows a column that represents each level of a fictitious directory hierarchy. There's a column for the root directory of the container (/), a subdirectory named Oregon, a subdirectory of the Oregon directory named Portland, and a text file in the Portland directory named Data.txt.

Important

This table assumes that you are using only ACLs without any Azure role assignments. To see a similar table that combines Azure RBAC together with ACLs, see Permissions table: Combining Azure RBAC and ACL.

Operation | / | Oregon/ | Portland/ | Data.txt
Read Data.txt | --X | --X | --X | R--
Append to Data.txt | --X | --X | --X | RW-
Delete Data.txt | --X | --X | -WX | ---
Create Data.txt | --X | --X | -WX | ---
List / | R-X | --- | --- | ---
List /Oregon/ | --X | R-X | --- | ---
List /Oregon/Portland/ | --X | --X | R-X | ---

Note

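Write permission on the file itself is not required to delete it, as long as the caller has Execute permission on each ancestor directory and Write and Execute permissions on the parent directory, as the Delete Data.txt row shows.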
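
The pattern in the table generalizes to paths of any depth: every ancestor directory needs Execute, and the last directory or the file carries the operation-specific permission. The following Python sketch derives a table row from an operation and a path. It assumes ACL-only authorization, as the table does, and the names FILE_RULES, directory_chain, and required_entries are hypothetical.

```python
# (permission on the parent directory, permission on the file) per operation,
# taken from the rows of the table above.
FILE_RULES = {
    "read": ("--X", "R--"),
    "append": ("--X", "RW-"),
    "delete": ("-WX", "---"),
    "create": ("-WX", "---"),
}


def directory_chain(names):
    """Return '/', '/a/', '/a/b/', ... for the given directory names."""
    chain = ["/"]
    for i in range(1, len(names) + 1):
        chain.append("/" + "/".join(names[:i]) + "/")
    return chain


def required_entries(operation, path):
    """Map each level of `path` to the permission the caller needs on it.

    For 'list', `path` is the directory to list. For the other operations,
    `path` is the file being read, appended to, deleted, or created.
    """
    parts = [p for p in path.split("/") if p]
    if operation == "list":
        dirs = directory_chain(parts)
        required = {d: "--X" for d in dirs[:-1]}
        required[dirs[-1]] = "R-X"
        return required
    if not parts:
        raise ValueError("a file path is required for this operation")
    parent_perm, file_perm = FILE_RULES[operation]
    dirs = directory_chain(parts[:-1])
    required = {d: "--X" for d in dirs[:-1]}
    required[dirs[-1]] = parent_perm
    required[path] = file_perm
    return required


print(required_entries("delete", "/Oregon/Portland/Data.txt"))
```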

Users and identities

Every file and directory has distinct permissions for these identities:

  • The owning user
  • The owning group
  • Named users
  • Named groups
  • Named service principals
  • Named managed identities
  • All other users

The identities of users and groups are Azure Active Directory (Azure AD) identities. So unless otherwise noted, a user, in the context of Data Lake Storage Gen2, can refer to an Azure AD user, service principal, managed identity, or security group.

The owning user

The user who created the item is automatically the owning user of the item. An owning user can:

  • Change the permissions of a file that is owned.
  • Change the owning group of a file that is owned, as long as the owning user is also a member of the target group.

Note

The owning user cannot change the owning user of a file or directory. Only super-users can change the owning user of a file or directory.

The owning group

In POSIX ACLs, every user is associated with a primary group. For example, user "Alice" might belong to the "finance" group. Alice might also belong to multiple groups, but one group is always designated as her primary group. In POSIX, when Alice creates a file, the owning group of that file is set to her primary group, which in this case is "finance." The owning group otherwise behaves similarly to assigned permissions for other users/groups.

Assigning the owning group for a new file or directory

  • Case 1: The root directory /. This directory is created when a Data Lake Storage Gen2 container is created. In this case, the owning group is set to the user who created the container if it was done using OAuth. If the container is created using Shared Key, an Account SAS, or a Service SAS, then the owner and owning group are set to $superuser.
  • Case 2 (every other case): When a new item is created, the owning group is copied from the parent directory.
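
The two cases above can be sketched as a small Python function. This is an illustration of the rule only, not service code: the function name, the auth_method labels, and the use of None to represent "no parent" (the root directory) are assumptions made for the example.

```python
SUPERUSER = "$superuser"
# Hypothetical labels for the authentication methods named in Case 1.
KEY_BASED_AUTH = {"shared_key", "account_sas", "service_sas"}


def owning_group_for_new_item(parent_owning_group, creator=None, auth_method=None):
    """Return the owning group assigned to a newly created item.

    parent_owning_group is None when the item is the root directory '/'.
    """
    if parent_owning_group is None:
        # Case 1: the root directory, created together with the container.
        if auth_method == "oauth":
            return creator
        if auth_method in KEY_BASED_AUTH:
            return SUPERUSER
        raise ValueError(f"unknown authentication method: {auth_method!r}")
    # Case 2: every other item copies the owning group of its parent directory.
    return parent_owning_group


print(owning_group_for_new_item(None, creator="alice-object-id", auth_method="oauth"))
print(owning_group_for_new_item("finance-group-object-id"))
```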

Changing the owning group

The owning group can be changed by:

  • Any super-user.
  • The owning user, if the owning user is also a member of the target group.

Note

The owning group cannot change the ACLs of a file or directory. While the owning group is set to the user who created the account in the case of the root directory, Case 1 above, a single user account isn't valid for providing permissions via the owning group. You can assign this permission to a valid user group if applicable.

How permissions are evaluated

Identities are evaluated in the following order:

  1. Superuser
  2. Owning user
  3. Named user, service principal or managed identity
  4. Owning group or named group
  5. All other users

If more than one of these identities applies to a security principal, then the permission level associated with the first identity is granted. For example, if a security principal is both the owning user and a named user, then the permission level associated with the owning user applies.

The following pseudocode represents the access check algorithm for storage accounts. This algorithm shows the order in which identities are evaluated.

def access_check(user, desired_perms, path):
    # access_check returns True if user has the desired permissions on the path, False otherwise
    # user is the identity that wants to perform an operation on path
    # desired_perms is a simple integer with values from 0 to 7 (R=4, W=2, X=1). User desires these permissions
    # path is the file or directory
    # Note: the "sticky bit" isn't illustrated in this algorithm
    # Note: helper functions such as is_superuser and get_acl_entry represent service-side lookups

    # Handle super users.
    if is_superuser(user):
        return True

    # Handle the owning user. Note that mask isn't used.
    entry = get_acl_entry(path, OWNER)
    if user == entry.identity:
        return (desired_perms & entry.permissions) == desired_perms

    # Handle the named users. Note that mask IS used.
    entries = get_acl_entries(path, NAMED_USER)
    for entry in entries:
        if user == entry.identity:
            mask = get_mask(path)
            return (desired_perms & entry.permissions & mask) == desired_perms

    # Handle named groups and owning group. Access is granted if any matching group grants it.
    entries = get_acl_entries(path, NAMED_GROUP | OWNING_GROUP)
    mask = get_mask(path)
    for entry in entries:
        if user_is_member_of_group(user, entry.identity):
            if (desired_perms & entry.permissions & mask) == desired_perms:
                return True

    # Handle other
    perms = get_perms_for_other(path)
    mask = get_mask(path)
    return (desired_perms & perms & mask) == desired_perms
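
To experiment with the evaluation order, the pseudocode above can be turned into a self-contained Python sketch over an in-memory ACL. The Acl data class, the memberships dictionary, and the superusers set are simplifications invented for this example; they do not reflect how the service stores ACLs. As in the pseudocode, the sticky bit is not modeled.

```python
from dataclasses import dataclass, field

R, W, X = 4, 2, 1


@dataclass
class Acl:
    """Simplified in-memory ACL for one file or directory (illustrative only)."""
    owner: str
    owner_perms: int
    owning_group: str
    owning_group_perms: int
    other_perms: int
    mask: int = R | W | X
    named_users: dict = field(default_factory=dict)   # identity -> permissions
    named_groups: dict = field(default_factory=dict)  # group -> permissions


def access_check(user, desired, acl, memberships, superusers=frozenset()):
    """Return True if `user` holds all `desired` permission bits under `acl`."""
    if user in superusers:
        return True
    # Owning user: the mask is not applied.
    if user == acl.owner:
        return (desired & acl.owner_perms) == desired
    # Named user: the mask is applied, and the first match decides.
    if user in acl.named_users:
        return (desired & acl.named_users[user] & acl.mask) == desired
    # Owning group and named groups: any matching group can grant access.
    groups = {acl.owning_group: acl.owning_group_perms, **acl.named_groups}
    for group, perms in groups.items():
        if group in memberships.get(user, set()):
            if (desired & perms & acl.mask) == desired:
                return True
    # Other: applied with the mask, as in the pseudocode above.
    return (desired & acl.other_perms & acl.mask) == desired


acl = Acl(owner="alice", owner_perms=R | W | X,
          owning_group="finance", owning_group_perms=R | X,
          other_perms=0, mask=R | X, named_users={"bob": R | W | X})
memberships = {"carol": {"finance"}}

print(access_check("alice", R | W, acl, memberships))  # True: owner, mask not applied
print(access_check("bob", W, acl, memberships))        # False: the mask removes Write
print(access_check("carol", R | X, acl, memberships))  # True: through the owning group
```

In this example, bob is granted RWX as a named user, but the mask of R-X reduces his effective permissions, so the Write request fails.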

The mask

As illustrated in the Access Check Algorithm, the mask limits access for named users, the owning group, and named groups.

For a new Data Lake Storage Gen2 container, the mask for the access ACL of the root directory ("/") defaults to 750 for directories and 640 for files. The following table shows the symbolic notation of these permission levels.

Entity | Directories | Files
Owning user | rwx | rw-
Owning group | r-x | r--
Other | --- | ---

Files do not receive the X bit as it is irrelevant to files in a store-only system.
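
Each digit of a value such as 750 or 640 is one octal permission set, read left to right as owning user, owning group, and other. The following minimal Python sketch reproduces the symbolic table from the numeric values; the function name symbolic is illustrative.

```python
def symbolic(mode: str) -> dict[str, str]:
    """Split a three-digit octal mode such as '750' into per-entity rwx strings."""
    if len(mode) != 3:
        raise ValueError("mode must have exactly three octal digits")
    entities = ("owning user", "owning group", "other")
    result = {}
    for entity, digit in zip(entities, mode):
        value = int(digit, 8)  # raises ValueError for digits outside 0-7
        result[entity] = "".join(
            letter if value & bit else "-"
            for letter, bit in (("r", 4), ("w", 2), ("x", 1))
        )
    return result


print(symbolic("750"))  # directories
print(symbolic("640"))  # files
```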

The mask may be specified on a per-call basis. This allows different consuming systems, such as clusters, to have different effective masks for their file operations. If a mask is specified on a given request, it completely overrides the default mask.

The sticky bit

The sticky bit is a more advanced feature of a POSIX container. In the context of Data Lake Storage Gen2, it is unlikely that the sticky bit will be needed. In summary, if the sticky bit is enabled on a directory, a child item can only be deleted or renamed by the child item's owning user.

The sticky bit isn't shown in the Azure portal.

Default permissions on new files and directories

When a new file or directory is created under an existing directory, the default ACL on the parent directory determines:

  • A child directory's default ACL and access ACL.
  • A child file's access ACL (files do not have a default ACL).

umask

When creating a file or directory, umask is used to modify how the default ACLs are set on the child item. umask is a 9-bit value on parent directories that contains an RWX value for owning user, owning group, and other.

The umask for Azure Data Lake Storage Gen2 is a constant value that is set to 007. This value translates to:

umask component | Numeric form | Short form | Meaning
umask.owning_user | 0 | --- | For owning user, copy the parent's default ACL to the child's access ACL
umask.owning_group | 0 | --- | For owning group, copy the parent's default ACL to the child's access ACL
umask.other | 7 | RWX | For other, remove all permissions on the child's access ACL

The umask value used by Azure Data Lake Storage Gen2 effectively means that the value for other is never transmitted by default on new children, unless a default ACL is defined on the parent directory. In that case, the umask is effectively ignored and the permissions defined by the default ACL are applied to the child item.

The following pseudocode shows how the umask is applied when creating the ACLs for a child item.

def set_default_acls_for_new_child(parent, child):
    # umask is the constant 007 described above, split into owning_user, owning_group, and other
    child.acls = []
    for entry in parent.acls:
        new_entry = None
        if entry.type == OWNING_USER:
            new_entry = entry.clone(perms=entry.perms & (~umask.owning_user))
        elif entry.type == OWNING_GROUP:
            new_entry = entry.clone(perms=entry.perms & (~umask.owning_group))
        elif entry.type == OTHER:
            new_entry = entry.clone(perms=entry.perms & (~umask.other))
        else:
            new_entry = entry.clone(perms=entry.perms)
        child.acls.append(new_entry)
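
The bit arithmetic in the pseudocode above can be tried out with a small runnable Python sketch. The Entry data class, the type labels, and the UMASK dictionary are assumptions made for illustration. The sketch mirrors only the pseudocode; it does not model the case in which a default ACL on the parent takes precedence over the umask.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entry:
    """One ACL entry (illustrative model, not an SDK type)."""
    type: str      # "owning_user", "owning_group", "other", "named_user", ...
    identity: str  # empty for owning user, owning group, and other
    perms: int     # 0-7


# The constant umask 007, split per component as in the table above.
UMASK = {"owning_user": 0, "owning_group": 0, "other": 7}


def acls_for_new_child(parent_acls):
    """Copy each parent entry, clearing the permission bits set in the umask."""
    child_acls = []
    for entry in parent_acls:
        removed = UMASK.get(entry.type, 0)  # named entries are copied unchanged
        child_acls.append(replace(entry, perms=entry.perms & ~removed & 7))
    return child_acls


parent = [
    Entry("owning_user", "", 7),
    Entry("owning_group", "", 5),
    Entry("other", "", 5),
    Entry("named_user", "bob", 6),
]
for entry in acls_for_new_child(parent):
    print(entry.type, entry.identity, entry.perms)
```

With the umask of 007, only the entry for other changes: its permissions drop from 5 (R-X) to 0 (---).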

FAQ

Do I have to enable support for ACLs?

No. Access control via ACLs is enabled for a storage account as long as the Hierarchical Namespace (HNS) feature is turned ON.

If HNS is turned OFF, the Azure RBAC authorization rules still apply.

What is the best way to apply ACLs?

Always use Azure AD security groups as the assigned principal in an ACL entry. Resist the temptation to directly assign individual users or service principals. With this structure, you can add and remove users or service principals without reapplying ACLs to an entire directory structure. Instead, you just add or remove users and service principals from the appropriate Azure AD security group.

There are many different ways to set up groups. For example, imagine that you have a directory named /LogData which holds log data that is generated by your server. Azure Data Factory (ADF) ingests data into that folder. Specific users from the service engineering team will upload logs and manage other users of this folder, and various Databricks clusters will analyze logs from that folder.

To enable these activities, you could create a LogsWriter group and a LogsReader group. Then, you could assign permissions as follows:

  • Add the LogsWriter group to the ACL of the /LogData directory with rwx permissions.
  • Add the LogsReader group to the ACL of the /LogData directory with r-x permissions.
  • Add the service principal object or Managed Service Identity (MSI) for ADF to the LogsWriter group.
  • Add users in the service engineering team to the LogsWriter group.
  • Add the service principal object or MSI for Databricks to the LogsReader group.
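
The first two assignments in the list above are ACL entries on /LogData that name the groups by object ID. As a rough sketch, the following Python helper builds such entries in the short-form syntax type:id:permissions, with a default: prefix for default ACL entries, which is the form used by tools such as the Azure CLI. The helper name acl_entry and both object IDs are hypothetical placeholders; substitute the object IDs of your own groups.

```python
def acl_entry(kind: str, object_id: str, perms: str, default: bool = False) -> str:
    """Build one short-form ACL entry such as 'group:<object-id>:r-x'."""
    if kind not in {"user", "group", "mask", "other"}:
        raise ValueError(f"unsupported entry type: {kind!r}")
    valid = len(perms) == 3 and all(
        char in (letter, "-") for char, letter in zip(perms, "rwx")
    )
    if not valid:
        raise ValueError(f"permissions must look like 'rwx' or 'r-x', got {perms!r}")
    entry = f"{kind}:{object_id}:{perms}"
    return f"default:{entry}" if default else entry


# Hypothetical object IDs of the two Azure AD security groups.
LOGS_WRITER_OID = "00000000-0000-0000-0000-000000000001"
LOGS_READER_OID = "00000000-0000-0000-0000-000000000002"

log_data_acl = ",".join([
    acl_entry("group", LOGS_WRITER_OID, "rwx"),
    acl_entry("group", LOGS_READER_OID, "r-x"),
])
print(log_data_acl)
```

Passing default=True produces the matching default ACL entry, which matters if items created later under /LogData should receive the same group permissions.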

If a user in the service engineering team leaves the company, you could just remove them from the LogsWriter group. If you did not add that user to a group, but instead, you added a dedicated ACL entry for that user, you would have to remove that ACL entry from the /LogData directory. You would also have to remove the entry from all subdirectories and files in the entire directory hierarchy of the /LogData directory.

To create a group and add members, see Create a basic group and add members using Azure Active Directory.

How are Azure RBAC and ACL permissions evaluated?

To learn how the system evaluates Azure RBAC and ACLs together to make authorization decisions for storage account resources, see How permissions are evaluated.

What are the limits for Azure role assignments and ACL entries?

The following table provides a summary view of the limits to consider while using Azure RBAC to manage "coarse-grained" permissions (permissions that apply to storage accounts or containers) and using ACLs to manage "fine-grained" permissions (permissions that apply to files and directories). Use security groups for ACL assignments. By using groups, you're less likely to exceed the maximum number of role assignments per subscription and the maximum number of ACL entries per file or directory.

Mechanism | Scope | Limits | Supported level of permission
Azure RBAC | Storage accounts, containers. Cross resource Azure role assignments at subscription or resource group level. | 2000 Azure role assignments in a subscription | Azure roles (built-in or custom)
ACL | Directory, file | 32 ACL entries (effectively 28 ACL entries) per file and per directory. Access and default ACLs each have their own 32 ACL entry limit. | ACL permission

Does Data Lake Storage Gen2 support inheritance of Azure RBAC?

Azure role assignments do inherit. Assignments flow from subscription, resource group, and storage account resources down to the container resource.

Does Data Lake Storage Gen2 support inheritance of ACLs?

Default ACLs can be used to set ACLs for new child subdirectories and files created under the parent directory. To update ACLs for existing child items, you will need to add, update, or remove ACLs recursively for the desired directory hierarchy. For guidance, see the How to set ACLs section of this article.

Which permissions are required to recursively delete a directory and its contents?

  • The caller has 'super-user' permissions,

Or

  • The parent directory must have Write + Execute permissions.
  • The directory to be deleted, and every directory within it, requires Read + Write + Execute permissions.

Note

You do not need Write permissions to delete files in directories. Also, the root directory "/" can never be deleted.
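
The conditions above can be expressed as a check over the directories involved. The following Python sketch assumes a dictionary that maps each directory path (with a trailing slash) to the caller's effective permission bits on it; that dictionary and the function name are inventions for this example, not an SDK feature.

```python
R, W, X = 4, 2, 1


def can_recursively_delete(target: str, perms: dict[str, int],
                           is_superuser: bool = False) -> bool:
    """Check the rules above for recursively deleting the directory `target`."""
    if target == "/":
        return False  # the root directory can never be deleted
    if is_superuser:
        return True
    # The parent directory needs Write + Execute.
    parent = target.rstrip("/").rsplit("/", 1)[0] + "/"
    if (perms.get(parent, 0) & (W | X)) != (W | X):
        return False
    # The target and every directory within it need Read + Write + Execute.
    subtree = [path for path in perms if path.startswith(target)]
    return target in perms and all(
        (perms[path] & (R | W | X)) == (R | W | X) for path in subtree
    )


perms = {
    "/": X,
    "/LogData/": W | X,
    "/LogData/2024/": R | W | X,
    "/LogData/2024/01/": R | W | X,
}
print(can_recursively_delete("/LogData/2024/", perms))  # True
```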

Who is the owner of a file or directory?

The creator of a file or directory becomes the owner. In the case of the root directory, this is the identity of the user who created the container.

Which group is set as the owning group of a file or directory at creation?

The owning group is copied from the owning group of the parent directory under which the new file or directory is created.

I am the owning user of a file but I don't have the RWX permissions I need. What do I do?

The owning user can change the permissions of the file to give themselves any RWX permissions they need.

Why do I sometimes see GUIDs in ACLs?

A GUID is shown if the entry represents a user and that user doesn't exist in Azure AD anymore. Usually this happens when the user has left the company or their account has been deleted in Azure AD. Additionally, service principals and security groups do not have a User Principal Name (UPN) to identify them, so they are represented by their OID attribute (a GUID).

How do I set ACLs correctly for a service principal?

When you define ACLs for service principals, it's important to use the Object ID (OID) of the service principal for the app registration that you created. It's important to note that registered apps have a separate service principal in the specific Azure AD tenant. Registered apps have an OID that's visible in the Azure portal, but the service principal has another (different) OID.

To get the OID for the service principal that corresponds to an app registration, you can use the az ad sp show command. Specify the Application ID as the parameter. Here's an example of obtaining the OID for the service principal that corresponds to an app registration with App ID = 18218b12-1895-43e9-ad80-6e8fc1ea88ce. Run the following command in the Azure CLI:

az ad sp show --id 18218b12-1895-43e9-ad80-6e8fc1ea88ce --query objectId

The command displays the OID.

When you have the correct OID for the service principal, go to the Storage Explorer Manage Access page to add the OID and assign appropriate permissions for the OID. Make sure you select Save.

Can I set the ACL of a container?

No. A container does not have an ACL. However, you can set the ACL of the container's root directory. Every container has a root directory, and it shares the same name as the container. For example, if the container is named my-container, then the root directory is named my-container/.

The Azure Storage REST API does contain an operation named Set Container ACL, but that operation cannot be used to set the ACL of a container or the root directory of a container. Instead, that operation is used to indicate whether blobs in a container may be accessed publicly.

Where can I learn more about the POSIX access control model?

See also