Access control in Azure Data Lake Storage Gen2

Azure Data Lake Storage Gen2 implements an access control model that supports both Azure Role-based Access Control (RBAC) and POSIX-like access control lists (ACLs). This article summarizes the basics of the access control model for Data Lake Storage Gen2.

Azure Role-based Access Control (RBAC)

Azure Role-based Access Control (RBAC) uses role assignments to effectively apply sets of permissions to users, groups, and service principals for Azure resources. Typically, those Azure resources are constrained to top-level resources (e.g., Azure Storage accounts). In the case of Azure Storage, and consequently Azure Data Lake Storage Gen2, this mechanism has been extended to the file system resource.

RBAC role assignments are a powerful mechanism for controlling user permissions, but they are coarse-grained relative to ACLs. The smallest scope for an RBAC assignment is the file system, and RBAC is evaluated at a higher priority than ACLs. Therefore, if you assign RBAC permissions on a file system to a user or service principal, that identity has that authorization for ALL directories and files in the file system, regardless of ACL assignments.

Azure Storage provides three built-in RBAC roles for Blob storage:

  • Storage Blob Data Owner
  • Storage Blob Data Contributor
  • Storage Blob Data Reader

When a user or service principal is granted RBAC data permissions either through one of these built-in roles, or through a custom role, these permissions are evaluated first upon authorization of a request. If the requested operation is authorized by the caller’s RBAC assignments then authorization is immediately resolved and no additional ACL checks are performed. Alternatively, if the caller does not have an RBAC assignment or the request’s operation does not match the assigned permission, then ACL checks are performed to determine if the caller is authorized to perform the requested operation.
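The evaluation order described above can be sketched in a few lines of Python. This is only an illustration, not the service's implementation: the function name `is_authorized`, the representation of RBAC grants as a set of operation names, and the `acl_check` callback are all assumptions made for the example.

```python
from typing import Callable, Set


def is_authorized(
    operation: str,
    rbac_allowed_operations: Set[str],
    acl_check: Callable[[], bool],
) -> bool:
    """Model the order of checks: RBAC first, ACLs only as a fallback.

    rbac_allowed_operations is a hypothetical stand-in for the data
    operations granted by the caller's role assignments.
    acl_check is a hypothetical callback that runs the ACL evaluation.
    """
    if operation in rbac_allowed_operations:
        # RBAC authorizes the request, so no ACL check is performed.
        return True
    # No matching RBAC permission: fall back to the ACL evaluation.
    return acl_check()


# A caller whose role only grants "read":
reader_ops = {"read"}
print(is_authorized("read", reader_ops, lambda: False))   # RBAC decides
print(is_authorized("write", reader_ops, lambda: False))  # ACLs decide
```

In this model the ACL callback is never invoked when RBAC already grants the operation, which mirrors the statement that no additional ACL checks are performed in that case.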

A special note should be made of the Storage Blob Data Owner built-in role. If the caller has this RBAC assignment, then the user is considered a super-user and is granted full access to all mutating operations, including setting the owner of a directory or file as well as ACLs for directories and files for which they are not the owner. Super-user access is the only authorized manner to change the owner of a resource.

Shared Key and Shared Access Signature Authentication

Azure Data Lake Storage Gen2 supports Shared Key and Shared Access Signature methods for authentication. A characteristic of these authentication methods is that no identity is associated with the caller and therefore user permission-based authorization cannot be performed.

In the case of Shared Key, the caller effectively gains ‘super-user’ access, meaning full access to all operations on all resources, including setting owner and changing ACLs.

SAS tokens include allowed permissions as part of the token. The permissions included in the SAS token are effectively applied to all authorization decisions, but no additional ACL checks are performed.
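As a rough illustration of how these two identity-less methods differ, the sketch below models Shared Key as unconditional access and SAS as a check against the permissions carried in the token. The `AuthMethod` enum, the function name, and the permission strings are assumptions for the example and do not correspond to a real API.

```python
from enum import Enum
from typing import FrozenSet


class AuthMethod(Enum):
    SHARED_KEY = "shared_key"
    SAS = "sas"


def authorize_without_identity(
    method: AuthMethod,
    requested_permission: str,
    sas_permissions: FrozenSet[str] = frozenset(),
) -> bool:
    """Model authorization when no caller identity is available."""
    if method is AuthMethod.SHARED_KEY:
        # Shared Key behaves like super-user access to every operation.
        return True
    if method is AuthMethod.SAS:
        # Only the permissions in the token count; ACLs are not consulted.
        return requested_permission in sas_permissions
    raise ValueError(f"unsupported method: {method}")


token_permissions = frozenset({"read", "list"})
print(authorize_without_identity(AuthMethod.SAS, "read", token_permissions))
print(authorize_without_identity(AuthMethod.SAS, "write", token_permissions))
print(authorize_without_identity(AuthMethod.SHARED_KEY, "write"))
```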

Access control lists on files and directories

There are two kinds of access control lists (ACLs): access ACLs and default ACLs.

  • Access ACLs: Access ACLs control access to an object. Files and directories both have access ACLs.

  • Default ACLs: A template of ACLs associated with a directory that determine the access ACLs for any child items that are created under that directory. Files do not have default ACLs.

Both access ACLs and default ACLs have the same structure.

Note

Changing the default ACL on a parent does not affect the access ACL or default ACL of child items that already exist.

Permissions

The permissions on a file system object are Read, Write, and Execute, and they can be used on files and directories as shown in the following table:

Permission  | File                                                         | Directory
Read (R)    | Can read the contents of a file                              | Requires Read and Execute to list the contents of the directory
Write (W)   | Can write or append to a file                                | Requires Write and Execute to create child items in a directory
Execute (X) | Does not mean anything in the context of Data Lake Storage Gen2 | Required to traverse the child items of a directory

Short forms for permissions

RWX is used to indicate Read + Write + Execute. A more condensed numeric form exists in which Read=4, Write=2, and Execute=1, the sum of which represents the permissions. Following are some examples.

Numeric form | Short form | What it means
7            | RWX        | Read + Write + Execute
5            | R-X        | Read + Execute
4            | R--        | Read
0            | ---        | No permissions
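The mapping between the numeric and short forms is plain bit arithmetic, which the following sketch makes explicit. The function names are illustrative choices, not part of any Azure SDK.

```python
# Bit values from the text: Read=4, Write=2, Execute=1.
BITS = (("R", 4), ("W", 2), ("X", 1))


def to_short_form(value: int) -> str:
    """Convert a numeric permission (0-7) to its short form, e.g. 5 -> 'R-X'."""
    if not 0 <= value <= 7:
        raise ValueError("permission value must be between 0 and 7")
    return "".join(letter if value & bit else "-" for letter, bit in BITS)


def to_numeric(short: str) -> int:
    """Convert a short form such as 'R-X' back to its numeric value."""
    if len(short) != 3:
        raise ValueError("short form must have exactly three characters")
    total = 0
    for char, (letter, bit) in zip(short.upper(), BITS):
        if char == letter:
            total += bit
        elif char != "-":
            raise ValueError(f"unexpected character {char!r}")
    return total


def mode_to_short_form(mode: str) -> str:
    """Expand a three-digit mode (owning user, owning group, other)."""
    return "".join(to_short_form(int(digit)) for digit in mode)


print(to_short_form(5))           # R-X
print(to_numeric("RWX"))          # 7
print(mode_to_short_form("750"))  # RWXR-X---
```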

Permissions inheritance

In the POSIX-style model that's used by Data Lake Storage Gen2, permissions for an item are stored on the item itself. In other words, permissions for an item cannot be inherited from the parent items if the permissions are set after the child item has already been created. Permissions are only inherited if default permissions have been set on the parent items before the child items have been created.

The following table lists some common scenarios to help you understand which permissions are needed to perform certain operations on a Data Lake Storage Gen2 account.

Operation              | /   | Oregon/ | Portland/ | Data.txt
Read Data.txt          | --X | --X     | --X       | R--
Append to Data.txt     | --X | --X     | --X       | RW-
Delete Data.txt        | --X | --X     | -WX       | ---
Create Data.txt        | --X | --X     | -WX       | ---
List /                 | R-X | ---     | ---       | ---
List /Oregon/          | --X | R-X     | ---       | ---
List /Oregon/Portland/ | --X | --X     | R-X       | ---

Note

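Write permissions on the file are not required to delete it, as long as the caller has Execute permission on every ancestor directory and Write + Execute permissions on the parent directory, as shown in the table.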
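The pattern in the scenario table can be expressed as a small function that, given an operation and a path, returns the permissions needed on each path component. This is a sketch derived only from the rows of that table; the function name and the operation keywords ("read", "append", "create", "delete", "list") are assumptions for illustration.

```python
from typing import List, Tuple


def required_permissions(operation: str, path: str) -> List[Tuple[str, str]]:
    """Return (component, permissions) pairs for an operation on a path."""
    names = [part for part in path.split("/") if part]

    if operation == "list":
        # The path is a directory: Execute on each ancestor,
        # Read + Execute on the directory being listed.
        dirs = ["/"] + [name + "/" for name in names]
        return [(d, "--X") for d in dirs[:-1]] + [(dirs[-1], "R-X")]

    if not names:
        raise ValueError("file operations need a path that names a file")

    dirs = ["/"] + [name + "/" for name in names[:-1]]
    file_name = names[-1]
    if operation == "read":
        parent_perm, file_perm = "--X", "R--"
    elif operation == "append":
        parent_perm, file_perm = "--X", "RW-"
    elif operation in ("create", "delete"):
        # Changing the directory's contents needs Write + Execute on the
        # parent; nothing is required on the file itself.
        parent_perm, file_perm = "-WX", "---"
    else:
        raise ValueError(f"unknown operation: {operation}")

    return (
        [(d, "--X") for d in dirs[:-1]]
        + [(dirs[-1], parent_perm), (file_name, file_perm)]
    )


for component, perms in required_permissions("delete", "/Oregon/Portland/Data.txt"):
    print(component, perms)
```

For the delete case this prints `--X` for `/` and `Oregon/`, `-WX` for `Portland/`, and `---` for `Data.txt`, matching the corresponding table row.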

Users and identities

Every file and directory has distinct permissions for these identities:

  • The owning user
  • The owning group
  • Named users
  • Named groups
  • Named service principals
  • All other users

The identities of users and groups are Azure Active Directory (Azure AD) identities. So unless otherwise noted, a user, in the context of Data Lake Storage Gen2, can refer to an Azure AD user, service principal or security group.

The owning user

The user who created the item is automatically the owning user of the item. An owning user can:

  • Change the permissions of a file that is owned.
  • Change the owning group of a file that is owned, as long as the owning user is also a member of the target group.

Note

The owning user cannot change the owning user of a file or directory. Only super-users can change the owning user of a file or directory.

The owning group

In the POSIX ACLs, every user is associated with a primary group. For example, user "alice" might belong to the "finance" group. Alice might also belong to multiple groups, but one group is always designated as her primary group. In POSIX, when Alice creates a file, the owning group of that file is set to her primary group, which in this case is "finance." The owning group otherwise behaves similarly to assigned permissions for other users/groups.

Assigning the owning group for a new file or directory

  • Case 1: The root directory "/". This directory is created when a Data Lake Storage Gen2 file system is created. In this case, the owning group is set to the user who created the file system if it was done using OAuth. If the file system is created using Shared Key, an Account SAS, or a Service SAS, then the owner and owning group are set to $superuser.
  • Case 2 (Every other case): When a new item is created, the owning group is copied from the parent directory.

Changing the owning group

The owning group can be changed by:

  • Any super-users.
  • The owning user, if the owning user is also a member of the target group.
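The two rules above reduce to a short predicate. The sketch below is illustrative only; the function and parameter names are assumptions, and group membership is modeled as a plain set of user names.

```python
from typing import Set


def can_change_owning_group(
    caller: str,
    caller_is_superuser: bool,
    owning_user: str,
    target_group_members: Set[str],
) -> bool:
    """Return True if the caller may change the owning group of an item."""
    if caller_is_superuser:
        return True
    # The owning user qualifies only when they belong to the target group.
    return caller == owning_user and caller in target_group_members


finance = {"alice", "bob"}
print(can_change_owning_group("alice", False, "alice", finance))  # owner and member
print(can_change_owning_group("carol", False, "carol", finance))  # owner, not a member
print(can_change_owning_group("bob", False, "alice", finance))    # member, not the owner
```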

Note

The owning group cannot change the ACLs of a file or directory. While the owning group is set to the user who created the account in the case of the root directory, Case 1 above, a single user account is not valid for providing permissions via the owning group. You can assign this permission to a valid user group if applicable.

Access check algorithm

The following pseudocode represents the access check algorithm for Data Lake Storage Gen2 accounts.

def access_check(user, desired_perms, path):
    # access_check returns True if user has the desired permissions on the path, False otherwise.
    # user is the identity that wants to perform an operation on path.
    # desired_perms is a simple integer with values from 0 to 7 (R=4, W=2, X=1). User desires these permissions.
    # path is the file or directory.
    # Note: the "sticky bit" is not illustrated in this algorithm.
    # Helper functions such as is_superuser and get_acl_entry are placeholders, not a real API.

    # Handle super users.
    if is_superuser(user):
        return True

    # Handle the owning user. Note that mask IS NOT used.
    entry = get_acl_entry(path, OWNER)
    if user == entry.identity:
        return (desired_perms & entry.permissions) == desired_perms

    # The mask is used by every remaining branch, so fetch it once.
    mask = get_mask(path)

    # Handle the named users. Note that mask IS used.
    entries = get_acl_entries(path, NAMED_USER)
    for entry in entries:
        if user == entry.identity:
            return (desired_perms & entry.permissions & mask) == desired_perms

    # Handle named groups and owning group.
    member_count = 0
    perms = 0
    entries = get_acl_entries(path, NAMED_GROUP | OWNING_GROUP)
    for entry in entries:
        if user_is_member_of_group(user, entry.identity):
            member_count += 1
            perms |= entry.permissions
    if member_count > 0:
        return (desired_perms & perms & mask) == desired_perms

    # Handle other.
    perms = get_perms_for_other(path)
    return (desired_perms & perms & mask) == desired_perms

The mask

As illustrated in the Access Check Algorithm, the mask limits access for named users, the owning group, and named groups.
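To see the mask at work, the following self-contained model mirrors the pseudocode with an in-memory ACL. It is a sketch for experimentation, not the service's code: the `Acl` dataclass, its field names, and the way group membership is passed in are assumptions, and like the pseudocode it also applies the mask in the "other" branch.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set

R, W, X = 4, 2, 1


@dataclass
class Acl:
    owner: str
    owner_perms: int
    owning_group: str
    owning_group_perms: int
    other_perms: int
    mask: int
    named_users: Dict[str, int] = field(default_factory=dict)
    named_groups: Dict[str, int] = field(default_factory=dict)


def access_check(
    user: str,
    user_groups: Set[str],
    desired: int,
    acl: Acl,
    superusers: FrozenSet[str] = frozenset(),
) -> bool:
    if user in superusers:
        return True
    if user == acl.owner:
        # The mask is not applied to the owning user.
        return (desired & acl.owner_perms) == desired
    if user in acl.named_users:
        return (desired & acl.named_users[user] & acl.mask) == desired
    # Union the permissions of every group entry the user belongs to.
    group_entries = {acl.owning_group: acl.owning_group_perms, **acl.named_groups}
    matched = [perms for group, perms in group_entries.items() if group in user_groups]
    if matched:
        combined = 0
        for perms in matched:
            combined |= perms
        return (desired & combined & acl.mask) == desired
    return (desired & acl.other_perms & acl.mask) == desired


acl = Acl(
    owner="alice", owner_perms=R | W | X,
    owning_group="finance", owning_group_perms=R | X,
    other_perms=0, mask=R | X,
    named_users={"bob": R | W | X},
)
print(access_check("bob", set(), W, acl))          # False: the mask removes Write
print(access_check("bob", set(), R | X, acl))      # True
print(access_check("alice", set(), R | W | X, acl))  # True: owner ignores the mask
```

Although bob's named-user entry grants RWX, the R-X mask reduces his effective permissions to R-X, whereas the owning user alice keeps RWX.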

Note

For a new Data Lake Storage Gen2 file system, the mask for the access ACL of the root directory ("/") defaults to 750 for directories and 640 for files. Files do not receive the X bit as it is irrelevant to files in a store-only system.

The mask may be specified on a per-call basis. This allows different consuming systems, such as clusters, to have different effective masks for their file operations. If a mask is specified on a given request, it completely overrides the default mask.

The sticky bit

The sticky bit is a more advanced feature of a POSIX file system. In the context of Data Lake Storage Gen2, it is unlikely that the sticky bit will be needed. In summary, if the sticky bit is enabled on a directory, a child item can only be deleted or renamed by the child item's owning user.

The sticky bit is not shown in the Azure portal.

Default permissions on new files and directories

When a new file or directory is created under an existing directory, the default ACL on the parent directory determines:

  • A child directory’s default ACL and access ACL.
  • A child file's access ACL (files do not have a default ACL).

umask

When creating a file or directory, umask is used to modify how the default ACLs are set on the child item. umask is a 9-bit value on parent directories that contains an RWX value for owning user, owning group, and other.

The umask for Azure Data Lake Storage Gen2 is a constant value that is set to 007. This value translates to:

umask component    | Numeric form | Short form | Meaning
umask.owning_user  | 0            | ---        | For owning user, copy the parent's default ACL to the child's access ACL
umask.owning_group | 0            | ---        | For owning group, copy the parent's default ACL to the child's access ACL
umask.other        | 7            | RWX        | For other, remove all permissions on the child's access ACL

The umask value used by Azure Data Lake Storage Gen2 effectively means that the value for other is never transmitted by default on new children, regardless of what the default ACL indicates.

The following pseudocode shows how the umask is applied when creating the ACLs for a child item.

def set_default_acls_for_new_child(parent, child):
    # umask is the constant 007 value described above, exposed here as
    # umask.owning_user, umask.owning_group, and umask.other.
    child.acls = []
    for entry in parent.acls:
        new_entry = None
        if entry.type == OWNING_USER:
            new_entry = entry.clone(perms=entry.perms & (~umask.owning_user))
        elif entry.type == OWNING_GROUP:
            new_entry = entry.clone(perms=entry.perms & (~umask.owning_group))
        elif entry.type == OTHER:
            new_entry = entry.clone(perms=entry.perms & (~umask.other))
        else:
            # Named users and named groups are copied unchanged.
            new_entry = entry.clone(perms=entry.perms)
        child.acls.append(new_entry)
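A worked example with concrete numbers may help. The sketch below applies the 007 umask to a parent's default ACL, represented here as a plain dictionary from entry type to permission bits. That representation and the names `UMASK` and `child_access_acl` are assumptions for illustration only.

```python
from typing import Dict

# The constant 007 umask from the text, split by component.
UMASK = {"owning_user": 0, "owning_group": 0, "other": 7}


def child_access_acl(parent_default_acl: Dict[str, int]) -> Dict[str, int]:
    """Derive a new child's access ACL from the parent's default ACL."""
    child = {}
    for entry_type, perms in parent_default_acl.items():
        # Entries without a umask component (named users and groups)
        # are copied unchanged.
        umask_bits = UMASK.get(entry_type, 0)
        child[entry_type] = perms & ~umask_bits & 0o7
    return child


parent_default = {
    "owning_user": 7,           # RWX
    "owning_group": 5,          # R-X
    "named_group:analysts": 5,  # R-X (hypothetical group name)
    "other": 4,                 # R--
}
print(child_access_acl(parent_default))
```

The owning user, owning group, and named group entries are copied as they are, while the entry for other becomes 0 even though the parent's default ACL granted Read.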

Common questions about ACLs in Data Lake Storage Gen2

Do I have to enable support for ACLs?

No. Access control via ACLs is enabled for a Data Lake Storage Gen2 account as long as the Hierarchical Namespace (HNS) feature is turned ON.

If HNS is turned OFF, the Azure RBAC authorization rules still apply.

What is the best way to apply ACLs?

Always use Azure AD security groups as the assigned principal in ACLs. Resist the temptation to assign individual users or service principals directly. With this structure, you can add and remove users or service principals without reapplying ACLs to an entire directory structure. Instead, you simply add them to or remove them from the appropriate Azure AD security group. Keep in mind that ACLs are not inherited, so reapplying ACLs requires updating the ACL on every file and subdirectory.

Which permissions are required to recursively delete a directory and its contents?

  • The caller has ‘super-user’ permissions.

Or

  • The parent directory must have Write + Execute permissions.
  • The directory to be deleted, and every directory within it, requires Read + Write + Execute permissions.

Note

You do not need Write permissions to delete files in directories. Also, the root directory "/" can never be deleted.

Who is the owner of a file or directory?

The creator of a file or directory becomes the owner. In the case of the root directory, this is the identity of the user who created the file system.

Which group is set as the owning group of a file or directory at creation?

The owning group is copied from the owning group of the parent directory under which the new file or directory is created.

I am the owning user of a file but I don’t have the RWX permissions I need. What do I do?

The owning user can change the permissions of the file to give themselves any RWX permissions they need.

Why do I sometimes see GUIDs in ACLs?

A GUID is shown if the entry represents a user and that user no longer exists in Azure AD. Usually this happens when the user has left the company or their account has been deleted in Azure AD. Additionally, service principals and security groups do not have a User Principal Name (UPN) to identify them, so they are represented by their OID attribute (a GUID).

Does Data Lake Storage Gen2 support inheritance of ACLs?

Azure RBAC assignments do inherit. Assignments flow from subscription, resource group, and storage account resources down to the file system resource.

ACLs do not inherit. However, default ACLs can be used to set ACLs for child subdirectories and files created under the parent directory.
