Use the Azure Data Lake Storage Gen2 URI

The Hadoop Filesystem driver that is compatible with Azure Data Lake Storage Gen2 is known by its scheme identifier abfs (Azure Blob File System). Consistent with other Hadoop Filesystem drivers, the ABFS driver employs a URI format to address files and directories within a Data Lake Storage Gen2 enabled account.

URI syntax

The Azure Blob File System driver can be used with the Data Lake Storage endpoint of an account even if that account does not have a hierarchical namespace enabled. If the storage account does not have a hierarchical namespace, then the shorthand URI syntax is:

abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>

The components of the URI, from left to right, are:
  1. Scheme identifier: The abfs protocol is used as the scheme identifier. If you append an s (abfss), the ABFS Hadoop client driver always uses Transport Layer Security (TLS), regardless of the authentication method chosen. If you authenticate with OAuth, the client driver always uses TLS even if you specify abfs instead of abfss, because OAuth relies solely on the TLS layer. Finally, if you authenticate with a storage account key (the older method), the client driver interprets abfs to mean that you don't want to use TLS.

  2. File system: The parent location that holds the files and folders. This is the same as a container in the Azure Blob Storage service.

  3. Account name: The name given to your storage account during creation.

  4. Path: A forward-slash (/) delimited representation of the directory structure.

  5. File name: The name of the individual file. This parameter is optional if you're addressing a directory.
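
The components above can be put together, and taken apart again, with a few lines of code. The following is a minimal Python sketch, not part of the ABFS driver: the function names, the sample values (myfs, myaccount), and the authentication labels "oauth" and "shared_key" are all hypothetical and chosen only for illustration. The uses_tls function simply restates the scheme rules from item 1.

```python
from urllib.parse import urlsplit

ENDPOINT_SUFFIX = ".dfs.core.windows.net"


def build_abfs_uri(file_system, account_name, path="", file_name="", secure=True):
    """Assemble abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>."""
    scheme = "abfss" if secure else "abfs"
    # Drop empty parts so that a directory URI carries no file name segment.
    segments = [part.strip("/") for part in (path, file_name) if part.strip("/")]
    return f"{scheme}://{file_system}@{account_name}{ENDPOINT_SUFFIX}/" + "/".join(segments)


def parse_abfs_uri(uri):
    """Split an ABFS URI into the components named in the list above."""
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")
    host = parts.hostname or ""
    if not parts.username or not host.endswith(ENDPOINT_SUFFIX):
        raise ValueError(f"expected <file_system>@<account_name>{ENDPOINT_SUFFIX}: {uri!r}")
    return {
        "scheme": parts.scheme,
        "file_system": parts.username,  # the part before the @
        "account_name": host[: -len(ENDPOINT_SUFFIX)],
        "path": parts.path,  # directory structure plus optional file name
    }


def uses_tls(scheme, auth):
    """Restate the scheme rules: abfss and OAuth always imply TLS.

    The auth labels "oauth" and "shared_key" are hypothetical names
    for the two authentication methods discussed in the text.
    """
    if scheme == "abfss":
        return True
    if auth == "oauth":
        return True
    # abfs with a storage account key means TLS is not requested.
    return False
```

For example, build_abfs_uri("myfs", "myaccount", "raw/2024", "events.csv") yields abfss://myfs@myaccount.dfs.core.windows.net/raw/2024/events.csv, and passing that string to parse_abfs_uri recovers the file system, account name, and path. Note that the URI alone cannot tell whether the last path segment is a file or a directory, so the sketch returns the path as a single value.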

However, if the account you want to address does have a hierarchical namespace, then the shorthand URI syntax is:

/<path>/<file_name>
  1. Path: A forward-slash (/) delimited representation of the directory structure.

  2. File name: The name of the individual file.
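
The relationship between the shorthand form and the full URI can be sketched as follows. This is an illustrative Python sketch under a stated assumption: the text does not say where the file system and account name come from when the shorthand is used, so the sketch assumes they are supplied by a default configured elsewhere (in Hadoop, scheme-less paths are typically resolved against the fs.defaultFS setting). The function names and sample values are hypothetical.

```python
import posixpath


def split_short_path(short_path):
    """Separate /<path>/<file_name> into its directory path and file name."""
    # posixpath.split cuts at the last forward slash.
    return posixpath.split(short_path)


def expand_short_path(short_path, file_system, account_name, secure=True):
    """Expand /<path>/<file_name> into a full ABFS URI.

    file_system and account_name stand in for a default configured
    elsewhere (an assumption of this sketch, not stated in the text).
    """
    if not short_path.startswith("/"):
        raise ValueError(f"shorthand paths start with a forward slash: {short_path!r}")
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{file_system}@{account_name}.dfs.core.windows.net{short_path}"
```

With hypothetical defaults myfs and myaccount, expand_short_path("/raw/2024/events.csv", "myfs", "myaccount") gives abfss://myfs@myaccount.dfs.core.windows.net/raw/2024/events.csv.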

Next steps