WebHDFS FileSystem APIs

Azure Data Lake Store is a cloud-scale file system that is compatible with the Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem. Your existing applications or services that use the WebHDFS API can easily integrate with Data Lake Store.

URL for REST calls

A typical WebHDFS REST URL looks like the following:

http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OP>...

To map this URL for a REST call to Data Lake Store, make the following changes:

  • Use https instead of http

  • For <HOST>, use the fully-qualified account name, like <data_lake_store_name>.azuredatalakestore.net

  • The :<PORT> is optional

So, a REST endpoint URL for Data Lake Store using the WebHDFS APIs should look like the following:

https://<data_lake_store_name>.azuredatalakestore.net/webhdfs/v1/<PATH>?op=<OP>... 
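The mapping above can be sketched as a small helper. This is a minimal illustration, not an official client: the function name `adls_webhdfs_url` and the account name `mydatalakestore` are hypothetical, and the optional port is simply omitted.

```python
from urllib.parse import quote, urlencode


def adls_webhdfs_url(account_name, path, op, **params):
    """Build a WebHDFS-style URL for a Data Lake Store account.

    account_name, path, and op fill the <data_lake_store_name>, <PATH>,
    and <OP> placeholders; extra keyword arguments become query parameters.
    """
    # Always https, and the fully-qualified account name as the host.
    host = f"{account_name}.azuredatalakestore.net"
    # Drop the leading slash so the path joins cleanly after /webhdfs/v1/,
    # and percent-encode everything except the path separators.
    clean_path = quote(path.lstrip("/"))
    query = urlencode({"op": op, **params})
    return f"https://{host}/webhdfs/v1/{clean_path}?{query}"


# Hypothetical account name, used only for illustration.
print(adls_webhdfs_url("mydatalakestore", "/mydir", "LISTSTATUS"))
# https://mydatalakestore.azuredatalakestore.net/webhdfs/v1/mydir?op=LISTSTATUS
```

Keeping the URL construction in one place makes it harder to fall back to `http` or to a bare host name by accident.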

Passing authorization token in the message header

Data Lake Store uses Azure Active Directory to authorize REST calls. All REST calls to Data Lake Store must include an authorization token as part of the message header. For a detailed discussion on how Azure Active Directory uses OAuth, see OAuth2.0 in Azure Active Directory. For instructions on how to request an authorization token, see How do I authenticate using Azure Active Directory.
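As a rough sketch of what attaching the token looks like, the snippet below prepares a request with Python's standard `urllib.request` module. It assumes the token is sent with the usual OAuth 2.0 `Bearer` scheme in the `Authorization` header and that you have already obtained the token from Azure Active Directory; the helper name `build_request`, the account name, and the token value are placeholders.

```python
import urllib.request


def build_request(url, access_token, method="GET"):
    """Prepare (but do not send) a REST request carrying the token."""
    request = urllib.request.Request(url, method=method)
    # Assumption: the token is passed using the OAuth 2.0 Bearer scheme.
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


# Placeholder account name and token, for illustration only.
request = build_request(
    "https://mydatalakestore.azuredatalakestore.net/webhdfs/v1/mydir?op=LISTSTATUS",
    "<ACCESS_TOKEN>",
)
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       body = response.read()
```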

Note

For a list of common headers and parameters that are required for calls to Data Lake Store, see Common parameters and headers.

WebHDFS compliant APIs for Data Lake Store

The table below lists the WebHDFS APIs that can be used with Data Lake Store. Where applicable, the table also lists deviations from the standard WebHDFS APIs, such as parameters that are not supported or that are supported differently.

Note

Data Lake Store currently supports WebHDFS version 2.7.2.

| WebHDFS API with Data Lake Store | Request/Response | Important considerations |
|---|---|---|
| CREATE | See here | The following request parameters are not supported: blocksize (fixed at 256 MB and cannot be changed); replication (handled internally by Data Lake Store; even if you provide this parameter, it is ignored and no error is returned); buffersize (fixed at 4 MB and cannot be changed). |
| APPEND | See here | The following request parameters are not supported: buffersize (fixed at 4 MB and cannot be changed). |
| CONCAT | See here | - |
| OPEN | See here | The following request parameters are not supported: buffersize (fixed at 4 MB and cannot be changed). |
| MKDIRS | See here | - |
| RENAME | See here | - |
| DELETE | See here | - |
| GETFILESTATUS | See here | The following response parameters are supported differently: type (SYMLINK is not supported, so it is not returned; FILE and DIRECTORY are). |
| LISTSTATUS | See here | - |
| GETCONTENTSUMMARY | See here | The following response parameters are not supported: quota (Data Lake Store does not return quota); spaceQuota (Data Lake Store does not return spaceQuota). |
| SETPERMISSION | See here | - |
| SETOWNER | See here | - |
| MODIFYACLENTRIES | See here | - |
| REMOVEACLENTRIES | See here | - |
| SETACL | See here | - |
| GETACLSTATUS | See here | - |
| CHECKACCESS | See here | - |
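When porting an existing WebHDFS client, one option is to drop the request parameters that the table above marks as not supported before building the query string. The sketch below shows one possible way to do that; the `UNSUPPORTED_PARAMS` mapping and the `strip_unsupported` helper are hypothetical names that encode only the request-parameter deviations listed in the table.

```python
# Request parameters the table lists as not supported, keyed by operation.
UNSUPPORTED_PARAMS = {
    "CREATE": {"blocksize", "replication", "buffersize"},
    "APPEND": {"buffersize"},
    "OPEN": {"buffersize"},
}


def strip_unsupported(op, params):
    """Return a copy of params without the entries not supported for op."""
    blocked = UNSUPPORTED_PARAMS.get(op.upper(), set())
    return {
        name: value
        for name, value in params.items()
        if name.lower() not in blocked
    }


print(strip_unsupported("CREATE", {"overwrite": "true", "replication": "3"}))
# {'overwrite': 'true'}
```

Filtering on the client keeps requests explicit about what Data Lake Store actually honors, rather than sending values such as `blocksize` that have no effect on the fixed 256 MB setting.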