DBFS API

The DBFS API is a Databricks API that makes it simple to interact with various data sources without having to include your credentials every time you read a file. See Databricks File System (DBFS) for more information. For an easy-to-use command-line client of the DBFS API, see the Databricks CLI.

Note

To ensure high quality of service under heavy load, Azure Databricks enforces API rate limits for DBFS API calls. Limits are set per workspace to ensure fair usage and high availability. Automatic retries are available when you use Databricks CLI version 0.12.0 and above. We advise all customers to switch to the latest Databricks CLI version.

Important

To access Databricks REST APIs, you must authenticate.

Limitations

Using the DBFS API with firewall-enabled storage containers is not supported. Databricks recommends that you use Databricks Connect or az storage instead.

Add block

Endpoint HTTP Method
2.0/dbfs/add-block POST

Append a block of data to the stream specified by the input handle. If the handle does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST. If the block of data exceeds 1 MB, this call throws an exception with MAX_BLOCK_SIZE_EXCEEDED. A typical workflow for file upload would be:

  1. Call create and get a handle.
  2. Make one or more add-block calls with the handle you have.
  3. Call close with the handle you have.
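The steps above can be sketched locally. The text does not spell out whether the 1 MB limit applies to the raw bytes or to the base64 string, so this sketch conservatively chunks raw bytes at 700 KB so the encoded block stays under 1 MB either way; `to_blocks` is a hypothetical helper that only prepares the add-block payloads and makes no API calls:

```python
import base64

RAW_BLOCK_BYTES = 700 * 1024  # raw chunk size; base64 of this stays under 1 MB

def to_blocks(payload: bytes, block_size: int = RAW_BLOCK_BYTES) -> list:
    """Split raw bytes into base64-encoded chunks suitable for add-block calls."""
    return [
        base64.b64encode(payload[i:i + block_size]).decode("ascii")
        for i in range(0, len(payload), block_size)
    ]

# Example: a 2.5 MB payload becomes four blocks.
payload = b"x" * (2560 * 1024)
blocks = to_blocks(payload)
rejoined = b"".join(base64.b64decode(b) for b in blocks)
assert rejoined == payload
print(len(blocks))  # 4
```

Each element of `blocks` would then be sent as the `data` field of one add-block call against the handle returned by create.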

Example

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/add-block \
--data '{ "data": "SGVsbG8sIFdvcmxkIQ==", "handle": 1234567890123456 }'
{}

Request structure

Field Name Type Description
handle INT64 The handle on an open stream. This field is required.
data BYTES The base64-encoded data to append to the stream. This has a limit of 1 MB. This field is required.

Close

Endpoint HTTP Method
2.0/dbfs/close POST

Close the stream specified by the input handle. If the handle does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST. A typical workflow for file upload would be:

  1. Call create and get a handle.
  2. Make one or more add-block calls with the handle you have.
  3. Call close with the handle you have.

Example

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/close \
--data '{ "handle": 1234567890123456 }'

If the call succeeds, no output displays.

Request structure

Field Name Type Description
handle INT64 The handle on an open stream. This field is required.

Create

Endpoint HTTP Method
2.0/dbfs/create POST

Open a stream to write to a file and return a handle to this stream. There is a 10-minute idle timeout on this handle. If a file or directory already exists at the given path and overwrite is set to false, this call throws an exception with RESOURCE_ALREADY_EXISTS. A typical workflow for file upload would be:

  1. Call create and get a handle.
  2. Make one or more add-block calls with the handle you have.
  3. Call close with the handle you have.

Example

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/create \
--data '{ "path": "/tmp/HelloWorld.txt", "overwrite": true }'
{ "handle": 1234567890123456 }

Request structure

Field Name Type Description
path STRING The path of the new file. The path should be the absolute DBFS path (for example, /mnt/my-file.txt). This field is required.
overwrite BOOL The flag that specifies whether to overwrite existing files.

Response structure

Field Name Type Description
handle INT64 Handle which should subsequently be passed into the add-block and close calls when writing to a file through a stream.

Delete

Endpoint HTTP Method
2.0/dbfs/delete POST

Delete the file or directory (optionally recursively delete all files in the directory). This call throws an exception with IO_ERROR if the path is a non-empty directory and recursive is set to false or on other similar errors.

When you delete a large number of files, the delete operation is done in increments. The call returns a response after approximately 45 seconds with an error message (503 Service Unavailable) asking you to re-invoke the delete operation until the directory structure is fully deleted. For example:

{
  "error_code": "PARTIAL_DELETE",
  "message": "The requested operation has deleted 324 files. There are more files remaining. You must make another request to delete more."
}

For operations that delete more than 10K files, we discourage using the DBFS REST API and instead advise you to perform such operations in the context of a cluster, using the File system utility (dbutils.fs). dbutils.fs covers the functional scope of the DBFS REST API, but from within notebooks. Running such operations from notebooks provides better control and manageability, such as selective deletes and the ability to automate periodic delete jobs.
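The re-invoke pattern described above can be sketched as a simple loop. `make_stub_delete` below is a stand-in that simulates a few PARTIAL_DELETE responses before completing; it is not a real API call:

```python
def make_stub_delete(partial_rounds: int):
    """Build a stub that mimics a DBFS delete returning PARTIAL_DELETE a few times."""
    state = {"remaining": partial_rounds}

    def dbfs_delete(path: str) -> dict:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return {"error_code": "PARTIAL_DELETE",
                    "message": "There are more files remaining."}
        return {}  # empty response: deletion complete

    return dbfs_delete

def delete_until_done(delete_fn, path: str, max_attempts: int = 100) -> int:
    """Re-invoke delete until no PARTIAL_DELETE is returned; return attempt count."""
    for attempt in range(1, max_attempts + 1):
        response = delete_fn(path)
        if response.get("error_code") != "PARTIAL_DELETE":
            return attempt
    raise RuntimeError("delete did not complete within max_attempts")

delete = make_stub_delete(partial_rounds=2)
print(delete_until_done(delete, "/tmp/big-dir"))  # 3
```

In a real client, `dbfs_delete` would POST to 2.0/dbfs/delete and the loop would stop on an empty response body.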

Example

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/delete \
--data '{ "path": "/tmp/HelloWorld.txt" }'
{}

Request structure

Field Name Type Description
path STRING The path of the file or directory to delete. The path should be the absolute DBFS path (for example, /mnt/foo/). This field is required.
recursive BOOL Whether or not to recursively delete the directory’s contents. Deleting empty directories can be done without providing the recursive flag.

Get status

Endpoint HTTP Method
2.0/dbfs/get-status GET

Get the file information of a file or directory. If the file or directory does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST.

Example

curl --netrc -X GET \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/get-status \
--data '{ "path": "/tmp/HelloWorld.txt" }' \
| jq .
{
  "path": "/tmp/HelloWorld.txt",
  "is_dir": false,
  "file_size": 13,
  "modification_time": 1622054945000
}

Request structure

Field Name Type Description
path STRING The path of the file or directory. The path should be the absolute DBFS path (for example, /mnt/my-folder/). This field is required.

Response structure

Field Name Type Description
path STRING The path of the file or directory.
is_dir BOOL Whether the path is a directory.
file_size INT64 The length of the file in bytes or zero if the path is a directory.
modification_time INT64 The last time, in epoch milliseconds, the file or directory was modified.
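Because modification_time is in epoch milliseconds, it must be divided by 1000 before being handed to most datetime libraries. A minimal sketch, using the value from the example response above:

```python
from datetime import datetime, timezone

modification_time = 1622054945000  # epoch milliseconds, from get-status
dt = datetime.fromtimestamp(modification_time / 1000, tz=timezone.utc)
print(dt.isoformat())  # 2021-05-26T18:49:05+00:00
```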

List

Endpoint HTTP Method
2.0/dbfs/list GET

List the contents of a directory, or details of the file. If the file or directory does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST.

When calling list on a large directory, the list operation times out after approximately 60 seconds. We strongly recommend using list only on directories containing fewer than 10K files, and we discourage using the DBFS REST API for operations that list more than 10K files. Instead, we recommend that you perform such operations in the context of a cluster, using the File system utility (dbutils.fs), which provides the same functionality without timing out.

Example

curl --netrc -X GET \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/list \
--data '{ "path": "/tmp" }' \
| jq .
{
  "files": [
    {
      "path": "/tmp/HelloWorld.txt",
      "is_dir": false,
      "file_size": 13,
      "modification_time": 1622054945000
    },
    ...
  ]
}

Request structure

Field Name Type Description
path STRING The path of the file or directory. The path should be the absolute DBFS path (for example, /mnt/foo/). This field is required.

Response structure

Field Name Type Description
files An array of FileInfo A list of FileInfo that describes the contents of the directory or file.
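Client-side, the files array can be partitioned into files and subdirectories using the is_dir flag. The sample response dict below is illustrative, shaped like the example output above:

```python
# A parsed /dbfs/list response (sample data, shaped like the example above).
listing = {
    "files": [
        {"path": "/tmp/HelloWorld.txt", "is_dir": False, "file_size": 13,
         "modification_time": 1622054945000},
        {"path": "/tmp/my-new-dir", "is_dir": True, "file_size": 0,
         "modification_time": 1622054945000},
    ]
}

dirs = [f["path"] for f in listing["files"] if f["is_dir"]]
files = [f["path"] for f in listing["files"] if not f["is_dir"]]
print(files)  # ['/tmp/HelloWorld.txt']
print(dirs)   # ['/tmp/my-new-dir']
```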

Mkdirs

Endpoint HTTP Method
2.0/dbfs/mkdirs POST

Create the given directory and necessary parent directories if they do not exist. If a file (not a directory) exists at any prefix of the input path, this call throws an exception with RESOURCE_ALREADY_EXISTS. If this operation fails, it may have succeeded in creating some of the necessary parent directories.

Example

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/mkdirs \
--data '{ "path": "/tmp/my-new-dir" }'
{}

Request structure

Field Name Type Description
path STRING The path of the new directory. The path should be the absolute DBFS path (for example, /mnt/my-folder/). This field is required.

Move

Endpoint HTTP Method
2.0/dbfs/move POST

Move a file from one location to another location within DBFS. If the source file does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST. If a file already exists at the destination path, this call throws an exception with RESOURCE_ALREADY_EXISTS. If the given source path is a directory, this call always recursively moves all files.

When moving a large number of files, the API call will time out after approximately 60 seconds, potentially resulting in partially moved data. Therefore, for operations that move more than 10K files, we strongly discourage using the DBFS REST API. Instead, we recommend that you perform such operations in the context of a cluster, using the File system utility (dbutils.fs) from a notebook, which provides the same functionality without timing out.

Example

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/move \
--data '{ "source_path": "/tmp/HelloWorld.txt", "destination_path": "/tmp/my-new-dir/HelloWorld.txt" }'
{}

Request structure

Field Name Type Description
source_path STRING The source path of the file or directory. The path should be the absolute DBFS path (for example, /mnt/my-source-folder/). This field is required.
destination_path STRING The destination path of the file or directory. The path should be the absolute DBFS path (for example, /mnt/my-destination-folder/). This field is required.

Put

Endpoint HTTP Method
2.0/dbfs/put POST

Upload a file through the use of a multipart form POST. It is mainly used for streaming uploads, but can also be used as a convenient single call for data upload.

The amount of data that can be passed using the contents parameter is limited to 1 MB if specified as a string (MAX_BLOCK_SIZE_EXCEEDED is thrown if exceeded) and 2 GB if posted as a file.
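A sketch of the size-based decision between the upload strategies covered in this document. The thresholds mirror the limits above, and whether the 1 MB inline limit counts raw or base64-encoded bytes is treated conservatively; the returned labels are just names for the three approaches, not API values:

```python
MAX_INLINE_BYTES = 1024 * 1024       # 1 MB limit for string contents in /dbfs/put
MAX_MULTIPART_BYTES = 2 * 1024 ** 3  # 2 GB limit for a posted file

def choose_put_method(size_bytes: int) -> str:
    """Pick a DBFS upload strategy for a payload of the given size."""
    if size_bytes <= MAX_INLINE_BYTES:
        return "inline-contents"  # single /dbfs/put call with base64 contents
    if size_bytes <= MAX_MULTIPART_BYTES:
        return "multipart-form"   # /dbfs/put with --form contents=@file
    return "streaming"            # create / add-block / close sequence

print(choose_put_method(13))  # inline-contents
```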

Example

To upload a local file named HelloWorld.txt in the current directory:

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/put \
--form contents=@HelloWorld.txt \
--form path="/tmp/HelloWorld.txt" \
--form overwrite=true

To upload content Hello, World! as a base64 encoded string:

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/put \
--data '{ "path": "/tmp/HelloWorld.txt", "contents": "SGVsbG8sIFdvcmxkIQ==", "overwrite": true }'
{}

Request structure

Field Name Type Description
path STRING The path of the new file. The path should be the absolute DBFS path (for example, /mnt/foo/). This field is required.
contents BYTES This parameter might be absent, and instead a posted file will be used.
overwrite BOOL The flag that specifies whether to overwrite existing files.

Read

Endpoint HTTP Method
2.0/dbfs/read GET

Return the contents of a file. If the file does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST. If the path is a directory, the read length is negative, or the offset is negative, this call throws an exception with INVALID_PARAMETER_VALUE. If the read length exceeds 1 MB, this call throws an exception with MAX_READ_SIZE_EXCEEDED. If offset + length exceeds the number of bytes in the file, the call reads the contents up to the end of the file.
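The offset/length semantics can be exercised locally. `read_chunk` below models the documented endpoint behavior on an in-memory buffer; it is a sketch of the semantics, not a client:

```python
import base64

MAX_READ_BYTES = 1024 * 1024  # documented per-call read limit

def read_chunk(content: bytes, offset: int, length: int) -> dict:
    """Model of /dbfs/read semantics: returns bytes_read and base64 data."""
    if offset < 0 or length < 0:
        raise ValueError("INVALID_PARAMETER_VALUE")
    if length > MAX_READ_BYTES:
        raise ValueError("MAX_READ_SIZE_EXCEEDED")
    chunk = content[offset:offset + length]  # slicing truncates at EOF
    return {"bytes_read": len(chunk),
            "data": base64.b64encode(chunk).decode("ascii")}

resp = read_chunk(b"Hello, World!", offset=1, length=8)
print(resp)  # {'bytes_read': 8, 'data': 'ZWxsbywgV28='}
```

This reproduces the example response below: offset 1, length 8 of "Hello, World!" yields the 8 bytes "ello, Wo".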

Example

curl --netrc -X GET \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/read \
--data '{ "path": "/tmp/HelloWorld.txt", "offset": 1, "length": 8 }' \
| jq .
{
  "bytes_read": 8,
  "data": "ZWxsbywgV28="
}

Request structure

Field Name Type Description
path STRING The path of the file to read. The path should be the absolute DBFS path (for example, /mnt/foo/). This field is required.
offset INT64 The offset to read from in bytes.
length INT64 The number of bytes to read starting from the offset. This has a limit of 1 MB, and a default value of 0.5 MB.

Response structure

Field Name Type Description
bytes_read INT64 The number of bytes read (could be less than length if the end of the file is reached). This refers to the number of bytes in the unencoded content (the response data is base64-encoded).
data BYTES The base64-encoded contents of the file read.

Data structures

In this section:

FileInfo

The attributes of a file or directory.

Field Name Type Description
path STRING The path of the file or directory.
is_dir BOOL Whether the path is a directory.
file_size INT64 The length of the file in bytes or zero if the path is a directory.
modification_time INT64 The last time, in epoch milliseconds, the file or directory was modified.