Manage files in volumes

This article provides examples of managing files in Unity Catalog volumes using various user interfaces, tools, libraries, and languages.

Databricks recommends using volumes for managing all access to non-tabular data in cloud object storage. Examples of non-tabular data include the following:

  • Data files for ingestion such as CSV, JSON, and Parquet.
  • Text, image, and audio files for data science, ML, and AI workloads.
  • CSV or JSON artifacts written by Azure Databricks for integration with external systems.

You can use volumes for storing files such as libraries, init scripts, and build artifacts. See Recommendations for files in volumes and workspace files.

Work with files in volumes using the Catalog Explorer UI

Catalog Explorer provides options for common file management tasks for files stored in Unity Catalog volumes. See Create and work with volumes.

Rename or delete a volume

Click the kebab menu next to Upload to this volume to see options to Rename or Delete the volume.

Set permissions on a volume

You can use Catalog Explorer to manage permissions on a volume or assign a new principal as the owner of a volume. See Manage privileges in Unity Catalog and Manage Unity Catalog object ownership.

Upload files to a volume

The Upload to this volume button opens a dialog to upload files. See Upload files to a Unity Catalog volume.

Uploaded files cannot exceed 5 GB.

UI file management tasks for volumes

Click the kebab menu next to a file name to perform the following actions:

  • Copy path
  • Download file
  • Delete file
  • Create table

Create a table from data in a volume

Azure Databricks provides a UI to create a Unity Catalog managed table from a file or directory of files stored in a Unity Catalog volume.

You must have CREATE TABLE permissions in the target schema and have access to a running SQL warehouse.

You can use the provided UI to make the following selections:

  • Choose to Create new table or Overwrite existing table.
  • Select the target Catalog and Schema.
  • Specify the Table name.
  • Override default column names and types, or choose to exclude columns.

Note

Click Advanced attributes to view additional options.

Click Create table to create the table in the specified location. Upon completion, Catalog Explorer displays the table details.

Programmatically work with files in volumes on Azure Databricks

You can read and write files in volumes from all supported languages and workspace editors using the following format:

/Volumes/catalog_name/schema_name/volume_name/path/to/files

You interact with files in volumes in the same way that you interact with files in any cloud object storage location. That means that if you currently manage code that uses cloud URIs, DBFS mount paths, or DBFS root paths to interact with data or files, you can update your code to use volumes instead.
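
For example, the following sketch shows hypothetical code that reads from a DBFS mount path updated to read from a volume path instead (both paths are placeholders):

# Before: reading from a DBFS mount path (hypothetical legacy path)
df = spark.read.format("json").load("/mnt/raw-data/events.json")

# After: reading the same file from a Unity Catalog volume
df = spark.read.format("json").load("/Volumes/catalog_name/schema_name/volume_name/events.json")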

Note

Volumes are only used for non-tabular data. Databricks recommends registering tabular data using Unity Catalog tables and then reading and writing data using table names.

Read and write data in volumes

You can use Apache Spark, pandas, Spark SQL, and other OSS libraries to read and write data files in volumes.

The following examples demonstrate reading a CSV file stored in a volume:

Python

df = spark.read.format("csv").load("/Volumes/catalog_name/schema_name/volume_name/data.csv")

display(df)

pandas

import pandas as pd

df = pd.read_csv('/Volumes/catalog_name/schema_name/volume_name/data.csv')

display(df)

SQL

SELECT * FROM csv.`/Volumes/catalog_name/schema_name/volume_name/data.csv`
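
Writing works the same way. The following is a minimal sketch that writes the DataFrame from the Python example above back to the volume as CSV; the output path is a placeholder, and Spark writes it as a directory of part files:

# Write the DataFrame back to the volume as a directory of CSV part files
# (the output path is a placeholder)
(df.write
    .format("csv")
    .option("header", "true")
    .mode("overwrite")
    .save("/Volumes/catalog_name/schema_name/volume_name/output"))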

Utility commands for files in volumes

Databricks provides the following tools for managing files in volumes:

  • The dbutils.fs submodule in Databricks Utilities. See File system utility (dbutils.fs).
  • The %fs magic, which is an alias for dbutils.fs.
  • The %sh magic, which allows you to run Bash commands against volumes.

For an example of using these tools to download files from the internet, unzip files, and move files from ephemeral block storage to volumes, see Download data from the internet.
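
As a minimal sketch, assuming a volume at the placeholder path used throughout this article, the dbutils.fs submodule supports commands such as the following:

# List the files in a volume (placeholder path)
dbutils.fs.ls("/Volumes/catalog_name/schema_name/volume_name/")

# Copy a file within the volume, then remove the copy
dbutils.fs.cp(
    "/Volumes/catalog_name/schema_name/volume_name/data.csv",
    "/Volumes/catalog_name/schema_name/volume_name/data_copy.csv",
)
dbutils.fs.rm("/Volumes/catalog_name/schema_name/volume_name/data_copy.csv")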

You can also use OSS packages for file utility commands, such as the Python os module, as shown in the following example:

import os

os.mkdir('/Volumes/catalog_name/schema_name/volume_name/directory_name')

Manage files in volumes from external tools

Databricks provides a suite of tools for programmatically managing files in volumes from your local environment or integrated systems.

SQL commands for files in volumes

Azure Databricks supports the following SQL keywords for interacting with files in volumes:

  • PUT
  • GET
  • LIST
  • DELETE

Note

In Databricks notebooks and the SQL query editor, only the LIST command is supported.

Databricks SQL connectors and drivers, including the Databricks SQL Connector for Python, support managing files in volumes.
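
For example, the following is a minimal sketch that uses the Databricks SQL Connector for Python to run the LIST command against a volume; the hostname, HTTP path, and token are placeholders read from environment variables:

# Run the LIST command against a volume using the Databricks SQL Connector
# for Python. Connection details are placeholders read from the environment.
import os
from databricks import sql

with sql.connect(
    server_hostname=os.environ["DATABRICKS_HOST"],
    http_path=os.environ["DATABRICKS_HTTP_PATH"],
    access_token=os.environ["DATABRICKS_TOKEN"],
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("LIST '/Volumes/catalog_name/schema_name/volume_name/'")
        for row in cursor.fetchall():
            print(row)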

Manage files in volumes with the Databricks CLI

Use the subcommands of databricks fs. See fs command group.

Note

The Databricks CLI requires the scheme dbfs:/ to precede all volume paths. For example, dbfs:/Volumes/catalog_name/schema_name/volume_name/path/to/data.
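
For example, the following sketch lists a volume's contents and copies a local file into it, assuming the CLI is installed and authenticated (paths are placeholders):

databricks fs ls dbfs:/Volumes/catalog_name/schema_name/volume_name/
databricks fs cp ./data.csv dbfs:/Volumes/catalog_name/schema_name/volume_name/data.csv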

Manage files in volumes with SDKs

The Databricks SDKs, including the Databricks SDK for Python, support managing files in volumes.
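
For example, the following is a minimal sketch that uses the Databricks SDK for Python (databricks-sdk) to upload a file to a volume and then list the volume's contents; paths are placeholders, and the client picks up credentials from the environment or a configuration profile:

# Upload a small file into a volume, then list the volume directory,
# using the Files API of the Databricks SDK for Python. Paths are placeholders.
import io
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Upload a small in-memory file into the volume
w.files.upload(
    "/Volumes/catalog_name/schema_name/volume_name/data.csv",
    io.BytesIO(b"id,text\n1,Hello World!"),
    overwrite=True,
)

# List the contents of the volume directory
for entry in w.files.list_directory_contents(
    "/Volumes/catalog_name/schema_name/volume_name/"
):
    print(entry.path)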

Manage files in volumes with the REST API

Use the Files API to manage files in volumes.

REST API examples for files in volumes

The following examples use curl and the Databricks REST API to perform file management tasks in volumes.

The following example creates an empty folder named my-folder in the specified volume.

curl --request PUT "https://${DATABRICKS_HOST}/api/2.0/fs/directories/Volumes/main/default/my-volume/my-folder/" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}"

The following example creates a file named data.csv with the specified data in the specified path in the volume.

curl --request PUT "https://${DATABRICKS_HOST}/api/2.0/fs/files/Volumes/main/default/my-volume/my-folder/data.csv?overwrite=true" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--header "Content-Type: application/octet-stream" \
--data-binary $'id,Text\n1,Hello World!'

The following example lists the contents of a volume in the specified path. This example uses jq to format the response body’s JSON for easier reading.

curl --request GET "https://${DATABRICKS_HOST}/api/2.0/fs/directories/Volumes/main/default/my-volume/" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" | jq .

The following example lists the contents of a folder in a volume in the specified path. This example uses jq to format the response body’s JSON for easier reading.

curl --request GET "https://${DATABRICKS_HOST}/api/2.0/fs/directories/Volumes/main/default/my-volume/my-folder" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" | jq .

The following example prints the contents of a file in the specified path in a volume.

curl --request GET "https://${DATABRICKS_HOST}/api/2.0/fs/files/Volumes/main/default/my-volume/my-folder/data.csv" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}"

The following example deletes a file in the specified path from a volume.

curl --request DELETE "https://${DATABRICKS_HOST}/api/2.0/fs/files/Volumes/main/default/my-volume/my-folder/data.csv" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}"

The following example deletes a folder from the specified volume.

curl --request DELETE "https://${DATABRICKS_HOST}/api/2.0/fs/directories/Volumes/main/default/my-volume/my-folder/" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}"