FileStore

FileStore is a special folder within the Databricks File System (DBFS) where you can save files and have them accessible from your web browser. You can use FileStore to:

  • Save files, such as images and libraries, that are accessible within HTML and JavaScript when you call displayHTML.
  • Save output files that you want to download to your local desktop.

When you use certain features, Azure Databricks puts files in the following folders under FileStore:

  • /FileStore/jars - contains libraries that you upload. If you delete files in this folder, libraries that reference these files in your Workspace may no longer work.
  • /FileStore/tables - contains the files that you import using the UI. If you delete files in this folder, tables that you created from these files may no longer be accessible.
  • /FileStore/plots - contains images created in notebooks when you call display() on a Python or R plot object, such as a ggplot or matplotlib plot. If you delete files in this folder, you may have to regenerate those plots in the notebooks that reference them. See Matplotlib and ggplot in Python Notebooks for more information.
  • /FileStore/import-stage - contains temporary files created when you import notebooks or Databricks archive files. These temporary files are removed after the notebook import completes.
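As a minimal sketch, you can inspect these folders from a notebook with dbutils.fs.ls. Note that dbutils is only available inside Databricks notebooks, and each folder exists only after the corresponding feature has been used:

for folder in ["/FileStore/jars", "/FileStore/tables", "/FileStore/plots"]:
  try:
    # List the files currently stored under this FileStore folder.
    for file_info in dbutils.fs.ls(folder):
      print(file_info.path)
  except Exception:
    # The folder is created lazily, so it may not exist yet.
    print(folder + " does not exist yet")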

Save a file to FileStore

To save a file to FileStore, put it in the /FileStore directory in DBFS:

dbutils.fs.put("/FileStore/my-stuff/my-file.txt", "Contents of my file")

In the following, replace <databricks-instance> with the <region>.azuredatabricks.net domain name of your Azure Databricks deployment.

Files stored in /FileStore are accessible in your web browser at https://<databricks-instance>/files/<path-to-file>?o=######. For example, the file you stored in /FileStore/my-stuff/my-file.txt is accessible at https://<databricks-instance>/files/my-stuff/my-file.txt?o=######, where the number after o= is the same as the one in your Azure Databricks workspace URL.
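The following is a minimal sketch of that mapping: it writes a file with dbutils.fs.put and prints the corresponding browser URL. The <databricks-instance> and ?o=###### values are placeholders that you replace with the values from your own deployment; the snippet does not download anything itself.

path = "/FileStore/my-stuff/my-file.txt"
dbutils.fs.put(path, "Contents of my file", overwrite=True)

# /FileStore/<path-to-file> in DBFS maps to /files/<path-to-file> in the browser.
relative_path = path[len("/FileStore/"):]
print("https://<databricks-instance>/files/" + relative_path + "?o=######")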

Embed static images in notebooks

You can use the files/ location to embed static images into your notebooks:

displayHTML("<img src ='files/image.jpg/'>")

or Markdown image import syntax:

%md
![my_test_image](files/image.jpg)

You can upload static images using the DBFS endpoints of the Databricks REST API and the requests Python HTTP library. In the following example:

  • Replace <databricks-instance> with the workspace URL domain name of your Azure Databricks deployment.
  • Replace <token> with the value of your personal access token.
  • Replace <image-dir> with the location in FileStore where you want to upload the image files.
import requests
import json
import os
from base64 import b64encode, standard_b64encode

# Authenticate with Basic auth: the user name is the literal string "token"
# and the password is a personal access token.
TOKEN = b'<token>'
headers = {"Authorization": b"Basic " + standard_b64encode(b"token:" + TOKEN)}
url = "https://<databricks-instance>/api/2.0"
dbfs_dir = "dbfs:/FileStore/<image-dir>/"

# POST a JSON payload to a DBFS API endpoint and return the parsed JSON response.
def perform_query(path, headers, data={}):
  session = requests.Session()
  resp = session.request('POST', url + path, data=json.dumps(data), verify=True, headers=headers)
  return resp.json()

# Thin wrappers around the DBFS mkdirs, create, add-block, and close endpoints.
def mkdirs(path, headers):
  _data = {}
  _data['path'] = path
  return perform_query('/dbfs/mkdirs', headers=headers, data=_data)

def create(path, overwrite, headers):
  _data = {}
  _data['path'] = path
  _data['overwrite'] = overwrite
  return perform_query('/dbfs/create', headers=headers, data=_data)

def add_block(handle, data, headers):
  _data = {}
  _data['handle'] = handle
  _data['data'] = data
  return perform_query('/dbfs/add-block', headers=headers, data=_data)

def close(handle, headers):
  _data = {}
  _data['handle'] = handle
  return perform_query('/dbfs/close', headers=headers, data=_data)

# Stream a local file to DBFS in 1 MB base64-encoded blocks.
def put_file(src_path, dbfs_path, overwrite, headers):
  handle = create(dbfs_path, overwrite, headers=headers)['handle']
  print("Putting file: " + dbfs_path)
  with open(src_path, 'rb') as local_file:
    while True:
      contents = local_file.read(2**20)
      if len(contents) == 0:
        break
      add_block(handle, b64encode(contents).decode(), headers=headers)
    # Return the close response so the caller can check for API errors.
    return close(handle, headers=headers)

# Create the target directory, then upload every .png file in the current directory.
mkdirs(path=dbfs_dir, headers=headers)
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
  if f.endswith(".png"):
    target_path = dbfs_dir + f
    resp = put_file(src_path=f, dbfs_path=target_path, overwrite=True, headers=headers)
    if not resp:
      print("Success")
    else:
      print(resp)

Scale static images

To scale the size of an image that you have saved to DBFS, copy the image to /FileStore and then resize it using image parameters (such as width and height) in displayHTML:

dbutils.fs.cp('dbfs:/user/experimental/MyImage-1.png', 'dbfs:/FileStore/images/')
displayHTML('''<img src="files/images/MyImage-1.png" style="width:600px;height:600px;">''')

Use a JavaScript library

The following notebook shows how to use FileStore to host a JavaScript library.
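As a minimal sketch of the pattern, assuming a library file such as d3.min.js has already been uploaded to /FileStore/js/ (the directory and file name here are placeholders), you can load it from the files/ location with a script tag in displayHTML:

displayHTML("""
<script src="files/js/d3.min.js"></script>
<div id="version"></div>
<script>
  // Confirm the library loaded by printing its version string.
  document.getElementById("version").innerText = "d3 version: " + d3.version;
</script>
""")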

FileStore demo notebook

Get notebook