Libraries

To make third-party or locally-built code available to notebooks and jobs running on your clusters, you can install a library. Libraries can be written in Python, Java, Scala, and R. You can upload Java, Scala, and Python libraries and point to external packages in PyPI, Maven, and CRAN repositories.

You can manage libraries using the UI, the CLI, and by invoking the Libraries API. This topic focuses on performing library tasks using the UI. For the other methods, see Libraries CLI and Libraries API.

In Databricks Runtime 5.1 and above, you can also install Python libraries directly into a notebook session using Library utilities. Libraries installed into a notebook are guaranteed not to interfere with libraries installed into other notebooks, even when all the notebooks run on the same cluster. For this reason, Azure Databricks recommends that you use this method when possible.
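
As a minimal sketch of a notebook-scoped install, the following assumes the `dbutils.library.installPyPI` and `dbutils.library.restartPython` calls described in Library utilities. The helper name `install_notebook_scoped` is hypothetical, and `dbutils` is passed in explicitly (an assumption made for illustration) so the function has no hidden globals.

```python
def install_notebook_scoped(dbutils, package, version=""):
    """Install a PyPI package into the current notebook session only.

    `dbutils` is the object Databricks provides inside a notebook; it is a
    parameter here purely for illustration.
    """
    # Library utilities install the package for this notebook session.
    dbutils.library.installPyPI(package, version=version)
    # Restart the Python process so the newly installed package can be imported.
    dbutils.library.restartPython()
```

In a notebook cell, this could be called as `install_notebook_scoped(dbutils, "scikit-learn", "0.19.1")`.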

Some libraries require custom configuration and cannot be created using the methods described in this topic. To install these libraries, you can configure a cluster with a UNIX script that runs at cluster creation time.

Library modes

Azure Databricks supports three library modes: Workspace, cluster-installed, and notebook-scoped.

A Workspace library exists in the Workspace. A Workspace library has the same attributes as a cluster-installed library, plus its path in the Workspace. A Workspace library is effectively a template from which you create a cluster-installed library. To allow a library to be shared by all users in a Workspace, create the library in the Shared folder. To make it available only to a single user, create the library in the user folder.

A cluster-installed library exists only in the context of the cluster it will be installed on. It has all of the attributes the cluster needs to install the library: DBFS path to the JAR, Maven coordinate, PyPI package, and so on.

A notebook-scoped library exists only in the context of the notebook in which it is installed. See Library utilities.

Library lifecycles

Workspace libraries can be created and deleted. All libraries can be installed on a cluster and uninstalled from a cluster.

When you create a Workspace library or install a new library on a cluster, you can upload a new library, reference an uploaded library, or specify a library package. Libraries that you upload are stored in the FileStore. Python packages are installed in the Spark container using pip install.

To use a library, you must install it on a cluster. To use a newly installed library in a notebook that was attached to the cluster before the library was installed, you must detach the notebook from the cluster and reattach it.

When you uninstall a library from a cluster, the status of the library changes to Uninstall pending restart. The library is uninstalled only when you restart the cluster.
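
The uninstall behavior described above can be pictured with a toy in-memory model. This is only an illustration of the lifecycle, not a Databricks API; the class and method names are hypothetical.

```python
from enum import Enum


class LibraryStatus(Enum):
    INSTALLED = "Installed"
    UNINSTALL_PENDING_RESTART = "Uninstall pending restart"


class ClusterLibraries:
    """Toy model of the libraries on a single cluster (illustration only)."""

    def __init__(self):
        self._status = {}  # library name -> LibraryStatus

    def install(self, name):
        self._status[name] = LibraryStatus.INSTALLED

    def uninstall(self, name):
        # Uninstalling only marks the library; it stays on the cluster until restart.
        if name not in self._status:
            raise KeyError(f"{name} is not installed on this cluster")
        self._status[name] = LibraryStatus.UNINSTALL_PENDING_RESTART

    def restart(self):
        # Restarting the cluster removes every library marked for uninstall.
        self._status = {
            name: status
            for name, status in self._status.items()
            if status is LibraryStatus.INSTALLED
        }

    def status(self, name):
        # Returns None once the library is no longer on the cluster.
        return self._status.get(name)
```

In this model, a library that has been uninstalled still reports a status until `restart()` is called, which mirrors the Uninstall pending restart state.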

Workspace libraries

Create a Workspace library

  1. Right-click the Workspace folder where you want to store the library.

  2. Select Create > Library.

    The Create Library dialog displays.

  3. Select the Library Source and follow the appropriate procedure:

Upload a Jar, Python Egg, or Python Wheel

  1. In the Library Source button list, select Upload.
  2. Select Jar, Python Egg, or Python Whl.
  3. Optionally enter a library name.
  4. Drag your Jar, Egg, or Whl to the drop box or click the drop box and navigate to a file. The file is uploaded to dbfs:/FileStore/jars.
  5. Click Create. The library status screen displays.
  6. Optionally install the library on a cluster.

Reference an uploaded Jar, Python Egg, or Python Wheel

If you’ve already uploaded a Jar, Egg, or Wheel to object storage, you can reference it in a Workspace library.

You can choose a library in DBFS.

  1. Select DBFS in the Library Source button list.
  2. Select Jar, Python Egg, or Python Whl.
  3. Optionally enter a library name.
  4. Specify the DBFS path to the library.
  5. Click Create. The library status screen displays.
  6. Optionally install the library on a cluster.
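
For readers who script this step instead of using the UI, here is a minimal sketch that turns a DBFS path into a library specification. It assumes the `jar`, `egg`, and `whl` keys used by the Libraries API; the function name `dbfs_library_spec` and the file names in the usage note are hypothetical.

```python
import posixpath

# Library types named in this topic, keyed by file extension.
_KIND_BY_EXTENSION = {".jar": "jar", ".egg": "egg", ".whl": "whl"}


def dbfs_library_spec(dbfs_path):
    """Return a spec such as {"jar": "dbfs:/..."} for an uploaded library."""
    if not dbfs_path.startswith("dbfs:/"):
        raise ValueError(f"expected a dbfs:/ path, got {dbfs_path!r}")
    # The extension decides whether the file is a Jar, Egg, or Wheel.
    extension = posixpath.splitext(dbfs_path)[1].lower()
    try:
        kind = _KIND_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"unsupported library file type: {extension!r}") from None
    return {kind: dbfs_path}
```

For example, `dbfs_library_spec("dbfs:/FileStore/jars/my_lib.jar")` returns `{"jar": "dbfs:/FileStore/jars/my_lib.jar"}`, and a path that does not end in `.jar`, `.egg`, or `.whl` raises `ValueError`.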

PyPI package

  1. In the Library Source button list, select PyPI.
  2. In the Repository field, optionally enter a PyPI repository URL.
  3. Enter a PyPI package name. To install a specific version of a library, use the format <library>==<version>; for example, scikit-learn==0.19.1.
  4. Click Create. The library status screen displays.
  5. Optionally install the library on a cluster.
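
The `<library>==<version>` format can be sketched in code as follows. The dictionary shape assumes the `pypi` specification (`package` and optional `repo`) used by the Libraries API; the helper name `pypi_library_spec` is hypothetical.

```python
def pypi_library_spec(package, version=None, repo=None):
    """Build a PyPI library spec, pinning the version when one is given."""
    # A pinned requirement uses the <library>==<version> format.
    requirement = f"{package}=={version}" if version else package
    spec = {"package": requirement}
    if repo:
        # Optional PyPI repository URL; omitted when the default is used.
        spec["repo"] = repo
    return {"pypi": spec}
```

Calling `pypi_library_spec("scikit-learn", "0.19.1")` returns `{"pypi": {"package": "scikit-learn==0.19.1"}}`.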

Maven or Spark package

  1. In the Library Source button list, select Maven.

  2. In the Repository field, optionally enter a Maven repository URL.

    Note

    Internal Maven repositories are not supported.

  3. Specify a Maven coordinate. Do one of the following:

    • In the Coordinate field, enter the Maven coordinate of the library to install. Maven coordinates are in the form groupId:artifactId:version; for example, com.databricks:spark-avro_2.10:1.0.0.
    • If you don’t know the exact coordinate, enter the library name and click Search Packages. A list of matching packages displays. To display details about a package, click its name. You can sort packages by name, organization, and rating. You can also filter the results by writing a query in the search bar. The results refresh automatically.
      a. Select Maven Central or Spark Packages in the drop-down list at the top left.
      b. Optionally select the package version in the Releases column.
      c. Click + Select next to a package. The Coordinate field is filled in with the selected package and version.

  4. In the Exclusions field, optionally provide the groupId and the artifactId of the dependencies that you want to exclude; for example, log4j:log4j.

  5. Click Create. The library status screen displays.

  6. Optionally install the library on a cluster.
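
The coordinate and exclusion formats above can be checked with a small helper before they are submitted. This is a sketch under assumptions: the dictionary shape follows the `maven` specification (`coordinates`, `exclusions`, `repo`) used by the Libraries API, the helper name `maven_library_spec` is hypothetical, and the validation only checks the number of colon-separated parts.

```python
def maven_library_spec(coordinate, exclusions=(), repo=None):
    """Build a Maven library spec after basic format checks."""
    # Coordinates are in the form groupId:artifactId:version.
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got {coordinate!r}")
    # Each exclusion is groupId:artifactId, for example log4j:log4j.
    for exclusion in exclusions:
        exclusion_parts = exclusion.split(":")
        if len(exclusion_parts) != 2 or not all(exclusion_parts):
            raise ValueError(f"expected groupId:artifactId, got {exclusion!r}")
    spec = {"coordinates": coordinate}
    if exclusions:
        spec["exclusions"] = list(exclusions)
    if repo:
        spec["repo"] = repo
    return {"maven": spec}
```

For example, `maven_library_spec("com.databricks:spark-avro_2.10:1.0.0", exclusions=["log4j:log4j"])` returns a spec with both the coordinate and the exclusion, while a coordinate without a version raises `ValueError`.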

CRAN package

  1. In the Library Source button list, select CRAN.
  2. In the Repository field, optionally enter the CRAN repository URL.
  3. In the Package field, enter the name of the package.
  4. Click Create. The library detail screen displays.
  5. Optionally install the library on a cluster.

Note

CRAN mirrors serve the latest version of a library. As a result, you may end up with different versions of an R package if you attach the library to different clusters at different times. To learn how to manage and fix R package versions on Databricks, see the Knowledge Base.

View Workspace library details

  1. Go to the Workspace folder containing the library.
  2. Click the library name.

The library details page shows the running clusters and the install status of the library. If the library is installed, the page contains a link to the package host. If the library was uploaded, the page displays a link to the uploaded package file.

Move a Workspace library

  1. Go to the Workspace folder containing the library.
  2. Click the drop-down arrow to the right of the library name and select Move. A folder browser displays.
  3. Click the destination folder.
  4. Click Select.
  5. Click Confirm and Move.

Delete a Workspace library

Important

Before deleting a Workspace library, you should uninstall it from all clusters.

To delete a Workspace library:

  1. Move the library to the Trash folder.
  2. Either permanently delete the library in the Trash folder or empty the Trash folder.

Install a library on a cluster

There are two main ways to install a library on a cluster. You can install a Workspace library or install a library only on a specific cluster.

Workspace library

To install a library that already exists in the Workspace, you can start from a cluster or a library:

Cluster

  1. Click the clusters icon in the sidebar.
  2. Click a cluster name.
  3. Click the Libraries tab.
  4. Click Install New.
  5. In the Library Source button list, select Workspace.
  6. Select a Workspace library.
  7. Click Install.
  8. To configure the library to be installed on all clusters:
    a. Click the library.
    b. Select the Install automatically on all clusters checkbox.
    c. Click Confirm.

Library

  1. Go to the folder containing the library.
  2. Click the library name.
  3. Do one of the following:
    • To configure the library to be installed on all clusters, select the Install automatically on all clusters checkbox and click Confirm.
    • Select the checkbox next to the cluster that you want to install the library on and click Install.

The library is installed on the cluster.

Cluster-installed library

You can install a library on a specific cluster without making it available as a Workspace library.

To install a library on a cluster:

  1. Click the clusters icon in the sidebar.
  2. Click a cluster name.
  3. Click the Libraries tab.
  4. Click Install New.
  5. Follow one of the methods for creating a Workspace library. After you click Create, the library is installed on the cluster.

Uninstall a library from a cluster

To uninstall a library you can start from a cluster or a library:

Cluster

  1. Click the clusters icon in the sidebar.
  2. Click a cluster name.
  3. Click the Libraries tab.
  4. Select the checkbox next to the library you want to uninstall, click Uninstall, then Confirm. The Status changes to Uninstall pending restart.

Library

  1. Go to the folder containing the library.
  2. Click the library name.
  3. Select the checkbox next to the cluster you want to uninstall the library from, click Uninstall, then Confirm. The Status changes to Uninstall pending restart.
  4. Click the cluster name to go to the cluster detail page.

Click Restart and Confirm to uninstall the library. The library is removed from the cluster’s Libraries tab.

View the libraries installed on a cluster

  1. Click the clusters icon in the sidebar.
  2. Click the cluster name.
  3. Click the Libraries tab. For each library, the tab displays the name and version, type, install status, and, if uploaded, the source file.
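
The same information is available programmatically. The sketch below summarizes a response shaped like the one returned by the Libraries API cluster-status call (a `library_statuses` list whose entries have `library` and `status` fields); that shape is an assumption here, and the function name `summarize_library_statuses` is hypothetical.

```python
def summarize_library_statuses(cluster_status):
    """Return (type, name, status) tuples for each library on a cluster."""
    rows = []
    for entry in cluster_status.get("library_statuses", []):
        # Each library spec has a single key naming its type (jar, pypi, maven, ...).
        kind, detail = next(iter(entry["library"].items()))
        if isinstance(detail, dict):
            # Package-style specs carry the name under "package" or "coordinates".
            name = detail.get("package") or detail.get("coordinates")
        else:
            # File-style specs (jar, egg, whl) are just the DBFS path.
            name = detail
        rows.append((kind, name, entry["status"]))
    return rows
```

Each tuple corresponds to one row of the Libraries tab: the library type, its name or source file, and its install status.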

Update a cluster-installed library

To update a cluster-installed library, uninstall the old version of the library and install a new version.
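
As a sketch of how this two-step update might be scripted, the helper below builds the request bodies for the Libraries API uninstall and install endpoints without sending them. The endpoint paths and body shape are assumptions based on the Libraries API; the function name `update_library_requests` and the cluster ID in the usage note are hypothetical.

```python
def update_library_requests(cluster_id, old_library, new_library):
    """Return (endpoint, body) pairs that replace one library with another."""
    uninstall = (
        "/api/2.0/libraries/uninstall",
        {"cluster_id": cluster_id, "libraries": [old_library]},
    )
    install = (
        "/api/2.0/libraries/install",
        {"cluster_id": cluster_id, "libraries": [new_library]},
    )
    # Order matters: remove the old version first, then install the new one.
    return [uninstall, install]
```

For example, `update_library_requests("1234-567890-abcde123", {"pypi": {"package": "scikit-learn==0.19.1"}}, {"pypi": {"package": "scikit-learn==0.19.2"}})` yields one uninstall request followed by one install request. Because an uninstalled library is removed only when the cluster restarts, a restart is still needed before the old version is gone.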