The cluster returns Cancelled in a Python notebook. Notebooks in all other languages execute successfully on the same cluster.
When you install a conflicting version of a library, such as pandas, to the PYTHONPATH, the Python REPL can break, causing all commands to return Cancelled after 30 seconds. This also breaks %sh, the notebook magic command that lets you run shell scripts in Python notebook cells.
- If you’ve recently installed a bokeh library on the cluster, the installation may have included an incompatible tornado library. See Problem: Cluster Cancels Python Command Execution after Installing Bokeh.
- If you’ve installed a numpy library, it may be incompatible. See Problem: Python Command Execution Fails with AttributeError.
To solve this problem, do the following:
- Identify the conflicting library and uninstall it.
- Install the correct version of the library in a notebook or with a cluster-scoped init script.
Identify the conflicting library
- Uninstall each library one at a time, and check if the Python REPL still breaks.
- If the REPL still breaks, reinstall the library you removed and remove the next one.
- When you find the library that causes the REPL to break, install the correct version of that library using one of the two methods below.
You can also inspect the driver log (std.err) for the cluster (on the Cluster Configuration page) for a stack trace and error message that can help identify the library conflict.
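When the driver log is long, a small script can help narrow the search. The sketch below is one possible approach, not a Databricks tool: it assumes you have copied the log contents into a string and that the log contains standard Python tracebacks. It counts which package directories under site-packages or dist-packages appear in traceback frames and collects common import-related error lines. The function name `summarize_log` and the choice of exception types are illustrative assumptions.

```python
import re
from collections import Counter

# Matches traceback frames such as
#   File "/some/path/site-packages/tornado/ioloop.py", line 10, in start
# and captures the top-level package directory name ("tornado").
# Single-file modules placed directly in site-packages are not matched.
_FRAME = re.compile(r'File "[^"]*/(?:site|dist)-packages/([^/"]+)/')

# Exception types that often accompany a missing or mismatched library
# (an assumption; extend the list as needed).
_ERROR = re.compile(
    r"^(ImportError|ModuleNotFoundError|AttributeError)\b.*$", re.MULTILINE
)


def summarize_log(log_text):
    """Return (package_counts, error_lines) extracted from driver log text."""
    packages = Counter(_FRAME.findall(log_text))
    errors = [m.group(0) for m in _ERROR.finditer(log_text)]
    return packages, errors
```

Because the Python REPL on the affected cluster may be unusable, run this wherever Python works, for example on your local machine. Packages with the highest counts in `packages.most_common()` are reasonable first candidates to uninstall.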
Install the correct library
Do one of the following.
Option 1: Install in a notebook using pip3
%sh
sudo apt-get -y install python3-pip
pip3 install <library-name>
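After installing, it can help to confirm which version the Python interpreter actually sees. The following is a minimal sketch that assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library. The helper names `installed_version` and `matches` are hypothetical. On older interpreters, `pip3 show <library-name>` reports the installed version instead.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def matches(name, expected):
    """Return True only when the distribution is installed at exactly `expected`."""
    return installed_version(name) == expected
```

Calling `installed_version("<library-name>")` in a notebook cell returns the version string that pip recorded for that library, or `None` if the interpreter cannot find it.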
Option 2: Install using a cluster-scoped init script
Follow the steps below to create a cluster-scoped init script that installs the correct version of the library. Replace <library-name> in the examples with the name of the library to install.
If the init script does not already exist, create a base directory to store it.
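A minimal sketch of this step, assuming the base directory is the one used in the script path in this article (`/databricks/init/cluster-name/`) and that `fs` is `dbutils.fs` in a Databricks notebook. The helper name `ensure_init_dir` is hypothetical.

```python
def ensure_init_dir(fs, base_dir="/databricks/init/cluster-name/"):
    """Create base_dir, including any missing parent directories, and return it.

    `fs` is assumed to be `dbutils.fs`; the default path is an assumption
    taken from the script path used in this article.
    """
    fs.mkdirs(base_dir)
    return base_dir
```

In a notebook cell this would be called as `ensure_init_dir(dbutils.fs)`.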
Create the following script:
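# Write the init script to DBFS; the final True overwrites an existing file.
# The #!/bin/bash line must be the first line of the script.
dbutils.fs.put("/databricks/init/cluster-name/<library-name>.sh", """#!/bin/bash
sudo apt-get -y install python3-pip
sudo pip3 install <library-name>
""", True)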
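Since the goal is to install a specific version, the script can pin one with pip's `==` specifier. The sketch below builds the script text for a given library and version. The function name `build_init_script` is hypothetical, and the commands mirror the ones in this article.

```python
def build_init_script(library, version):
    """Return bash script text that installs exactly library==version with pip3."""
    return (
        "#!/bin/bash\n"  # shebang is the first line of the script
        "sudo apt-get -y install python3-pip\n"
        f"sudo pip3 install {library}=={version}\n"
    )
```

The returned string can be passed as the second argument to `dbutils.fs.put` in place of the literal script text.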
Confirm that the script exists.
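One way to check, sketched under the assumption that `fs` is `dbutils.fs` in a Databricks notebook: read the start of the file with `head` and confirm that it begins with a `#!` line. The helper name `init_script_looks_valid` is hypothetical.

```python
def init_script_looks_valid(fs, script_path):
    """Return True if script_path can be read and starts with a shebang line.

    `fs` is assumed to be `dbutils.fs`; any error while reading (for example,
    a missing file) is treated as "not valid".
    """
    try:
        head = fs.head(script_path)
    except Exception:
        return False
    return head.startswith("#!")
```

In a notebook cell this would be called as `init_script_looks_valid(dbutils.fs, "/databricks/init/cluster-name/<library-name>.sh")`.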
Go to the cluster configuration page and click the Advanced Options toggle.
At the bottom of the page, click the Init Scripts tab.
In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.
Restart the cluster.