Problem: Error When Installing pyodbc on a Cluster
One of the following errors occurs when you use pip to install the pyodbc library on a cluster:
java.lang.RuntimeException: Installation failed with message: Collecting pyodbc
"Library installation is failing due to missing dependencies. sasl and thrift_sasl are optional dependencies for SASL or Kerberos support"
sasl and thrift_sasl are optional dependencies for SASL or Kerberos support; they must be present for the pyodbc installation to succeed.
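A quick way to confirm which of these modules are actually missing is to probe for them from a notebook cell. This is a minimal sketch; the module names are taken from the error message above:

```python
# Probe for the optional SASL dependencies mentioned in the error message.
# importlib.util.find_spec returns None when a module cannot be imported.
import importlib.util

for mod in ("sasl", "thrift_sasl", "thrift"):
    status = "installed" if importlib.util.find_spec(mod) else "MISSING"
    print(f"{mod}: {status}")
```

Any module reported as MISSING needs to be installed before pyodbc will install cleanly.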
Set up solution in a single notebook
In the notebook, check the version of thrift and upgrade it to the latest version.
%sh pip list | egrep 'thrift-sasl|sasl'
%sh pip install --upgrade thrift
Ensure that dependent packages are installed.
%sh dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
%sh sudo apt-get -y install unixodbc-dev libsasl2-dev gcc python-dev
Set up solution as a cluster-scoped init script
You can put these commands into a single init script and attach it to the cluster. This ensures that the dependent libraries for
pyodbc are installed before the cluster starts.
Create the base directory to store the init script in, if it does not already exist. This example uses dbfs:/databricks/<directory>.
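In a Databricks notebook the directory can be created with dbutils.fs.mkdirs. This sketch assumes the dbutils object that the notebook environment provides, and keeps the <directory> placeholder from the text:

```python
# Runs only inside a Databricks notebook, where `dbutils` is predefined.
# Replace <directory> with your own directory name; mkdirs is a no-op if
# the directory already exists.
dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
```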
Create the script and save it to a file.
dbutils.fs.put("dbfs:/databricks/<directory>/tornado.sh","""
#!/bin/bash
pip list | egrep 'thrift-sasl|sasl'
pip install --upgrade thrift
dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
sudo apt-get -y install unixodbc-dev libsasl2-dev gcc python-dev
""", True)
Check that the script exists.
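One way to verify the script from a notebook is to list it with dbutils.fs.ls, which raises an exception if the path does not exist. This sketch again assumes the notebook-provided dbutils and display helpers:

```python
# Runs only inside a Databricks notebook, where `dbutils` and `display`
# are predefined. Replace <directory> with your own directory name.
display(dbutils.fs.ls("dbfs:/databricks/<directory>/tornado.sh"))
```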
On the cluster configuration page, click the Advanced Options toggle.
At the bottom of the page, click the Init Scripts tab.
In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.
Restart the cluster.
For more details about cluster-scoped init scripts, see Cluster-scoped init scripts.