Libraries fail with dependency exception

Problem

You have a Python function that is defined in a custom egg or wheel file, and its dependencies are satisfied by another customer package installed on the cluster.

When you call this function, it returns an error stating that the requirement cannot be satisfied.

org.apache.spark.SparkException: Process List(/local_disk0/pythonVirtualEnvDirs/virtualEnv-d82b31df-1da3-4ee9-864d-8d1fce09c09b/bin/python, /local_disk0/pythonVirtualEnvDirs/virtualEnv-d82b31df-1da3-4ee9-864d-8d1fce09c09b/bin/pip, install, fractal==0.1.0, --disable-pip-version-check) exited with code 1. Could not find a version that satisfies the requirement fractal==0.1.0 (from versions: 0.1.1, 0.1.2, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.3.0)

As an example, imagine that you have both wheel A and wheel B installed, either on the cluster via the UI or as notebook-scoped libraries. Assume that wheel A has a dependency on wheel B.

  • dbutils.library.install("/path_to_wheel/A.whl")
  • dbutils.library.install("/path_to_wheel/B.whl")

When you try to make a call using one of these libraries, you get an error stating that the requirement cannot be satisfied.

Cause

Even though the required dependencies have been installed via the cluster UI or as notebook-scoped libraries, Azure Databricks cannot guarantee the order in which specific libraries are installed on the cluster. If a library is referenced before it has been distributed to the executor nodes, Azure Databricks falls back to PyPI and tries to satisfy the requirement locally from there. If PyPI does not offer the requested version, as with fractal==0.1.0 in the error above, the installation fails.
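
To see whether a pinned dependency is already present in a given Python environment before anything falls back to PyPI, a check along the following lines may help. This is a minimal sketch assuming Python 3.8 or later; the helper names installed_version and is_requirement_satisfied are hypothetical and are not part of any Azure Databricks API.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this Python environment.
        return None


def is_requirement_satisfied(package_name, required_version):
    """Return True only if the exact pinned version is installed here."""
    return installed_version(package_name) == required_version


# Example: check the pin from the error message (fractal==0.1.0).
print(is_requirement_satisfied("fractal", "0.1.0"))
```

Run from a notebook cell, this most likely reports on the driver's environment only. The executors may be in a different state, which is exactly the situation described above.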

Solution

You should use a single egg or wheel file that contains all required code and dependencies. This ensures that your code has the correct libraries loaded and available at run time.
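
Before uploading a combined wheel, it can be worth confirming that it really contains every package your code imports. The sketch below relies only on the fact that a wheel is a zip archive; the function names and the package names package_a and package_b are hypothetical placeholders, not names from this article.

```python
import zipfile


def top_level_packages(wheel_path):
    """Return the top-level package and module names found inside a wheel."""
    names = set()
    with zipfile.ZipFile(wheel_path) as wheel:
        for entry in wheel.namelist():
            top = entry.split("/", 1)[0]
            # Skip wheel metadata directories (*.dist-info, *.data).
            if top.endswith(".dist-info") or top.endswith(".data"):
                continue
            # A single-file module appears as "module.py" at the archive root.
            if top.endswith(".py"):
                top = top[:-3]
            names.add(top)
    return names


def missing_packages(wheel_path, required):
    """Return the required names that are absent from the wheel."""
    return set(required) - top_level_packages(wheel_path)
```

For example, missing_packages("/path_to_wheel/combined.whl", ["package_a", "package_b"]) returns an empty set when both packages are bundled in the one file. The path here is also a placeholder.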