Replace a default library jar

Azure Databricks includes a number of default Java and Scala libraries. You can replace any of these libraries with another version by using a cluster-scoped init script to remove the default library jar and then install the version you require.

Important

Removing default libraries and installing new versions may cause instability or completely break your Azure Databricks cluster. You should thoroughly test any new library version in your environment before running production jobs.

Identify the artifact id

To identify the name of the jar file you want to remove:

  1. Click the Databricks Runtime version you are using from the list of supported releases.
  2. Navigate to the Java and Scala libraries section.
  3. Identify the Artifact ID for the library you want to remove.

Use the artifact id to find the jar filename

Use the ls -l command in a notebook to find the jar that contains the artifact id. For example, to find the jar filename for the spark-snowflake_2.12 artifact id in Databricks Runtime 7.0 you can use the following code:

%sh
ls -l /databricks/jars/*spark-snowflake_2.12*

This returns the jar filename

`----workspace_spark_3_0--maven-trees--hive-2.3__hadoop-2.7--net.snowflake--spark-snowflake_2.12--net.snowflake__spark-snowflake_2.12__2.5.9-spark_2.4.jar`.

Upload the replacement jar file

Upload your replacement jar file to a DBFS path.
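On a running cluster, DBFS is exposed through the /dbfs FUSE mount, so one way to stage the jar is a plain cp from a %sh notebook cell. The sketch below uses a placeholder jar and a stand-in mount directory (both assumptions, so the copy can be exercised outside Databricks); on a real cluster DBFS_ROOT would be /dbfs and the source would be your actual jar.

```shell
# Sketch: stage a replacement jar under a DBFS-style path.
# /tmp/new-library.jar and /tmp/dbfs are illustrative assumptions;
# on a real cluster you would copy your jar into /dbfs/FileStore/jars/.
touch /tmp/new-library.jar                  # stand-in for your real jar
DBFS_ROOT="${DBFS_ROOT:-/tmp/dbfs}"         # would be /dbfs on a cluster
mkdir -p "${DBFS_ROOT}/FileStore/jars"
cp /tmp/new-library.jar "${DBFS_ROOT}/FileStore/jars/"
```

You can also upload through the workspace UI or the Databricks CLI; any method that leaves the jar at a stable DBFS path works, since the init script only needs that path.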

Create the init script

Use the following template to create a cluster-scoped init script.

#!/bin/bash
rm -rf /databricks/jars/<jar_filename_to_remove>.jar
cp /dbfs/<path_to_replacement_jar>/<replacement_jar_filename>.jar /databricks/jars/
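If you prefer to generate the script rather than edit the template by hand, the placeholders can be filled in from a shell. In this sketch the jar names and the /tmp output path are illustrative assumptions, not values from your cluster:

```shell
# Sketch: fill in the init-script template and write it out.
# OLD_JAR and NEW_JAR are illustrative assumptions -- substitute the
# filenames you identified in the earlier steps.
OLD_JAR="old-library.jar"
NEW_JAR="/dbfs/FileStore/jars/new-library.jar"

# Unquoted EOF so the variables expand into the generated script.
cat > /tmp/replace-jar.sh <<EOF
#!/bin/bash
rm -rf /databricks/jars/${OLD_JAR}
cp ${NEW_JAR} /databricks/jars/
EOF
chmod +x /tmp/replace-jar.sh
```

The generated file then needs to be stored where the cluster can read it, as described in the install step below.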

Using the spark-snowflake_2.12 example from the prior step would result in an init script similar to the following:

#!/bin/bash
rm -rf /databricks/jars/----workspace_spark_3_0--maven-trees--hive-2.3__hadoop-2.7--net.snowflake--spark-snowflake_2.12--net.snowflake__spark-snowflake_2.12__2.5.9-spark_2.4.jar
cp /dbfs/FileStore/jars/e43fe9db_c48d_412b_b142_cdde10250800-spark_snowflake_2_11_2_7_1_spark_2_4-b2adc.jar /databricks/jars/

Install the init script and restart

  1. Install the cluster-scoped init script on the cluster, following the instructions in Configure a cluster-scoped init script.
  2. Restart the cluster.
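After the restart you can confirm the swap from a notebook with the same ls -l approach used earlier, or wrap the check in a small helper. The helper below is a hypothetical sketch, not part of the Databricks tooling:

```shell
# Sketch: check whether any jar matching a glob pattern exists in a
# directory. jar_present is a hypothetical helper for illustration.
jar_present() {
  # $1 = directory, $2 = glob pattern (expanded by the shell via ls)
  ls "$1"/$2 >/dev/null 2>&1
}

# On a cluster you would point this at /databricks/jars, e.g.:
#   jar_present /databricks/jars '*spark-snowflake*' && echo "replacement found"
```

Checking both that the new jar is present and that the old filename is gone catches the common failure mode where the init script copied the new jar but the rm pattern did not match.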