Databricks CLI

The Databricks command-line interface (CLI) provides an easy-to-use interface to the Azure Databricks platform. The open source project is hosted on GitHub. The CLI is built on top of the Databricks REST API 2.0 and is organized into command groups based on the Workspace API, Clusters API, Instance Pools API, DBFS API, Groups API, Jobs API, Libraries API, and Secrets API: workspace, clusters, instance-pools, fs, groups, jobs, runs, libraries, and secrets.

Important

This CLI is under active development and is released as an Experimental client. This means that interfaces are still subject to change.

Set up the CLI

This section lists CLI requirements and describes how to install and configure your environment to run the CLI.

Requirements

  • Python 3 - 3.6 and above

  • Python 2 - 2.7.9 and above

    Important

    On macOS, the default Python 2 installation does not implement the TLSv1_2 protocol, and running the CLI with this Python installation results in the error: AttributeError: 'module' object has no attribute 'PROTOCOL_TLSv1_2'. Use Homebrew to install a version of Python that has ssl.PROTOCOL_TLSv1_2.

Limitations

Using the Databricks CLI with firewall-enabled storage containers is not supported. Databricks recommends that you use Databricks Connect or az storage.

Install the CLI

Run pip install databricks-cli using the appropriate version of pip for your Python installation.

Set up authentication

Before you can run CLI commands, you must set up authentication. To authenticate to the CLI, you can use a Databricks personal access token or an Azure Active Directory (Azure AD) token.

Set up authentication using an Azure AD token

To configure the CLI using an Azure AD token, generate the Azure AD token and store it in the environment variable DATABRICKS_AAD_TOKEN.

export DATABRICKS_AAD_TOKEN=<azure-ad-token>

Run databricks configure --aad-token. The command issues the prompt:

Databricks Host (should begin with https://):

Enter your per-workspace URL, which has the format adb-<workspace-id>.<random-number>.azuredatabricks.net. To get the per-workspace URL, see Per-workspace URL.

After you complete the prompt, your access credentials are stored in the file ~/.databrickscfg. The file should contain entries like:

host = https://<databricks-instance>
token = <azure-ad-token>

Set up authentication using a Databricks personal access token

To configure the CLI to use a personal access token, run databricks configure --token. The command issues the prompts:

Databricks Host (should begin with https://):
Token:

After you complete the prompts, your access credentials are stored in the file ~/.databrickscfg. The file should contain entries like:

host = https://<databricks-instance>
token = <personal-access-token>

For CLI 0.8.1 and above, you can change the path of this file by setting the environment variable DATABRICKS_CONFIG_FILE.
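The lookup described above can be sketched in Python as follows. This is a minimal illustration rather than the CLI's own code: the helper name config_file_path is hypothetical, and the sketch assumes that an unset or empty DATABRICKS_CONFIG_FILE falls back to ~/.databrickscfg.

```python
import os
from pathlib import Path


def config_file_path(env=None):
    """Return the path where the CLI credentials file is expected to live.

    `env` is a mapping of environment variables; it defaults to os.environ.
    Hypothetical helper for illustration, not part of databricks-cli.
    """
    env = os.environ if env is None else env
    override = env.get("DATABRICKS_CONFIG_FILE")
    if override:
        # Assumption: a non-empty variable replaces the default location.
        return Path(override)
    return Path(os.path.expanduser("~/.databrickscfg"))


if __name__ == "__main__":
    print(config_file_path())
```

Passing the environment as a mapping keeps the helper easy to exercise without changing the real process environment.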

Important

Because the CLI is built on top of the REST API, your authentication configuration in your .netrc file takes precedence over your configuration in .databrickscfg.

CLI 0.8.0 and above supports the following environment variables:

  • DATABRICKS_HOST
  • DATABRICKS_TOKEN

An environment variable setting takes precedence over the setting in the configuration file.
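The precedence rule can be modeled with a short Python sketch. The function effective_settings and the example values are hypothetical. The sketch also applies the override one setting at a time, which is an assumption: the CLI may instead require both variables to be set before it ignores the file.

```python
import os


def effective_settings(file_settings, env=None):
    """Combine config-file values with environment overrides.

    `file_settings` is a dict with "host" and "token" keys, as read from
    the configuration file. Hypothetical model of the documented rule,
    not the CLI's own resolution code.
    """
    env = os.environ if env is None else env
    return {
        # An environment variable, when set, wins over the file value.
        "host": env.get("DATABRICKS_HOST") or file_settings.get("host"),
        "token": env.get("DATABRICKS_TOKEN") or file_settings.get("token"),
    }


if __name__ == "__main__":
    from_file = {"host": "https://file.example", "token": "file-token"}
    print(effective_settings(from_file)["host"])
```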

Connection profiles

The Databricks CLI configuration supports multiple connection profiles. The same installation of Databricks CLI can be used to make API calls on multiple Azure Databricks workspaces.

To add a connection profile:

databricks configure [--profile <profile>]

To use the connection profile:

databricks workspace ls --profile <profile>
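To see what several profiles might look like on disk, the following Python sketch parses a sample configuration. It assumes that the file is INI-formatted, that each profile is a section named after the profile, and that the unnamed profile lives under [DEFAULT]. The hosts, the profile name staging, and the helper read_profile are all hypothetical.

```python
import configparser

# Hypothetical file contents with a default profile and one named profile.
SAMPLE_CONFIG = """\
[DEFAULT]
host = https://adb-1111111111111111.1.azuredatabricks.net
token = <personal-access-token>

[staging]
host = https://adb-2222222222222222.2.azuredatabricks.net
token = <staging-access-token>
"""


def read_profile(config_text, profile="DEFAULT"):
    """Return the host and token stored for one connection profile."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if profile != "DEFAULT" and not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    section = parser[profile]
    return {"host": section.get("host"), "token": section.get("token")}


if __name__ == "__main__":
    print(read_profile(SAMPLE_CONFIG, "staging")["host"])
```

Under these assumptions, passing --profile staging to a CLI command would select the second block.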

Alias command groups

Sometimes it can be inconvenient to prefix each CLI invocation with the name of a command group, for example databricks workspace ls. To make the CLI easier to use, you can alias command groups to shorter commands. For example, to shorten databricks workspace ls to dw ls in the Bourne again shell, you can add alias dw="databricks workspace" to the appropriate bash profile. Typically, this file is located at ~/.bash_profile.

Tip

Azure Databricks has already aliased databricks fs to dbfs; databricks fs ls and dbfs ls are equivalent.

Use the CLI

This section shows you how to get CLI help, parse CLI output, and invoke commands in each command group.

Display CLI command group help

You can list the subcommands for any command group by running databricks <group> -h. For example, you can list the DBFS CLI subcommands by running databricks fs -h.

Use jq to parse CLI output

Some Databricks CLI commands output the JSON response from the API endpoint. Sometimes it can be useful to parse out parts of the JSON to pipe into other commands. For example, to copy a job definition, you must take the settings field of the /api/2.0/jobs/get response and use that as an argument to the databricks jobs create command.

In these cases, we recommend that you use the utility jq. You can install jq on macOS using Homebrew with brew install jq.

For more information on jq, see the jq Manual.
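If jq is not available, the same extraction can be done with a short script. The following Python sketch uses the standard json module in place of jq to pull the settings field out of a jobs get response and hand it to jobs create. The sample response and the helper names are hypothetical, and the jobs get --job-id and jobs create --json invocations are assumptions that you should check against databricks jobs -h.

```python
import json
import subprocess


def extract_settings(jobs_get_output):
    """Return the `settings` object of a jobs/get response as a JSON string."""
    job = json.loads(jobs_get_output)
    return json.dumps(job["settings"])


def copy_job(job_id):
    """Create a new job from an existing job's settings.

    Requires an installed and configured CLI; the flags are assumptions.
    """
    get_cmd = ["databricks", "jobs", "get", "--job-id", str(job_id)]
    output = subprocess.run(
        get_cmd, check=True, capture_output=True, text=True
    ).stdout
    create_cmd = ["databricks", "jobs", "create", "--json", extract_settings(output)]
    return subprocess.run(
        create_cmd, check=True, capture_output=True, text=True
    ).stdout


if __name__ == "__main__":
    # Hypothetical response body, used only to exercise the parsing step.
    sample = '{"job_id": 9, "settings": {"name": "nightly", "max_retries": 1}}'
    print(extract_settings(sample))
```

Because the commands are passed as argument lists, the JSON string reaches the CLI unchanged and needs no shell quoting.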

JSON string parameters

String parameters are handled differently depending on your operating system:

  • Unix: You must enclose JSON string parameters in single quotes. For example:

    databricks jobs run-now --job-id 9 --jar-params '["20180505", "alantest"]'
    
  • Windows: You must enclose JSON string parameters in double quotes, and the quote characters inside the string must be preceded by \. For example:

    databricks jobs run-now --job-id 9 --jar-params "[\"20180505\", \"alantest\"]"
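When the CLI is called from a script, one way to sidestep these quoting differences is to build the JSON with a serializer and pass the command as an argument list. The Python sketch below is illustrative only: run_now_command is a hypothetical helper, and the two rendered strings show how the same argument list would be typed in a Unix shell and at a Windows command prompt.

```python
import json
import shlex
import subprocess


def run_now_command(job_id, jar_params):
    """Build the argument list for `databricks jobs run-now`."""
    return [
        "databricks", "jobs", "run-now",
        "--job-id", str(job_id),
        # json.dumps produces the JSON string; no manual escaping is needed.
        "--jar-params", json.dumps(jar_params),
    ]


cmd = run_now_command(9, ["20180505", "alantest"])

# Render the list the way each platform expects it on a command line.
unix_form = shlex.join(cmd)                  # single-quoted JSON
windows_form = subprocess.list2cmdline(cmd)  # double quotes with \" inside

if __name__ == "__main__":
    print(unix_form)
    print(windows_form)
```

Passing cmd directly to subprocess.run avoids the shell entirely, so the rendered forms are needed only when the command is typed or logged.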
    

CLI commands