Stack CLI

Important

This feature is in Beta.

Note

The stack CLI requires Databricks CLI 0.8.3 or above.

The stack CLI provides a way to manage a stack of Azure Databricks resources, such as jobs, notebooks, and DBFS files. You can store notebooks and DBFS files locally and create a stack configuration JSON template that defines mappings from your local files to paths in your Azure Databricks workspace, along with the configurations of the jobs that run the notebooks.

Use the stack CLI with the stack configuration JSON template to deploy and manage your stack.

You run Databricks stack CLI subcommands by appending them to databricks stack.

databricks stack --help
Usage: databricks stack [OPTIONS] COMMAND [ARGS]...

  [Beta] Utility to deploy and download Databricks resource stacks.

Options:
  -v, --version   [VERSION]
  --debug         Debug Mode. Shows full stack trace on error.
  --profile TEXT  CLI connection profile to use. The default profile is
                  "DEFAULT".
  -h, --help      Show this message and exit.

Commands:
  deploy    Deploy a stack of resources given a JSON configuration of the stack
    Usage: databricks stack deploy [OPTIONS] CONFIG_PATH
    Options:
       -o, --overwrite  Include to overwrite existing workspace notebooks and DBFS
                        files  [default: False]
  download  Download workspace notebooks of a stack to the local filesystem
            given a JSON stack configuration template.
    Usage: databricks stack download [OPTIONS] CONFIG_PATH
    Options:
       -o, --overwrite  Include to overwrite existing workspace notebooks in the
                        local filesystem   [default: False]

Deploy a stack to a workspace

This subcommand deploys a stack. See Stack setup to learn how to set up a stack.

databricks stack deploy ./config.json

See Stack configuration JSON template for an example of config.json.

Download stack notebook changes

This subcommand downloads the notebooks of a stack.

databricks stack download ./config.json
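
Both subcommands take the configuration path as their only positional argument and accept the -o/--overwrite flag listed in the help output above. When deployments are scripted from Python, the argument list could be assembled roughly as follows. This is a minimal sketch: the helper names build_stack_command and run_stack_command are hypothetical, and running the command assumes the databricks executable is installed and configured.

```python
import subprocess


def build_stack_command(subcommand, config_path, overwrite=False):
    """Build the argument list for `databricks stack deploy|download`.

    Hypothetical helper for illustration; it uses only the flag shown
    in the `databricks stack --help` output.
    """
    if subcommand not in ("deploy", "download"):
        raise ValueError("subcommand must be 'deploy' or 'download'")
    command = ["databricks", "stack", subcommand]
    if overwrite:
        command.append("--overwrite")
    command.append(config_path)
    return command


def run_stack_command(subcommand, config_path, overwrite=False):
    """Run the command; assumes the Databricks CLI is on PATH."""
    return subprocess.run(
        build_stack_command(subcommand, config_path, overwrite), check=True
    )


if __name__ == "__main__":
    # Only prints the command line; nothing is deployed.
    print(" ".join(build_stack_command("deploy", "./config.json", overwrite=True)))
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything touches the workspace.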

Examples

Stack setup

File structure of an example stack

tree
.
├── notebooks
│   ├── common
│   │   └── notebook.scala
│   └── config
│       ├── environment.scala
│       └── setup.sql
├── lib
│   └── library.jar
└── config.json

This example stack contains a main notebook in notebooks/common/notebook.scala along with configuration notebooks in the notebooks/config folder. The stack's JAR library dependency is in lib/library.jar. config.json is the stack configuration JSON template of the stack. This is the file you pass to the stack CLI to deploy the stack.

Stack configuration JSON template

The stack configuration template describes the stack configuration.

cat config.json
{
  "name": "example-stack",
  "resources": [
  {
    "id": "example-workspace-notebook",
    "service": "workspace",
    "properties": {
      "source_path": "notebooks/common/notebook.scala",
      "path": "/Users/example@example.com/dev/notebook",
      "object_type": "NOTEBOOK"
    }
  },
  {
    "id": "example-workspace-config-dir",
    "service": "workspace",
    "properties": {
      "source_path": "notebooks/config",
      "path": "/Users/example@example.com/dev/config",
      "object_type": "DIRECTORY"
    }
  },
  {
    "id": "example-dbfs-library",
    "service": "dbfs",
    "properties": {
      "source_path": "lib/library.jar",
      "path": "dbfs:/tmp/lib/library.jar",
      "is_dir": false
    }
  },
  {
    "id": "example-job",
    "service": "jobs",
    "properties": {
      "name": "Example Stack CLI Job",
      "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 3
      },
      "timeout_seconds": 7200,
      "max_retries": 1,
      "notebook_task": {
        "notebook_path": "/Users/example@example.com/dev/notebook"
      },
      "libraries": [
        {
          "jar": "dbfs:/tmp/lib/library.jar"
        }
      ]
    }
  }
  ]
}

Each job, workspace notebook, workspace directory, DBFS file, or DBFS directory is defined as a ResourceConfig. Each ResourceConfig that represents a workspace or DBFS asset contains a mapping from the local file or directory (source_path) to its location in the workspace or DBFS (path).
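
To make the source_path to path mapping concrete, the following sketch lists the mappings declared in a parsed template. It is an illustration only, not part of the stack CLI: the function name local_to_remote_mappings is hypothetical, and the embedded template is a trimmed-down copy of the example config.json.

```python
import json


def local_to_remote_mappings(config):
    """Return (id, source_path, path) for each workspace or DBFS resource.

    Job resources are skipped because they have no source_path.
    """
    mappings = []
    for resource in config["resources"]:
        if resource["service"] in ("workspace", "dbfs"):
            props = resource["properties"]
            mappings.append((resource["id"], props["source_path"], props["path"]))
    return mappings


# A trimmed-down version of the example template.
example_config = json.loads("""
{
  "name": "example-stack",
  "resources": [
    {"id": "example-workspace-notebook", "service": "workspace",
     "properties": {"source_path": "notebooks/common/notebook.scala",
                    "path": "/Users/example@example.com/dev/notebook",
                    "object_type": "NOTEBOOK"}},
    {"id": "example-dbfs-library", "service": "dbfs",
     "properties": {"source_path": "lib/library.jar",
                    "path": "dbfs:/tmp/lib/library.jar",
                    "is_dir": false}},
    {"id": "example-job", "service": "jobs",
     "properties": {"name": "Example Stack CLI Job"}}
  ]
}
""")

for resource_id, source, target in local_to_remote_mappings(example_config):
    print(f"{resource_id}: {source} -> {target}")
```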

See Stack configuration template schema for the schema of the stack configuration template.

Deploy a stack

You deploy a stack using the databricks stack deploy <configuration-file> command.

databricks stack deploy ./config.json

During stack deployment, the DBFS and workspace assets are uploaded to your Azure Databricks workspace and the jobs are created.

At stack deploy time, the stack CLI saves a StackStatus JSON file for the deployment in the same directory as the stack configuration template. The file has the same name as the template, with deployed added immediately before the .json extension (for example, ./config.deployed.json). The stack CLI uses this file to keep track of resources previously deployed to your workspace.
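
The naming rule can be expressed in a few lines of Python, which may help when a script needs to locate the status file that belongs to a template. This is a sketch of the rule as described here, not the stack CLI's own implementation, and the function name expected_status_path is hypothetical.

```python
import os


def expected_status_path(config_path):
    """Insert `.deployed` immediately before the template's extension."""
    root, ext = os.path.splitext(config_path)
    return f"{root}.deployed{ext}"


print(expected_status_path("./config.json"))
```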

See Stack status schema for the schema of the stack status file.

Important

Do not attempt to edit or move the stack status file. If you get any errors regarding the stack status file, delete the file and try the deployment again.

./config.deployed.json
{
  "cli_version": "0.8.3",
  "deployed_output": [
    {
      "id": "example-workspace-notebook",
      "databricks_id": {
        "path": "/Users/example@example.com/dev/notebook"
      },
      "service": "workspace"
    },
    {
      "id": "example-workspace-config-dir",
      "databricks_id": {
        "path": "/Users/example@example.com/dev/config"
      },
      "service": "workspace"
    },
    {
      "id": "example-dbfs-library",
      "databricks_id": {
        "path": "dbfs:/tmp/lib/library.jar"
      },
      "service": "dbfs"
    },
    {
      "id": "example-job",
      "databricks_id": {
        "job_id": 123456
      },
      "service": "jobs"
    }
  ],
  "name": "example-stack"
}

Data structures

Stack configuration template schema

StackConfig

These are the outer fields of a stack configuration template. All fields are required.

Field Name | Type | Description
name | STRING | The name of the stack.
resources | List of ResourceConfig | Each entry describes an asset in Azure Databricks. Resources are related to three services (REST API namespaces): workspace, jobs, and dbfs.

ResourceConfig

The fields for each ResourceConfig. All fields are required.

Field Name | Type | Description
id | STRING | A unique ID for the resource. Uniqueness of ResourceConfig IDs is enforced.
service | ResourceService | The REST API service that the resource operates on. One of: jobs, workspace, or dbfs.
properties | ResourceProperties | The fields in this object differ depending on the ResourceConfig service.

ResourceProperties

The properties of a resource, by ResourceService. The fields are classified as those that come from an Azure Databricks REST API and those used only in the stack CLI. All the fields listed are required.

service | Fields from the REST API used in the stack CLI | Fields used only in the stack CLI
workspace | path: STRING - Remote workspace path of the notebook or directory (for example, /Users/example@example.com/notebook). object_type: ObjectType - Notebook object type. Can only be NOTEBOOK or DIRECTORY. | source_path: STRING - Local source path of the workspace notebook or directory. Either a path relative to the stack configuration template file or an absolute path in your filesystem.
jobs | Any field in JobSettings. The only field that is not required in JobSettings but is required by the stack CLI is name: STRING - Name of the job to deploy. To avoid creating duplicate jobs, the stack CLI enforces unique names among the jobs a stack deploys. | None.
dbfs | path: STRING - Matching remote DBFS path. Must start with dbfs:/ (for example, dbfs:/this/is/a/sample/path). is_dir: BOOL - Whether the DBFS path is a directory or a file. | source_path: STRING - Local source path of the DBFS file or directory. Either a path relative to the stack configuration template file or an absolute path in your filesystem.

ResourceService

Each resource belongs to a specific service that aligns with the Databricks REST API. These are the services that the stack CLI supports.

Service | Description
workspace | A workspace notebook or directory.
jobs | An Azure Databricks job.
dbfs | A DBFS file or directory.
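
The rules in the StackConfig, ResourceConfig, and ResourceProperties tables lend themselves to a quick check before a template is passed to the CLI. The sketch below is an illustration under stated assumptions, not the validation the stack CLI performs: the name check_stack_config is hypothetical, and only the rules stated in this section are checked (required fields, unique IDs, known services, the dbfs:/ prefix, and the allowed object_type values).

```python
VALID_SERVICES = {"workspace", "jobs", "dbfs"}

# Required property names per service, as listed in the ResourceProperties table.
REQUIRED_PROPERTIES = {
    "workspace": {"path", "object_type", "source_path"},
    "jobs": {"name"},
    "dbfs": {"path", "is_dir", "source_path"},
}


def check_stack_config(config):
    """Return a list of problems found in a parsed stack configuration."""
    problems = []
    for field in ("name", "resources"):
        if field not in config:
            problems.append(f"missing top-level field: {field}")
    seen_ids = set()
    for index, resource in enumerate(config.get("resources", [])):
        label = resource.get("id", f"resources[{index}]")
        missing = {"id", "service", "properties"} - set(resource)
        if missing:
            problems.append(f"{label}: missing {sorted(missing)}")
            continue
        if resource["id"] in seen_ids:
            problems.append(f"{label}: duplicate id")
        seen_ids.add(resource["id"])
        service = resource["service"]
        if service not in VALID_SERVICES:
            problems.append(f"{label}: unknown service {service!r}")
            continue
        props = resource["properties"]
        absent = REQUIRED_PROPERTIES[service] - set(props)
        if absent:
            problems.append(f"{label}: missing properties {sorted(absent)}")
        if service == "dbfs" and "path" in props:
            if not str(props["path"]).startswith("dbfs:/"):
                problems.append(f"{label}: path must start with dbfs:/")
        if service == "workspace" and "object_type" in props:
            if props["object_type"] not in ("NOTEBOOK", "DIRECTORY"):
                problems.append(f"{label}: object_type must be NOTEBOOK or DIRECTORY")
    return problems
```

An empty list means no problems were found by these checks. Job settings beyond name are not inspected, because any JobSettings field is allowed there.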

Stack status schema

StackStatus

A stack status file is created after a stack is deployed using the CLI. The top-level fields are:

Field Name | Type | Description
name | STRING | The name of the stack. This is the same field as in StackConfig.
cli_version | STRING | The version of the Databricks CLI used to deploy the stack.
deployed_resources | List of ResourceStatus | The status of each deployed resource. For each resource defined in StackConfig, a corresponding ResourceStatus is generated here.

ResourceStatus

Field Name | Type | Description
id | STRING | A stack-unique ID for the resource.
service | ResourceService | The REST API service that the resource operates on. One of: jobs, workspace, or dbfs.
databricks_id | DatabricksId | The physical ID of the deployed resource. The actual schema depends on the type (service) of the resource.

DatabricksId

A JSON object whose field depends on the service.

Service | Field in JSON | Type | Description
workspace | path | STRING | The absolute path of the notebook or directory in an Azure Databricks workspace. Naming is consistent with the Workspace API.
jobs | job_id | STRING | The job ID as shown in an Azure Databricks workspace. This can be used to update jobs that are already deployed.
dbfs | path | STRING | The absolute path of the file or directory in DBFS. Naming is consistent with the DBFS API.
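
Because the field inside databricks_id depends on the service, code that reads a status file has to pick the right key per entry. The following sketch shows one way to do that for read-only inspection; the names physical_id and deployed_ids are hypothetical. One assumption is labeled in the code: the StackStatus table names the list deployed_resources while the sample status file uses deployed_output, so the sketch accepts either key.

```python
# Key inside databricks_id for each service, per the DatabricksId table.
ID_FIELD_BY_SERVICE = {"workspace": "path", "jobs": "job_id", "dbfs": "path"}


def physical_id(resource_status):
    """Return the service-specific identifier of one ResourceStatus entry."""
    field = ID_FIELD_BY_SERVICE[resource_status["service"]]
    return resource_status["databricks_id"][field]


def deployed_ids(status):
    """Map each stack resource id to its deployed identifier.

    Assumption: the ResourceStatus list is stored under `deployed_resources`
    (schema table) or `deployed_output` (sample status file).
    """
    entries = status.get("deployed_resources", status.get("deployed_output", []))
    return {entry["id"]: physical_id(entry) for entry in entries}


# Two entries taken from the sample status file.
sample_status = {
    "cli_version": "0.8.3",
    "deployed_output": [
        {"id": "example-dbfs-library",
         "databricks_id": {"path": "dbfs:/tmp/lib/library.jar"},
         "service": "dbfs"},
        {"id": "example-job",
         "databricks_id": {"job_id": 123456},
         "service": "jobs"},
    ],
    "name": "example-stack",
}

print(deployed_ids(sample_status))
```

The sketch only reads the parsed content; the status file itself should not be edited or moved.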