Jobs API

The Jobs API allows you to create, edit, and delete jobs. The maximum allowed size of a request to the Jobs API is 10MB. See Jobs API examples for a how-to guide on this API.

Note

If you receive a 500-level error when making Jobs API requests, we recommend retrying requests for up to 10 minutes (with a minimum 30-second interval between retries).

Important

To access Databricks REST APIs, you must authenticate.

Create

Endpoint HTTP Method
2.0/jobs/create POST

Create a new job.

An example request for a job that runs at 10:15pm each night:

{
  "name": "Nightly model training",
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_D3_v2",
    "num_workers": 10
  },
  "libraries": [
    {
      "jar": "dbfs:/my-jar.jar"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2"
      }
    }
  ],
  "timeout_seconds": 3600,
  "max_retries": 1,
  "schedule": {
    "quartz_cron_expression": "0 15 22 ? * *",
    "timezone_id": "America/Los_Angeles"
  },
  "spark_jar_task": {
    "main_class_name": "com.databricks.ComputeModels"
  }
}

And response:

{
  "job_id": 1
}
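
For orientation, here is a minimal Python sketch of calling this endpoint with the requests library. It assumes (hypothetically) that your workspace URL and a personal access token are available in DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, that token-based (Bearer) authentication is used, and that the REST API is served under the /api/ base path; adapt as needed.

import os
import requests

# Workspace URL and personal access token (hypothetical environment variables).
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# Job settings mirroring the example request above.
job_settings = {
    "name": "Nightly model training",
    "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "Standard_D3_v2",
        "num_workers": 10,
    },
    "libraries": [{"jar": "dbfs:/my-jar.jar"}],
    "timeout_seconds": 3600,
    "max_retries": 1,
    "schedule": {
        "quartz_cron_expression": "0 15 22 ? * *",
        "timezone_id": "America/Los_Angeles",
    },
    "spark_jar_task": {"main_class_name": "com.databricks.ComputeModels"},
}

# POST to 2.0/jobs/create and read the job_id from the response.
resp = requests.post(
    f"{host}/api/2.0/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_settings,
)
resp.raise_for_status()
print("Created job_id:", resp.json()["job_id"])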

Request structure

Create a new job.

Important

  • When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.
  • When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.
Field Name Type Description
existing_cluster_id OR new_cluster STRING OR NewCluster If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability.

If new_cluster, a description of a cluster that will be created for each run.
notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task.

If spark_jar_task, indicates that this job should run a JAR.

If spark_python_task, indicates that this job should run a Python file.

If spark_submit_task, indicates that this job should run a spark-submit script.
name STRING An optional name for the job. The default value is Untitled.
libraries An array of Library An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list.
email_notifications JobEmailNotifications An optional set of email addresses notified when runs of this job begin and complete and when this job is deleted. The default behavior is to not send any emails.
timeout_seconds INT32 An optional timeout applied to each run of this job. The default behavior is to have no timeout.
max_retries INT32 An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry.
min_retry_interval_millis INT32 An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried.
retry_on_timeout BOOL An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout.
schedule CronSchedule An optional periodic schedule for this job. The default behavior is that the job runs when triggered by clicking Run Now in the Jobs UI or sending an API request to runNow.
max_concurrent_runs INT32 An optional maximum allowed number of concurrent runs of the job.

Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful for example if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs which differ by their input parameters.

This setting affects only new runs. For example, suppose the job’s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won’t kill any of the active runs. However, from then on, new runs are skipped unless there are fewer than 3 active runs.

This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run.

Response structure

Field Name Type Description
job_id INT64 The canonical identifier for the newly created job.

List

Endpoint HTTP Method
2.0/jobs/list GET

List all jobs. An example response:

{
  "jobs": [
    {
      "job_id": 1,
      "settings": {
        "name": "Nightly model training",
        "new_cluster": {
          "spark_version": "7.3.x-scala2.12",
          "node_type_id": "Standard_D3_v2",
          "num_workers": 10
        },
        "libraries": [
          {
            "jar": "dbfs:/my-jar.jar"
          },
          {
            "maven": {
              "coordinates": "org.jsoup:jsoup:1.7.2"
            }
          }
        ],
        "timeout_seconds": 100000000,
        "max_retries": 1,
        "schedule": {
          "quartz_cron_expression": "0 15 22 ? * *",
          "timezone_id": "America/Los_Angeles",
          "pause_status": "UNPAUSED"
        },
        "spark_jar_task": {
          "main_class_name": "com.databricks.ComputeModels"
        }
      },
      "created_time": 1457570074236
    }
  ]
}
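
As an illustration only (same hypothetical DATABRICKS_HOST / DATABRICKS_TOKEN environment variables and base URL as the earlier sketch), listing jobs and printing their names might look like this:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# GET 2.0/jobs/list returns {"jobs": [...]} as shown above.
resp = requests.get(f"{host}/api/2.0/jobs/list",
                    headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
for job in resp.json().get("jobs", []):
    print(job["job_id"], job["settings"]["name"])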

Response structure

Field Name Type Description
jobs An array of Job The list of jobs.

Delete

Endpoint HTTP Method
2.0/jobs/delete POST

Delete the job and send an email to the addresses specified in JobSettings.email_notifications. No action occurs if the job has already been removed. After the job is removed, neither its details nor its run history is visible via the Jobs UI or API. The job is guaranteed to be removed upon completion of this request. However, runs that were active before the receipt of this request may still be active. They will be terminated asynchronously.

An example request:

{
  "job_id": 1
}
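
A minimal sketch of the corresponding call, under the same hypothetical environment-variable and base-URL assumptions as above:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# POST 2.0/jobs/delete with the job_id of the job to remove.
resp = requests.post(f"{host}/api/2.0/jobs/delete",
                     headers={"Authorization": f"Bearer {token}"},
                     json={"job_id": 1})
resp.raise_for_status()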

Request structure

Delete a job and send an email to the addresses specified in JobSettings.email_notifications. No action occurs if the job has already been removed. After the job is removed, neither its details nor its run history is visible via the Jobs UI or API. The job is guaranteed to be removed upon completion of this request. However, runs that were active before the receipt of this request may still be active. They will be terminated asynchronously.

Field Name Type Description
job_id INT64 The canonical identifier of the job to delete. This field is required.

Get

Endpoint HTTP Method
2.0/jobs/get GET

Retrieves information about a single job. An example request:

/jobs/get?job_id=1

An example response:

{
  "job_id": 1,
  "settings": {
    "name": "Nightly model training",
    "new_cluster": {
      "spark_version": "7.3.x-scala2.12",
      "node_type_id": "Standard_D3_v2",
      "num_workers": 10
    },
    "libraries": [
      {
        "jar": "dbfs:/my-jar.jar"
      },
      {
        "maven": {
          "coordinates": "org.jsoup:jsoup:1.7.2"
        }
      }
    ],
    "timeout_seconds": 100000000,
    "max_retries": 1,
    "schedule": {
      "quartz_cron_expression": "0 15 22 ? * *",
      "timezone_id": "America/Los_Angeles",
      "pause_status": "UNPAUSED"
    },
    "spark_jar_task": {
      "main_class_name": "com.databricks.ComputeModels"
    }
  },
  "created_time": 1457570074236
}
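
A minimal sketch of fetching a single job (same hypothetical DATABRICKS_HOST / DATABRICKS_TOKEN assumptions); note that job_id is passed as a query parameter:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# GET 2.0/jobs/get takes job_id as a query parameter.
resp = requests.get(f"{host}/api/2.0/jobs/get",
                    headers={"Authorization": f"Bearer {token}"},
                    params={"job_id": 1})
resp.raise_for_status()
job = resp.json()
print(job["settings"]["name"], job["created_time"])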

Request structure

Retrieve information about a single job.

Field Name Type Description
job_id INT64 The canonical identifier of the job to retrieve information about. This field is required.

Response structure

Field Name Type Description
job_id INT64 The canonical identifier for this job.
creator_user_name STRING The creator user name. This field won’t be included in the response if the user has already been deleted.
settings JobSettings Settings for this job and all of its runs. These settings can be updated using the resetJob method.
created_time INT64 The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC).

Reset

Endpoint HTTP Method
2.0/jobs/reset POST

Overwrite job settings.

An example request that makes job 2 look like job 1 (from the create_job example):

{
  "job_id": 2,
  "new_settings": {
    "name": "Nightly model training",
    "new_cluster": {
      "spark_version": "7.3.x-scala2.12",
      "node_type_id": "Standard_D3_v2",
      "num_workers": 10
    },
    "libraries": [
      {
        "jar": "dbfs:/my-jar.jar"
      },
      {
        "maven": {
          "coordinates": "org.jsoup:jsoup:1.7.2"
        }
      }
    ],
    "timeout_seconds": 100000000,
    "max_retries": 1,
    "schedule": {
      "quartz_cron_expression": "0 15 22 ? * *",
      "timezone_id": "America/Los_Angeles",
      "pause_status": "UNPAUSED"
    },
    "spark_jar_task": {
      "main_class_name": "com.databricks.ComputeModels"
    }
  }
}
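
A sketch of the same call in Python (same hypothetical environment variables and base URL as the earlier sketches); the new_settings payload is abbreviated here and would normally carry the full JobSettings shown above, since it replaces the old settings entirely:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# POST 2.0/jobs/reset replaces the job's settings entirely.
new_settings = {
    "name": "Nightly model training",
    "spark_jar_task": {"main_class_name": "com.databricks.ComputeModels"},
    # ... remaining JobSettings fields as in the example request above ...
}
resp = requests.post(f"{host}/api/2.0/jobs/reset",
                     headers={"Authorization": f"Bearer {token}"},
                     json={"job_id": 2, "new_settings": new_settings})
resp.raise_for_status()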

Request structure

Overwrite job settings.

Field Name Type Description
job_id INT64 The canonical identifier of the job to reset. This field is required.
new_settings JobSettings The new settings of the job. These new settings replace the old settings entirely.

Changes to the following fields are not applied to active runs: JobSettings.cluster_spec or JobSettings.task.

Changes to the following fields are applied to active runs as well as future runs: JobSettings.timeout_seconds, JobSettings.email_notifications, or JobSettings.retry_policy. This field is required.

Run now

Important

  • A workspace is limited to 1000 concurrent job runs. A 429 Too Many Requests response is returned when you request a run that cannot be started immediately.
  • The number of jobs a workspace can create in an hour is limited to 5000 (includes “run now” and “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.
Endpoint HTTP Method
2.0/jobs/run-now POST

Run a job now and return the run_id of the triggered run.

Note

If you find yourself using Create together with Run now a lot, you may actually be interested in the Runs submit API. This API endpoint allows you to submit your workloads directly without having to create a job in Azure Databricks.

An example request for a notebook job:

{
  "job_id": 1,
  "notebook_params": {
    "dry-run": "true",
    "oldest-time-to-consider": "1457570074236"
  }
}

An example request for a JAR job:

{
  "job_id": 2,
  "jar_params": ["param1", "param2"]
}
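
A minimal sketch of triggering the notebook job above from Python, under the same hypothetical DATABRICKS_HOST / DATABRICKS_TOKEN assumptions:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# POST 2.0/jobs/run-now; notebook_params are passed through to the notebook widgets.
resp = requests.post(f"{host}/api/2.0/jobs/run-now",
                     headers={"Authorization": f"Bearer {token}"},
                     json={"job_id": 1,
                           "notebook_params": {"dry-run": "true",
                                               "oldest-time-to-consider": "1457570074236"}})
resp.raise_for_status()
print("Triggered run_id:", resp.json()["run_id"])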

Request structure

Run a job now and return the run_id of the triggered run.

Field Name Type Description
job_id INT64 The job ID of the job to run.
jar_params An array of STRING A list of parameters for jobs with JAR tasks, e.g. "jar_params": ["john doe", "35"]. The parameters will be used to invoke the main function of the main class specified in the Spark JAR task. If not specified upon run-now, it will default to an empty list. jar_params cannot be specified in conjunction with notebook_params. The JSON representation of this field (i.e. {"jar_params":["john doe","35"]}) cannot exceed 10,000 bytes.
notebook_params A map of ParamPair A map from keys to values for jobs with notebook task, e.g. "notebook_params": {"name": "john doe", "age": "35"}. The map is passed to the notebook and is accessible through the dbutils.widgets.get function.

If not specified upon run-now, the triggered run uses the job’s base parameters.

notebook_params cannot be specified in conjunction with jar_params.

The JSON representation of this field (i.e. {"notebook_params":{"name":"john doe","age":"35"}}) cannot exceed 10,000 bytes.
python_params An array of STRING A list of parameters for jobs with Python tasks, e.g. "python_params": ["john doe", "35"]. The parameters will be passed to the Python file as command-line parameters. If specified upon run-now, it would overwrite the parameters specified in the job setting. The JSON representation of this field (i.e. {"python_params":["john doe","35"]}) cannot exceed 10,000 bytes.
spark_submit_params An array of STRING A list of parameters for jobs with spark-submit task, e.g. "spark_submit_params": ["--class", "org.apache.spark.examples.SparkPi"]. The parameters will be passed to the spark-submit script as command-line parameters. If specified upon run-now, it would overwrite the parameters specified in the job setting. The JSON representation of this field cannot exceed 10,000 bytes.

Response structure

Field Name Type Description
run_id INT64 The globally unique ID of the newly triggered run.
number_in_job INT64 The sequence number of this run among all runs of the job.

Runs submit

Important

  • A workspace is limited to 1000 concurrent job runs. A 429 Too Many Requests response is returned when you request a run that cannot be started immediately.
  • The number of jobs a workspace can create in an hour is limited to 5000 (includes “run now” and “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.
Endpoint HTTP Method
2.0/jobs/runs/submit POST

Submit a one-time run. This endpoint doesn’t require a Databricks job to be created. You can directly submit your workload. Runs submitted via this endpoint don’t display in the UI. Once the run is submitted, use the jobs/runs/get API to check the run state.

An example request:

{
  "run_name": "my spark task",
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_D3_v2",
    "num_workers": 10
  },
  "libraries": [
    {
      "jar": "dbfs:/my-jar.jar"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2"
      }
    }
  ],
  "spark_jar_task": {
    "main_class_name": "com.databricks.ComputeModels"
  }
}

And response:

{
  "run_id": 123
}
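
A minimal sketch of submitting a one-time run from Python (same hypothetical environment-variable and base-URL assumptions as earlier); the idempotency_token value is made up and simply illustrates the field described in the request structure below:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# POST 2.0/jobs/runs/submit launches a one-time run without creating a job.
payload = {
    "run_name": "my spark task",
    "new_cluster": {"spark_version": "7.3.x-scala2.12",
                    "node_type_id": "Standard_D3_v2",
                    "num_workers": 10},
    "libraries": [{"jar": "dbfs:/my-jar.jar"}],
    "spark_jar_task": {"main_class_name": "com.databricks.ComputeModels"},
    "idempotency_token": "nightly-2021-01-01",   # hypothetical token; allows safe retries
}
resp = requests.post(f"{host}/api/2.0/jobs/runs/submit",
                     headers={"Authorization": f"Bearer {token}"},
                     json=payload)
resp.raise_for_status()
print("Submitted run_id:", resp.json()["run_id"])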

Request structure

Submit a new run with the provided settings.

Important

  • When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.
  • When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.
Field Name Type Description
existing_cluster_id OR new_cluster STRING OR NewCluster If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability.

If new_cluster, a description of a cluster that will be created for each run.
notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task.

If spark_jar_task, indicates that this job should run a JAR.

If spark_python_task, indicates that this job should run a Python file.

If spark_submit_task, indicates that this job should run a spark-submit script.
run_name STRING An optional name for the run. The default value is Untitled.
libraries An array of Library An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list.
timeout_seconds INT32 An optional timeout applied to each run of this job. The default behavior is to have no timeout.
idempotency_token STRING An optional token that can be used to guarantee the idempotency of job run requests. If an active run with the provided token already exists, the request will not create a new run, but will return the ID of the existing run instead.

If you specify the idempotency token, upon failure you can retry until the request succeeds. Azure Databricks guarantees that exactly one run will be launched with that idempotency token.

This token should have at most 64 characters.

Response structure

Field Name Type Description
run_id INT64 The canonical identifier for the newly submitted run.

Runs list

Endpoint HTTP Method
2.0/jobs/runs/list GET

List runs from most recently started to least.

Note

Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs export.

An example request:

/jobs/runs/list?job_id=1&active_only=false&offset=1&limit=1

And response:

{
  "runs": [
    {
      "job_id": 1,
      "run_id": 452,
      "number_in_job": 5,
      "state": {
        "life_cycle_state": "RUNNING",
        "state_message": "Performing action"
      },
      "task": {
        "notebook_task": {
          "notebook_path": "/Users/donald@duck.com/my-notebook"
        }
      },
      "cluster_spec": {
        "existing_cluster_id": "1201-my-cluster"
      },
      "cluster_instance": {
        "cluster_id": "1201-my-cluster",
        "spark_context_id": "1102398-spark-context-id"
      },
      "overriding_parameters": {
        "jar_params": ["param1", "param2"]
      },
      "start_time": 1457570074236,
      "setup_duration": 259754,
      "execution_duration": 3589020,
      "cleanup_duration": 31038,
      "trigger": "PERIODIC"
    }
  ],
  "has_more": true
}
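
A sketch of paging through all runs of a job using the offset, limit, and has_more fields documented below (same hypothetical DATABRICKS_HOST / DATABRICKS_TOKEN assumptions as earlier):

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

runs = []
offset = 0
page_size = 25
while True:
    resp = requests.get(f"{host}/api/2.0/jobs/runs/list",
                        headers={"Authorization": f"Bearer {token}"},
                        params={"job_id": 1, "offset": offset, "limit": page_size})
    resp.raise_for_status()
    page = resp.json()
    runs.extend(page.get("runs", []))
    if not page.get("has_more"):
        break
    offset += page_size          # advance by the page size we requested
print("Fetched", len(runs), "runs")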

Request structure

List runs from most recently started to least.

Field Name Type Description
active_only OR completed_only BOOL OR BOOL If active_only is true, only active runs are included in the results; otherwise, lists both active and completed runs. An active run is a run in the PENDING, RUNNING, or TERMINATING RunLifecycleState. This field cannot be true when completed_only is true.

If completed_only is true, only completed runs are included in the results; otherwise, lists both active and completed runs. This field cannot be true when active_only is true.
job_id INT64 The job for which to list runs. If omitted, the Jobs service will list runs from all jobs.
offset INT32 The offset of the first run to return, relative to the most recent run.
limit INT32 The number of runs to return. This value should be greater than 0 and less than 1000. The default value is 20. If a request specifies a limit of 0, the service will instead use the maximum limit.

Response structure

Field Name Type Description
runs An array of Run A list of runs, from most recently started to least.
has_more BOOL If true, additional runs matching the provided filter are available for listing.

Runs get

Endpoint HTTP Method
2.0/jobs/runs/get GET

Retrieve the metadata of a run.

Note

Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs export.

An example request:

/jobs/runs/get?run_id=452

An example response:

{
  "job_id": 1,
  "run_id": 452,
  "number_in_job": 5,
  "state": {
    "life_cycle_state": "RUNNING",
    "state_message": "Performing action"
  },
  "task": {
    "notebook_task": {
      "notebook_path": "/Users/donald@duck.com/my-notebook"
    }
  },
  "cluster_spec": {
    "existing_cluster_id": "1201-my-cluster"
  },
  "cluster_instance": {
    "cluster_id": "1201-my-cluster",
    "spark_context_id": "1102398-spark-context-id"
  },
  "overriding_parameters": {
    "jar_params": ["param1", "param2"]
  },
  "start_time": 1457570074236,
  "setup_duration": 259754,
  "execution_duration": 3589020,
  "cleanup_duration": 31038,
  "trigger": "PERIODIC"
}
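
Because runs submitted through the API are typically monitored with this endpoint, here is a sketch of polling 2.0/jobs/runs/get until the run leaves the active states listed in the Runs list section (PENDING, RUNNING, TERMINATING). The wait_for_run helper name and polling interval are hypothetical; the usual environment-variable assumptions apply.

import os
import time
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

ACTIVE_STATES = {"PENDING", "RUNNING", "TERMINATING"}   # active run states, per Runs list above

def wait_for_run(run_id, poll_seconds=30):
    """Poll 2.0/jobs/runs/get until the run is no longer in an active state."""
    while True:
        resp = requests.get(f"{host}/api/2.0/jobs/runs/get",
                            headers={"Authorization": f"Bearer {token}"},
                            params={"run_id": run_id})
        resp.raise_for_status()
        run = resp.json()
        if run["state"]["life_cycle_state"] not in ACTIVE_STATES:
            return run
        time.sleep(poll_seconds)

run = wait_for_run(452)
print(run["state"].get("result_state"), run["run_page_url"])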

Request structure

Retrieve the metadata of a run without any output.

Field Name Type Description
run_id INT64 The canonical identifier of the run for which to retrieve the metadata. This field is required.

Response structure

Field Name Type Description
job_id INT64 The canonical identifier of the job that contains this run.
run_id INT64 The canonical identifier of the run. This ID is unique across all runs of all jobs.
number_in_job INT64 The sequence number of this run among all runs of the job. This value starts at 1.
original_attempt_run_id INT64 If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id.
state RunState The result and lifecycle states of the run.
schedule CronSchedule The cron schedule that triggered this run if it was triggered by the periodic scheduler.
task JobTask The task performed by the run, if any.
cluster_spec ClusterSpec A snapshot of the job’s cluster specification when this run was created.
cluster_instance ClusterInstance The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run.
overriding_parameters RunParameters The parameters used for this run.
start_time INT64 The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing, for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued.
setup_duration INT64 The time it took to set up the cluster in milliseconds. For runs that run on new clusters this is the cluster creation time; for runs that run on existing clusters this time should be very short.
execution_duration INT64 The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error.
cleanup_duration INT64 The time in milliseconds it took to terminate the cluster and clean up any intermediary results, etc. The total duration of the run is the sum of the setup_duration, the execution_duration, and the cleanup_duration.
trigger TriggerType The type of trigger that fired this run, e.g., a periodic schedule or a one time run.
creator_user_name STRING The creator user name. This field won’t be included in the response if the user has already been deleted.
run_page_url STRING The URL to the detail page of the run.

Runs export

Endpoint HTTP Method
2.0/jobs/runs/export GET

Export and retrieve the job run task.

Note

Only notebook runs can be exported in HTML format. Exporting runs of other types will fail.

An example request:

/jobs/runs/export?run_id=452

An example response:

{
  "views": [ {
    "content": "<!DOCTYPE html><html><head>Head</head><body>Body</body></html>",
    "name": "my-notebook",
    "type": "NOTEBOOK"
  } ]
}
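
A sketch of exporting a notebook run and writing each returned view to a local HTML file (same hypothetical environment-variable and base-URL assumptions; the output file names are derived from the view names and are illustrative):

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# GET 2.0/jobs/runs/export and save every returned view item as an HTML file.
resp = requests.get(f"{host}/api/2.0/jobs/runs/export",
                    headers={"Authorization": f"Bearer {token}"},
                    params={"run_id": 452, "views_to_export": "CODE"})
resp.raise_for_status()
for view in resp.json().get("views", []):
    filename = f"{view['name']}.html"
    with open(filename, "w", encoding="utf-8") as f:
        f.write(view["content"])
    print("Wrote", filename)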

To extract the HTML notebook from the JSON response, download and run this Python script.

Note

The notebook body in the __DATABRICKS_NOTEBOOK_MODEL object is encoded.

Request structure

Retrieve the export of a job run task.

Field Name Type Description
run_id INT64 The canonical identifier for the run. This field is required.
views_to_export ViewsToExport Which views to export (CODE, DASHBOARDS, or ALL). Defaults to CODE.

Response structure

Field Name Type Description
views An array of ViewItem The exported content in HTML format (one for every view item).

Runs cancel

Endpoint HTTP Method
2.0/jobs/runs/cancel POST

Cancel a run. The run is canceled asynchronously, so when this request completes, the run may still be running. The run will be terminated shortly. If the run is already in a terminal life_cycle_state, this method is a no-op.

This endpoint validates that the run_id parameter is valid and for invalid parameters returns HTTP status code 400.

An example request:

{
  "run_id": 453
}
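
A minimal sketch of the same cancellation from Python, under the usual hypothetical DATABRICKS_HOST / DATABRICKS_TOKEN assumptions; remember the cancellation itself is asynchronous:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# POST 2.0/jobs/runs/cancel; the run is terminated asynchronously.
resp = requests.post(f"{host}/api/2.0/jobs/runs/cancel",
                     headers={"Authorization": f"Bearer {token}"},
                     json={"run_id": 453})
resp.raise_for_status()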

Request structure

Cancel a run. The run is canceled asynchronously, so when this request completes, the run may still be active. The run will be terminated as soon as possible.

Field Name Type Description
run_id INT64 This field is required.

Runs get output

Endpoint HTTP Method
2.0/jobs/runs/get-output GET

Retrieve the output of a run. When a notebook task returns a value through the dbutils.notebook.exit() call, you can use this endpoint to retrieve that value. Azure Databricks restricts this API to return the first 5 MB of the output. For returning a larger result, you can store job results in a cloud storage service.

This endpoint validates that the run_id parameter is valid and for invalid parameters returns HTTP status code 400.

Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs export.

An example request:

/jobs/runs/get-output?run_id=453

And response:

{
  "metadata": {
    "job_id": 1,
    "run_id": 452,
    "number_in_job": 5,
    "state": {
      "life_cycle_state": "TERMINATED",
      "result_state": "SUCCESS",
      "state_message": ""
    },
    "task": {
      "notebook_task": {
        "notebook_path": "/Users/donald@duck.com/my-notebook"
      }
    },
    "cluster_spec": {
      "existing_cluster_id": "1201-my-cluster"
    },
    "cluster_instance": {
      "cluster_id": "1201-my-cluster",
      "spark_context_id": "1102398-spark-context-id"
    },
    "overriding_parameters": {
      "jar_params": ["param1", "param2"]
    },
    "start_time": 1457570074236,
    "setup_duration": 259754,
    "execution_duration": 3589020,
    "cleanup_duration": 31038,
    "trigger": "PERIODIC"
  },
  "notebook_output": {
    "result": "the maybe truncated string passed to dbutils.notebook.exit()"
  }
}
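
A sketch of reading a notebook run's exit value, handling the fact that the response carries either notebook_output or error (see the response structure below); the usual hypothetical environment-variable assumptions apply:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# GET 2.0/jobs/runs/get-output; the body contains either notebook_output or error.
resp = requests.get(f"{host}/api/2.0/jobs/runs/get-output",
                    headers={"Authorization": f"Bearer {token}"},
                    params={"run_id": 453})
resp.raise_for_status()
body = resp.json()
if "error" in body:
    print("Output unavailable:", body["error"])
else:
    # result may be empty or truncated; see the NotebookOutput structure below.
    print("Notebook exit value:", body.get("notebook_output", {}).get("result"))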

Request structure

Retrieves both the output and the metadata of a run.

Field Name Type Description
run_id INT64 The canonical identifier for the run. This field is required.

Response structure

Field Name Type Description
notebook_output OR error NotebookOutput OR STRING If notebook_output, the output of a notebook task, if available. A notebook task that terminates (either successfully or with a failure) without calling dbutils.notebook.exit() is considered to have an empty output. This field will be set but its result value will be empty.

If error, an error message indicating why output is not available. The message is unstructured, and its exact format is subject to change.
metadata Run All details of the run except for its output.

Runs delete

Endpoint HTTP Method
2.0/jobs/runs/delete POST

Delete a non-active run. Returns an error if the run is active.

An example request:

{
  "run_id": 42
}

Request structure

Delete a non-active run. Returns an error if the run is active.

Field Name Type Description
run_id INT64 The canonical identifier of the run to delete.

Data structures

In this section:

ClusterInstance

Identifiers for the cluster and Spark context used by a run. These two values together identify an execution context across all time.

Field Name Type Description
cluster_id STRING The canonical identifier for the cluster used by a run. This field is always available for runs on existing clusters. For runs on new clusters, it becomes available once the cluster is created. This value can be used to view logs by browsing to /#setting/sparkui/$cluster_id/driver-logs. The logs will continue to be available after the run completes.

If this identifier is not yet available, the response won’t include this field.
spark_context_id STRING The canonical identifier for the Spark context used by a run. This field will be filled in once the run begins execution. This value can be used to view the Spark UI by browsing to /#setting/sparkui/$cluster_id/$spark_context_id. The Spark UI will continue to be available after the run has completed.

If this identifier is not yet available, the response won’t include this field.

ClusterSpec

Important

  • When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.
  • When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.
Field Name Type Description
existing_cluster_id OR new_cluster STRING OR NewCluster If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability.

If new_cluster, a description of a cluster that will be created for each run.
libraries An array of Library An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list.

CronSchedule

Field Name Type Description
quartz_cron_expression STRING A Cron expression using Quartz syntax that describes the schedule for a job. See Cron Trigger for details. This field is required.
timezone_id STRING A Java timezone ID. The schedule for a job will be resolved with respect to this timezone. See Java TimeZone for details. This field is required.
pause_status STRING Indicates whether this schedule is paused. Either “PAUSED” or “UNPAUSED”.

Job

Field Name Type Description
job_id INT64 The canonical identifier for this job.
creator_user_name STRING The creator user name. This field won’t be included in the response if the user has already been deleted.
settings JobSettings Settings for this job and all of its runs. These settings can be updated using the resetJob method.
created_time INT64 The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC).

JobEmailNotifications

Important

The on_start, on_success, and on_failure fields accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis.

Field Name Type Description
on_start An array of STRING A list of email addresses to be notified when a run begins. If not specified upon job creation or reset, the list will be empty, i.e., no address will be notified.
on_success An array of STRING A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESSFUL result_state. If not specified upon job creation or reset, the list will be empty, i.e., no address will be notified.
on_failure An array of STRING A list of email addresses to be notified when a run unsuccessfully completes. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a SKIPPED, FAILED, or TIMED_OUT result_state. If not specified upon job creation or reset, the list will be empty, i.e., no address will be notified.
no_alert_for_skipped_runs BOOL If true, do not send email to recipients specified in on_failure if the run is skipped.

JobSettings

Important

  • When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.
  • When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.

Settings for a job. These settings can be updated using the resetJob method.

Field Name Type Description
existing_cluster_id OR new_cluster STRING OR NewCluster If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability.

If new_cluster, a description of a cluster that will be created for each run.
notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task.

If spark_jar_task, indicates that this job should run a JAR.

If spark_python_task, indicates that this job should run a Python file.

If spark_submit_task, indicates that this job should run a spark-submit script.
name STRING An optional name for the job. The default value is Untitled.
libraries An array of Library An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list.
email_notifications JobEmailNotifications An optional set of email addresses that will be notified when runs of this job begin or complete as well as when this job is deleted. The default behavior is to not send any emails.
timeout_seconds INT32 An optional timeout applied to each run of this job. The default behavior is to have no timeout.
max_retries INT32 An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry.
min_retry_interval_millis INT32 An optional minimal interval in milliseconds between attempts. The default behavior is that unsuccessful runs are immediately retried.
retry_on_timeout BOOL An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout.
schedule CronSchedule An optional periodic schedule for this job. The default behavior is that the job will only run when triggered by clicking “Run Now” in the Jobs UI or sending an API request to runNow.
max_concurrent_runs INT32 An optional maximum allowed number of concurrent runs of the job.

Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful for example if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs which differ by their input parameters.

This setting affects only new runs. For example, suppose the job’s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won’t kill any of the active runs. However, from then on, new runs will be skipped unless there are fewer than 3 active runs.

This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run.

JobTask

Field Name Type Description
notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task.

If spark_jar_task, indicates that this job should run a JAR.

If spark_python_task, indicates that this job should run a Python file.

If spark_submit_task, indicates that this job should run a spark-submit script.

NewCluster

Field Name Type Description
num_workers OR autoscale INT32 OR AutoScale If num_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in spark_info gradually increase from 5 to 10 as the new nodes are provisioned.

If autoscale, parameters needed in order to automatically scale clusters up and down based on load.
spark_version STRING The Spark version of the cluster. A list of available Spark versions can be retrieved by using the Runtime versions API call. This field is required.
spark_conf SparkConfPair An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

Example Spark confs: {"spark.speculation": true, "spark.streaming.ui.retainedBatches": 5} or {"spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails"}
node_type_id STRING This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the List node types API call. This field is required.
driver_node_type_id STRING The node type of the Spark driver. This field is optional; if unset, the driver node type is set as the same value as node_type_id defined above.
custom_tags ClusterTag An object containing a set of tags for cluster resources. Azure Databricks tags all cluster resources with these tags in addition to default_tags.

Note: Databricks allows at most 45 custom tags.
cluster_log_conf ClusterLogConf The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is <destination>/<cluster-id>/driver, while the destination of executor logs is <destination>/<cluster-id>/executor.
init_scripts An array of InitScriptInfo The configuration for storing init scripts. Any number of scripts can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-id>/init_scripts.
spark_env_vars SparkEnvPair An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (i.e., export X='Y') while launching the driver and workers.

In order to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the following example. This ensures that all default databricks managed environmental variables are included as well.

Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, this cluster dynamically acquires additional disk space when its Spark workers are running low on disk space. Refer to Autoscaling local storage for details.
instance_pool_id STRING The optional ID of the instance pool to which the cluster belongs. Refer to Instance Pools API for details.
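
To tie several of the fields above together, here is an illustrative (not exhaustive) new_cluster value, written as a Python dict, that could be embedded in a jobs/create or runs/submit request body. The specific node type, Spark version, tag, and environment-variable values are placeholders, not recommendations:

# Hypothetical NewCluster specification combining fields documented above.
new_cluster = {
    "spark_version": "7.3.x-scala2.12",               # required: runtime version
    "node_type_id": "Standard_D3_v2",                 # required: worker node type
    "num_workers": 8,                                 # one driver + 8 executors
    "spark_conf": {"spark.speculation": True},        # example Spark conf from above
    "custom_tags": {"team": "ml"},                    # hypothetical custom tag
    "spark_env_vars": {"SPARK_WORKER_MEMORY": "28000m"},
    "enable_elastic_disk": True,
}

# It would then be passed as the "new_cluster" field of a request body, e.g.:
# {"run_name": "my spark task", "new_cluster": new_cluster, "spark_jar_task": {...}}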

NotebookOutput

Field Name Type Description
result STRING The value passed to dbutils.notebook.exit(). Azure Databricks restricts this API to return the first 1 MB of the value. For a larger result, your job can store the results in a cloud storage service. This field will be absent if dbutils.notebook.exit() was never called.
truncated BOOLEAN Whether or not the result was truncated.

NotebookTask

All the output cells are subject to an 8MB size limit. If the output of a cell has a larger size, the rest of the run will be cancelled and the run will be marked as failed. In that case, some of the content output from other cells may also be missing. If you need help finding the cell that is beyond the limit, run the notebook against an all-purpose cluster and use this notebook autosave technique.

Field Name Type Description
notebook_path STRING The absolute path of the notebook to be run in the Azure Databricks workspace. This path must begin with a slash. This field is required.
revision_timestamp LONG The timestamp of the revision of the notebook.
base_parameters A map of ParamPair Base parameters to be used for each run of this job. If the run is initiated by a call to run-now with parameters specified, the two parameters maps will be merged. If the same key is specified in base_parameters and in run-now, the value from run-now will be used.

If the notebook takes a parameter that is not specified in the job’s base_parameters or the run-now override parameters, the default value from the notebook will be used.

Retrieve these parameters in a notebook using dbutils.widgets.get.

ParamPair

Name-based parameters for jobs running notebook tasks.

Important

The fields in this data structure accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis.

Type Description
STRING Parameter name. Pass to dbutils.widgets.get to retrieve the value.
STRING Parameter value.

Run

All the information about a run except for its output. The output can be retrieved separately with the getRunOutput method.

Field Name Type Description
job_id INT64 The canonical identifier of the job that contains this run.
run_id INT64 The canonical identifier of the run. This ID is unique across all runs of all jobs.
creator_user_name STRING The creator user name. This field won’t be included in the response if the user has already been deleted.
number_in_job INT64 The sequence number of this run among all runs of the job. This value starts at 1.
original_attempt_run_id INT64 If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id.
state RunState The result and lifecycle states of the run.
schedule CronSchedule The cron schedule that triggered this run if it was triggered by the periodic scheduler.
task JobTask The task performed by the run, if any.
cluster_spec ClusterSpec A snapshot of the job’s cluster specification when this run was created.
cluster_instance ClusterInstance The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run.
overriding_parameters RunParameters The parameters used for this run.
start_time INT64 The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing, for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued.
setup_duration INT64 The time it took to set up the cluster in milliseconds. For runs that run on new clusters this is the cluster creation time; for runs that run on existing clusters this time should be very short.
execution_duration INT64 The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error.
cleanup_duration INT64 The time in milliseconds it took to terminate the cluster and clean up any intermediary results, etc. The total duration of the run is the sum of the setup_duration, the execution_duration, and the cleanup_duration.
trigger TriggerType The type of trigger that fired this run, e.g., a periodic schedule or a one time run.
run_name STRING An optional name for the run. The default value is Untitled. The maximum allowed length is 4096 bytes in UTF-8 encoding.
run_page_url STRING The URL to the detail page of the run.
run_type STRING The type of the run.

* JOB_RUN - 正常的作业运行。* JOB_RUN - Normal job run. 使用立即运行创建的运行。A run created with Run now.
* WORKFLOW_RUN - 工作流运行。* WORKFLOW_RUN - Workflow run. 使用 dbutils.notebook.run 创建的运行。A run created with dbutils.notebook.run.
* SUBMIT_RUN - 提交运行。* SUBMIT_RUN - Submit run. 使用立即运行创建的运行。A run created with Run now.
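
To make the field layout concrete, below is a trimmed, illustrative sketch of a Run object as it might be returned when retrieving a run; all identifiers, timestamps, and durations are made up, and many optional fields are omitted:

```
{
  "job_id": 1,
  "run_id": 452,
  "number_in_job": 5,
  "creator_user_name": "someone@example.com",
  "state": {
    "life_cycle_state": "TERMINATED",
    "result_state": "SUCCESS",
    "state_message": ""
  },
  "task": {
    "notebook_task": {
      "notebook_path": "/Users/someone@example.com/nightly-model-training"
    }
  },
  "start_time": 1625060460483,
  "setup_duration": 120000,
  "execution_duration": 450000,
  "cleanup_duration": 30000,
  "trigger": "PERIODIC",
  "run_name": "Nightly model training",
  "run_type": "JOB_RUN"
}
```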

RunParameters

Parameters for this run. Only one of jar_params, python_params, or notebook_params should be specified in the run-now request, depending on the type of job task. Jobs with a Spark JAR task or Python task take a list of position-based parameters, and jobs with notebook tasks take a key-value map.

| Field Name | Type | Description |
| --- | --- | --- |
| jar_params | An array of STRING | A list of parameters for jobs with Spark JAR tasks, e.g. "jar_params": ["john doe", "35"]. The parameters are used to invoke the main function of the main class specified in the Spark JAR task. If not specified upon run-now, it defaults to an empty list. jar_params cannot be specified in conjunction with notebook_params. The JSON representation of this field (i.e. {"jar_params":["john doe","35"]}) cannot exceed 10,000 bytes. |
| notebook_params | A map of ParamPair | A map from keys to values for jobs with a notebook task, e.g. "notebook_params": {"name": "john doe", "age": "35"}. The map is passed to the notebook and is accessible through the dbutils.widgets.get function.<br><br>If not specified upon run-now, the triggered run uses the job's base parameters.<br><br>notebook_params cannot be specified in conjunction with jar_params.<br><br>The JSON representation of this field (i.e. {"notebook_params":{"name":"john doe","age":"35"}}) cannot exceed 10,000 bytes. |
| python_params | An array of STRING | A list of parameters for jobs with Python tasks, e.g. "python_params": ["john doe", "35"]. The parameters are passed to the Python file as command-line parameters. If specified upon run-now, they overwrite the parameters specified in the job setting. The JSON representation of this field (i.e. {"python_params":["john doe","35"]}) cannot exceed 10,000 bytes.<br><br>Important: These parameters accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis. |
| spark_submit_params | An array of STRING | A list of parameters for jobs with a spark submit task, e.g. "spark_submit_params": ["--class", "org.apache.spark.examples.SparkPi"]. The parameters are passed to the spark-submit script as command-line parameters. If specified upon run-now, they overwrite the parameters specified in the job setting. The JSON representation of this field (i.e. {"spark_submit_params":["--class","org.apache.spark.examples.SparkPi"]}) cannot exceed 10,000 bytes.<br><br>Important: These parameters accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis. |
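
As an illustration, a sketch of a run-now request body for a notebook job; the job_id and parameter values are hypothetical:

```
{
  "job_id": 1,
  "notebook_params": {
    "name": "john doe",
    "age": "35"
  }
}
```

A JAR or Python job would instead pass jar_params or python_params as a positional list, for example "jar_params": ["john doe", "35"].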

RunState

| Field Name | Type | Description |
| --- | --- | --- |
| life_cycle_state | RunLifeCycleState | A description of a run's current location in the run lifecycle. This field is always available in the response. |
| result_state | RunResultState | The result state of a run. If it is not available, the response won't include this field. See RunResultState for details about the availability of result_state. |
| state_message | STRING | A descriptive message for the current state. This field is unstructured, and its exact format is subject to change. |

SparkJarTask

| Field Name | Type | Description |
| --- | --- | --- |
| jar_uri | STRING | Deprecated since 04/2016. Provide a jar through the libraries field instead. For an example, see Create. |
| main_class_name | STRING | The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library.<br><br>The code should use SparkContext.getOrCreate to obtain a Spark context; otherwise, runs of the job will fail. |
| parameters | An array of STRING | Parameters passed to the main method. |
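
For example, a sketch of the task-related portion of a create request for a JAR job; the JAR path, class name, and parameter values are illustrative:

```
{
  "libraries": [
    {
      "jar": "dbfs:/my-jar.jar"
    }
  ],
  "spark_jar_task": {
    "main_class_name": "com.databricks.ComputeModels",
    "parameters": ["2021-06-30", "full"]
  }
}
```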

SparkPythonTask

| Field Name | Type | Description |
| --- | --- | --- |
| python_file | STRING | The URI of the Python file to be executed. DBFS paths are supported. This field is required. |
| parameters | An array of STRING | Command line parameters passed to the Python file. |
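
Similarly, a minimal sketch of a spark_python_task; the DBFS path and parameters are hypothetical:

```
{
  "spark_python_task": {
    "python_file": "dbfs:/scripts/train_model.py",
    "parameters": ["--date", "2021-06-30"]
  }
}
```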

SparkSubmitTask

Important

  • You can invoke Spark submit tasks only on new clusters.
  • In the new_cluster specification, libraries and spark_conf are not supported. Instead, use --jars and --py-files to add Java and Python libraries and --conf to set the Spark configuration.
  • master, deploy-mode, and executor-cores are automatically configured by Azure Databricks; you cannot specify them in parameters.
  • By default, the Spark submit job uses all available memory (excluding reserved memory for Azure Databricks services). You can set --driver-memory and --executor-memory to a smaller value to leave some room for off-heap usage.
  • The --jars, --py-files, and --files arguments support DBFS paths.

For example, assuming the JAR is uploaded to DBFS, you can run SparkPi by setting the following parameters.

```
{
  "parameters": [
    "--class",
    "org.apache.spark.examples.SparkPi",
    "dbfs:/path/to/examples.jar",
    "10"
  ]
}
```

| Field Name | Type | Description |
| --- | --- | --- |
| parameters | An array of STRING | Command-line parameters passed to spark-submit. |
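
Putting the constraints above together, a sketch of a jobs/create request that runs SparkPi through a Spark submit task on a new cluster; the Spark version, node type, and worker count are illustrative:

```
{
  "name": "SparkPi spark-submit job",
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_D3_v2",
    "num_workers": 2
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "org.apache.spark.examples.SparkPi",
      "dbfs:/path/to/examples.jar",
      "10"
    ]
  }
}
```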

ViewItem

The exported content is in HTML format. For example, if the view to export is dashboards, one HTML string is returned for every dashboard.

| Field Name | Type | Description |
| --- | --- | --- |
| content | STRING | Content of the view. |
| name | STRING | Name of the view item. In the case of code view, it would be the notebook's name. In the case of dashboard view, it would be the dashboard's name. |
| type | ViewType | Type of the view item. |
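
For example, a single exported dashboard view could be represented by a ViewItem like the following sketch; the name is hypothetical and the HTML content is truncated:

```
{
  "content": "<!DOCTYPE html><html>...</html>",
  "name": "Nightly model training dashboard",
  "type": "DASHBOARD"
}
```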

RunLifeCycleState

The life cycle state of a run. Allowed state transitions are:

  • PENDING -> RUNNING -> TERMINATING -> TERMINATED
  • PENDING -> SKIPPED
  • PENDING -> INTERNAL_ERROR
  • RUNNING -> INTERNAL_ERROR
  • TERMINATING -> INTERNAL_ERROR

| State | Description |
| --- | --- |
| PENDING | The run has been triggered. If there is not already an active run of the same job, the cluster and execution context are being prepared. If there is already an active run of the same job, the run immediately transitions into the SKIPPED state without preparing any resources. |
| RUNNING | The task of this run is being executed. |
| TERMINATING | The task of this run has completed, and the cluster and execution context are being cleaned up. |
| TERMINATED | The task of this run has completed, and the cluster and execution context have been cleaned up. This state is terminal. |
| SKIPPED | This run was aborted because a previous run of the same job was already active. This state is terminal. |
| INTERNAL_ERROR | An exceptional state that indicates a failure in the Jobs service, such as a network failure over a long period. If a run on a new cluster ends in the INTERNAL_ERROR state, the Jobs service terminates the cluster as soon as possible. This state is terminal. |

RunResultState

The result state of the run.

  • If life_cycle_state = TERMINATED: if the run had a task, the result is guaranteed to be available, and it indicates the result of the task.
  • If life_cycle_state = PENDING, RUNNING, or SKIPPED, the result state is not available.
  • If life_cycle_state = TERMINATING or life_cycle_state = INTERNAL_ERROR: the result state is available if the run had a task and managed to start it.

Once available, the result state never changes.

| State | Description |
| --- | --- |
| SUCCESS | The task completed successfully. |
| FAILED | The task completed with an error. |
| TIMEDOUT | The run was stopped after reaching the timeout. |
| CANCELED | The run was canceled at user request. |

TriggerType

These are the types of triggers that can fire a run.

| Type | Description |
| --- | --- |
| PERIODIC | Schedules that periodically trigger runs, such as a cron scheduler. |
| ONE_TIME | One-time triggers that fire a single run. This occurs when you trigger a single run on demand through the UI or the API. |
| RETRY | Indicates a run that is triggered as a retry of a previously failed run. This occurs when you request to re-run the job in case of failures. |

ViewType

| Type | Description |
| --- | --- |
| NOTEBOOK | Notebook view item. |
| DASHBOARD | Dashboard view item. |

ViewsToExport

View to export: either code, all dashboards, or all.

| Type | Description |
| --- | --- |
| CODE | Code view of the notebook. |
| DASHBOARDS | All dashboard views of the notebook. |
| ALL | All views of the notebook. |
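
As a usage sketch, these values are passed in the views_to_export field of a run export request (assuming the 2.0/jobs/runs/export endpoint covered earlier in this reference); the run_id is hypothetical:

```
{
  "run_id": 452,
  "views_to_export": "DASHBOARDS"
}
```

The response would then contain one ViewItem per exported dashboard.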