Jobs API
The Jobs API allows you to create, edit, and delete jobs. The maximum allowed size of a request to the Jobs API is 10MB. See Jobs API examples for a how-to guide on this API.
Note
If you receive a 500-level error when making Jobs API requests, we recommend retrying requests for up to 10 minutes (with a minimum 30-second interval between retries).
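As a minimal sketch of that retry guidance in Python (using the requests library), the following helper retries 500-level responses for up to 10 minutes. The workspace URL and the DATABRICKS_TOKEN environment variable are placeholders, not part of the API.

```python
import os
import time
import requests

# Hypothetical workspace URL and token; substitute your own values.
BASE_URL = "https://<databricks-instance>/api/2.0"
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

def post_with_retry(path, payload, max_wait=600, interval=30):
    """POST to the Jobs API, retrying 500-level responses for up to max_wait seconds."""
    deadline = time.time() + max_wait
    while True:
        resp = requests.post(f"{BASE_URL}/{path}", headers=HEADERS, json=payload)
        if resp.status_code < 500 or time.time() >= deadline:
            resp.raise_for_status()
            return resp.json()
        time.sleep(interval)  # minimum 30-second interval between retries
```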
Create
Endpoint | HTTP Method |
---|---|
2.0/jobs/create | POST |
Create a new job.
An example request for a job that runs at 10:15pm each night:
{
"name": "Nightly model training",
"new_cluster": {
"spark_version": "5.3.x-scala2.11",
"node_type_id": "Standard_D3_v2",
"num_workers": 10
},
"libraries": [
{
"jar": "dbfs:/my-jar.jar"
},
{
"maven": {
"coordinates": "org.jsoup:jsoup:1.7.2"
}
}
],
"timeout_seconds": 3600,
"max_retries": 1,
"schedule": {
"quartz_cron_expression": "0 15 22 ? * *",
"timezone_id": "America/Los_Angeles"
},
"spark_jar_task": {
"main_class_name": "com.databricks.ComputeModels"
}
}
And response:
{
"job_id": 1
}
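For reference, here is one way such a request might be issued from Python with the requests library. The workspace URL and token environment variable are placeholders; the job specification mirrors the example request above.

```python
import os
import requests

# Hypothetical workspace URL and token; substitute your own values.
BASE_URL = "https://<databricks-instance>/api/2.0"
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

job_spec = {
    "name": "Nightly model training",
    "new_cluster": {
        "spark_version": "5.3.x-scala2.11",
        "node_type_id": "Standard_D3_v2",
        "num_workers": 10,
    },
    "libraries": [{"jar": "dbfs:/my-jar.jar"}],
    "timeout_seconds": 3600,
    "max_retries": 1,
    "schedule": {
        "quartz_cron_expression": "0 15 22 ? * *",
        "timezone_id": "America/Los_Angeles",
    },
    "spark_jar_task": {"main_class_name": "com.databricks.ComputeModels"},
}

resp = requests.post(f"{BASE_URL}/jobs/create", headers=HEADERS, json=job_spec)
resp.raise_for_status()
print(resp.json()["job_id"])  # e.g. 1
```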
Request Structure
Create a new job.
Important
- When you run a job on a new automated cluster, the job is treated as a data engineering (automated) workload subject to automated workload pricing.
- When you run a job on an existing interactive cluster, it is treated as a data analytics (interactive) workload subject to interactive workload pricing.
Field Name | Type | Description |
---|---|---|
existing_cluster_id OR new_cluster | STRING OR NewCluster | If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new_cluster, a description of a cluster that will be created for each run. |
notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task | NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask | If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task. If spark_jar_task, indicates that this job should run a JAR. If spark_python_task, indicates that this job should run a Python file. If spark_submit_task, indicates that this job should run the spark-submit script. |
name | STRING | An optional name for the job. The default value is Untitled. |
libraries | An array of Library | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |
email_notifications | JobEmailNotifications | An optional set of email addresses notified when runs of this job begin and complete and when this job is deleted. The default behavior is to not send any emails. |
timeout_seconds | INT32 | An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
max_retries | INT32 | An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with a FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry. |
min_retry_interval_millis | INT32 | An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried. |
retry_on_timeout | BOOL | An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. |
schedule | CronSchedule | An optional periodic schedule for this job. The default behavior is that the job runs when triggered by clicking Run Now in the Jobs UI or sending an API request to runNow. |
max_concurrent_runs | INT32 | An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful, for example, if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs that differ by their input parameters. This setting affects only new runs. For example, suppose the job's concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won't kill any of the active runs. However, from then on, new runs are skipped unless there are fewer than 3 active runs. This value cannot exceed 150. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run. |
Response Structure
Field Name | Type | Description |
---|---|---|
job_id | INT64 | The canonical identifier for the newly created job. |
List
Endpoint | HTTP Method |
---|---|
2.0/jobs/list | GET |
List all jobs. An example response:
{
"jobs": [
{
"job_id": 1,
"settings": {
"name": "Nightly model training",
"new_cluster": {
"spark_version": "5.3.x-scala2.11",
"node_type_id": "Standard_D3_v2",
"num_workers": 10
},
"libraries": [
{
"jar": "dbfs:/my-jar.jar"
},
{
"maven": {
"coordinates": "org.jsoup:jsoup:1.7.2"
}
}
],
"timeout_seconds": 100000000,
"max_retries": 1,
"schedule": {
"quartz_cron_expression": "0 15 22 ? * *",
"timezone_id": "America/Los_Angeles"
},
"spark_jar_task": {
"main_class_name": "com.databricks.ComputeModels"
}
},
"created_time": 1457570074236
}
]
}
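A minimal sketch of calling this endpoint from Python and printing each job's ID and name; the workspace URL and token are placeholders as in the earlier sketches.

```python
import os
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

resp = requests.get(f"{BASE_URL}/jobs/list", headers=HEADERS)
resp.raise_for_status()
for job in resp.json().get("jobs", []):
    print(job["job_id"], job["settings"]["name"])
```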
Response Structure
Field Name | Type | Description |
---|---|---|
jobs | An array of Job | The list of jobs. |
Delete
Endpoint | HTTP Method |
---|---|
2.0/jobs/delete | POST |
Delete the job and send an email to the addresses specified in JobSettings.email_notifications. No action occurs if the job has already been removed. After the job is removed, neither its details nor its run history is visible via the Jobs UI or API. The job is guaranteed to be removed upon completion of this request. However, runs that were active before the receipt of this request may still be active. They will be terminated asynchronously.
An example request:
{
"job_id": 1
}
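A corresponding sketch in Python (placeholders as above); the body matches the example request.

```python
import os
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

requests.post(f"{BASE_URL}/jobs/delete", headers=HEADERS,
              json={"job_id": 1}).raise_for_status()
```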
Request Structure
Delete a job and send an email to the addresses specified in JobSettings.email_notifications. No action occurs if the job has already been removed. After the job is removed, neither its details nor its run history is visible via the Jobs UI or API. The job is guaranteed to be removed upon completion of this request. However, runs that were active before the receipt of this request may still be active. They will be terminated asynchronously.
Field Name | Type | Description |
---|---|---|
job_id | INT64 | The canonical identifier of the job to delete. This field is required. |
Get
Endpoint | HTTP Method |
---|---|
2.0/jobs/get | GET |
Retrieves information about a single job. An example request:
/jobs/get?job_id=1
An example response:
{
"job_id": 1,
"settings": {
"name": "Nightly model training",
"new_cluster": {
"spark_version": "5.3.x-scala2.11",
"node_type_id": "Standard_D3_v2",
"num_workers": 10
},
"libraries": [
{
"jar": "dbfs:/my-jar.jar"
},
{
"maven": {
"coordinates": "org.jsoup:jsoup:1.7.2"
}
}
],
"timeout_seconds": 100000000,
"max_retries": 1,
"schedule": {
"quartz_cron_expression": "0 15 22 ? * *",
"timezone_id": "America/Los_Angeles"
},
"spark_jar_task": {
"main_class_name": "com.databricks.ComputeModels"
}
},
"created_time": 1457570074236
}
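A sketch of the same GET request from Python, passing job_id as a query parameter (placeholders as above).

```python
import os
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

resp = requests.get(f"{BASE_URL}/jobs/get", headers=HEADERS, params={"job_id": 1})
resp.raise_for_status()
settings = resp.json()["settings"]
print(settings["name"])  # e.g. "Nightly model training"
```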
Request Structure
Retrieve information about a single job.
Field Name | Type | Description |
---|---|---|
job_id | INT64 | The canonical identifier of the job to retrieve information about. This field is required. |
Response Structure
Field Name | Type | Description |
---|---|---|
job_id | INT64 | The canonical identifier for this job. |
creator_user_name | STRING | The creator user name. This field won’t be included in the response if the user has already been deleted. |
settings | JobSettings | Settings for this job and all of its runs. These settings can be updated using the resetJob method. |
created_time | INT64 | The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC). |
Reset
Endpoint | HTTP Method |
---|---|
2.0/jobs/reset | POST |
Overwrite job settings.
An example request that makes job 2 look like job 1 (from the create_job example):
{
"job_id": 2,
"new_settings": {
"name": "Nightly model training",
"new_cluster": {
"spark_version": "5.3.x-scala2.11",
"node_type_id": "Standard_D3_v2",
"num_workers": 10
},
"libraries": [
{
"jar": "dbfs:/my-jar.jar"
},
{
"maven": {
"coordinates": "org.jsoup:jsoup:1.7.2"
}
}
],
"timeout_seconds": 100000000,
"max_retries": 1,
"schedule": {
"quartz_cron_expression": "0 15 22 ? * *",
"timezone_id": "America/Los_Angeles"
},
"spark_jar_task": {
"main_class_name": "com.databricks.ComputeModels"
}
}
}
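Because new_settings replaces the existing settings entirely, a common pattern is to read the current settings with jobs/get, modify them, and send the full object back. A sketch of that pattern (placeholders as above; the field being changed is arbitrary):

```python
import os
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# Read the full current settings, change one field, and write everything back.
job = requests.get(f"{BASE_URL}/jobs/get", headers=HEADERS,
                   params={"job_id": 2}).json()
settings = job["settings"]
settings["max_retries"] = 3

requests.post(f"{BASE_URL}/jobs/reset", headers=HEADERS,
              json={"job_id": 2, "new_settings": settings}).raise_for_status()
```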
Request Structure
Overwrite job settings.
Field Name | Type | Description |
---|---|---|
job_id | INT64 | The canonical identifier of the job to reset. This field is required. |
new_settings | JobSettings | The new settings of the job. These new settings replace the old settings entirely. Changes to the following fields are not applied to active runs: JobSettings.cluster_spec or JobSettings.task. Changes to the following fields are applied to active runs as well as future runs: JobSettings.timeout_seconds, JobSettings.email_notifications, or JobSettings.retry_policy. This field is required. |
Run Now
Important
- The number of jobs is limited to 1000.
- The number of jobs a workspace can create in an hour is limited to 5000 (includes “run now” and “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.
- The number of concurrently active runs a workspace can create is limited to 150.
Endpoint | HTTP Method |
---|---|
2.0/jobs/run-now | POST |
Run a job now and return the run_id of the triggered run.
Note
If you find yourself using Create together with Run Now a lot, you may actually be interested in the Runs Submit API. This API endpoint allows you to submit your workloads directly without having to create a job in Azure Databricks.
An example request for a notebook job:
{
"job_id": 1,
"notebook_params": {
"dry-run": "true",
"oldest-time-to-consider": "1457570074236"
}
}
An example request for a JAR job:
{
"job_id": 2,
"jar_params": ["param1", "param2"]
}
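A sketch of triggering the notebook job from Python with notebook_params (placeholders as above; the payload matches the first example request).

```python
import os
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

payload = {
    "job_id": 1,
    "notebook_params": {
        "dry-run": "true",
        "oldest-time-to-consider": "1457570074236",
    },
}
resp = requests.post(f"{BASE_URL}/jobs/run-now", headers=HEADERS, json=payload)
resp.raise_for_status()
print(resp.json()["run_id"])
```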
Request Structure
Run a job now and return the run_id of the triggered run.
Field Name | Type | Description |
---|---|---|
job_id | INT64 | The canonical identifier of the job to run. This field is required. |
jar_params | An array of STRING | A list of parameters for jobs with JAR tasks, e.g. "jar_params": ["john doe", "35"]. The parameters will be used to invoke the main function of the main class specified in the Spark JAR task. If not specified upon run-now, it will default to an empty list. jar_params cannot be specified in conjunction with notebook_params. The JSON representation of this field (i.e. {"jar_params":["john doe","35"]}) cannot exceed 10,000 bytes. |
notebook_params | A map of ParamPair | A map from keys to values for jobs with notebook tasks, e.g. "notebook_params": {"name": "john doe", "age": "35"}. The map is passed to the notebook and is accessible through the dbutils.widgets.get function. See Widgets for more information. If not specified upon run-now, the triggered run uses the job’s base parameters. notebook_params cannot be specified in conjunction with jar_params. The JSON representation of this field (i.e. {"notebook_params":{"name":"john doe","age":"35"}}) cannot exceed 10,000 bytes. |
python_params | An array of STRING | A list of parameters for jobs with Python tasks, e.g. "python_params": ["john doe", "35"]. The parameters will be passed to the Python file as command-line parameters. If specified upon run-now, they overwrite the parameters specified in the job setting. The JSON representation of this field (i.e. {"python_params":["john doe","35"]}) cannot exceed 10,000 bytes. |
spark_submit_params | An array of STRING | A list of parameters for jobs with spark submit tasks, e.g. "spark_submit_params": ["--class", "org.apache.spark.examples.SparkPi"]. The parameters will be passed to the spark-submit script as command-line parameters. If specified upon run-now, they overwrite the parameters specified in the job setting. The JSON representation of this field cannot exceed 10,000 bytes. |
Response Structure
Field Name | Type | Description |
---|---|---|
run_id | INT64 | The globally unique ID of the newly triggered run. |
number_in_job | INT64 | The sequence number of this run among all runs of the job. |
Runs Submit
Important
- The number of jobs is limited to 1000.
- The number of jobs a workspace can create in an hour is limited to 5000 (includes “run now” and “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.
- The number of concurrently active runs a workspace can create is limited to 150.
Endpoint | HTTP Method |
---|---|
2.0/jobs/runs/submit | POST |
Submit a one-time run. This endpoint doesn’t require a Databricks job to be created. You can directly submit your workload. Runs submitted via this endpoint don’t display in the UI. Once the run is submitted, use the jobs/runs/get API to check the run state.
An example request:
{
"run_name": "my spark task",
"new_cluster": {
"spark_version": "5.3.x-scala2.11",
"node_type_id": "Standard_D3_v2",
"num_workers": 10
},
"libraries": [
{
"jar": "dbfs:/my-jar.jar"
},
{
"maven": {
"coordinates": "org.jsoup:jsoup:1.7.2"
}
}
],
"spark_jar_task": {
"main_class_name": "com.databricks.ComputeModels"
}
}
And response:
{
"run_id": 123
}
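A sketch of submitting a one-time run and then polling jobs/runs/get until the run reaches a terminal life_cycle_state (placeholders as above; the polling interval is arbitrary).

```python
import os
import time
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

submit = {
    "run_name": "my spark task",
    "new_cluster": {
        "spark_version": "5.3.x-scala2.11",
        "node_type_id": "Standard_D3_v2",
        "num_workers": 10,
    },
    "libraries": [{"jar": "dbfs:/my-jar.jar"}],
    "spark_jar_task": {"main_class_name": "com.databricks.ComputeModels"},
}
run_id = requests.post(f"{BASE_URL}/jobs/runs/submit",
                       headers=HEADERS, json=submit).json()["run_id"]

# Poll the run state; TERMINATED, SKIPPED, and INTERNAL_ERROR are terminal states.
while True:
    state = requests.get(f"{BASE_URL}/jobs/runs/get", headers=HEADERS,
                         params={"run_id": run_id}).json()["state"]
    if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        print(state.get("result_state"), state.get("state_message"))
        break
    time.sleep(30)
```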
Request Structure
Submit a new run with the provided settings.
Important
- When you run a job on a new automated cluster, the job is treated as a data engineering (automated) workload subject to automated workload pricing.
- When you run a job on an existing interactive cluster, it is treated as a data analytics (interactive) workload subject to interactive workload pricing.
Field Name | Type | Description |
---|---|---|
existing_cluster_id OR new_cluster | STRING OR NewCluster | If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new_cluster, a description of a cluster that will be created for each run. |
notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task | NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask | If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task. If spark_jar_task, indicates that this job should run a JAR. If spark_python_task, indicates that this job should run a Python file. If spark_submit_task, indicates that this job should run the spark-submit script. |
run_name | STRING | An optional name for the run. The default value is Untitled. |
libraries | An array of Library | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |
timeout_seconds | INT32 | An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
Response Structure
Field Name | Type | Description |
---|---|---|
run_id | INT64 | The canonical identifier for the newly submitted run. |
Runs List
Endpoint | HTTP Method |
---|---|
2.0/jobs/runs/list | GET |
List runs from most recently started to least.
Note
Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs Export.
An example request:
/jobs/runs/list?job_id=1&active_only=false&offset=1&limit=1
And response:
{
"runs": [
{
"job_id": 1,
"run_id": 452,
"number_in_job": 5,
"state": {
"life_cycle_state": "RUNNING",
"state_message": "Performing action"
},
"task": {
"notebook_task": {
"notebook_path": "/Users/donald@duck.com/my-notebook"
}
},
"cluster_spec": {
"existing_cluster_id": "1201-my-cluster"
},
"cluster_instance": {
"cluster_id": "1201-my-cluster",
"spark_context_id": "1102398-spark-context-id"
},
"overriding_parameters": {
"jar_params": ["param1", "param2"]
},
"start_time": 1457570074236,
"setup_duration": 259754,
"execution_duration": 3589020,
"cleanup_duration": 31038,
"trigger": "PERIODIC"
}
],
"has_more": true
}
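A sketch of paging through a job's runs with offset and has_more (placeholders as above).

```python
import os
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

offset, limit = 0, 20
while True:
    page = requests.get(f"{BASE_URL}/jobs/runs/list", headers=HEADERS,
                        params={"job_id": 1, "offset": offset,
                                "limit": limit}).json()
    for run in page.get("runs", []):
        print(run["run_id"], run["state"]["life_cycle_state"])
    if not page.get("has_more"):
        break
    offset += limit
```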
Request Structure
List runs from most recently started to least.
Field Name | Type | Description |
---|---|---|
active_only OR completed_only | BOOL OR BOOL | If active_only, if true, only active runs will be included in the results; otherwise, lists both active and completed runs. Note: This field cannot be true when completed_only is true. If completed_only, if true, only completed runs will be included in the results; otherwise, lists both active and completed runs. Note: This field cannot be true when active_only is true. |
job_id | INT64 | The job for which to list runs. If omitted, the Jobs service will list runs from all jobs. |
offset | INT32 | The offset of the first run to return, relative to the most recent run. |
limit | INT32 | The number of runs to return. This value should be greater than 0 and less than 150. The default value is 20. If a request specifies a limit of 0, the service will instead use the maximum limit. |
Response Structure
Field Name | Type | Description |
---|---|---|
runs | An array of Run | A list of runs, from most recently started to least. |
has_more | BOOL | If true, additional runs matching the provided filter are available for listing. |
Runs Get
Endpoint | HTTP Method |
---|---|
2.0/jobs/runs/get | GET |
Retrieve the metadata of a run.
Note
Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs Export.
An example request:
/jobs/runs/get?run_id=452
An example response:
{
"job_id": 1,
"run_id": 452,
"number_in_job": 5,
"state": {
"life_cycle_state": "RUNNING",
"state_message": "Performing action"
},
"task": {
"notebook_task": {
"notebook_path": "/Users/donald@duck.com/my-notebook"
}
},
"cluster_spec": {
"existing_cluster_id": "1201-my-cluster"
},
"cluster_instance": {
"cluster_id": "1201-my-cluster",
"spark_context_id": "1102398-spark-context-id"
},
"overriding_parameters": {
"jar_params": ["param1", "param2"]
},
"start_time": 1457570074236,
"setup_duration": 259754,
"execution_duration": 3589020,
"cleanup_duration": 31038,
"trigger": "PERIODIC"
}
Request Structure
Retrieve the metadata of a run without any output.
Field Name | Type | Description |
---|---|---|
run_id | INT64 | The canonical identifier of the run for which to retrieve the metadata. This field is required. |
Response Structure
Field Name | Type | Description |
---|---|---|
job_id | INT64 | The canonical identifier of the job that contains this run. |
run_id | INT64 | The canonical identifier of the run. This ID is unique across all runs of all jobs. |
number_in_job | INT64 | The sequence number of this run among all runs of the job. This value starts at 1. |
original_attempt_run_id | INT64 | If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id. |
state | RunState | The result and lifecycle states of the run. |
schedule | CronSchedule | The cron schedule that triggered this run if it was triggered by the periodic scheduler. |
task | JobTask | The task performed by the run, if any. |
cluster_spec | ClusterSpec | A snapshot of the job’s cluster specification when this run was created. |
cluster_instance | ClusterInstance | The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run. |
overriding_parameters | RunParameters | The parameters used for this run. |
start_time | INT64 | The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing, for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |
setup_duration | INT64 | The time it took to set up the cluster in milliseconds. For runs that run on new clusters this is the cluster creation time; for runs that run on existing clusters this time should be very short. |
execution_duration | INT64 | The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. |
cleanup_duration | INT64 | The time in milliseconds it took to terminate the cluster and clean up any intermediary results, etc. The total duration of the run is the sum of the setup_duration, the execution_duration, and the cleanup_duration. |
trigger | TriggerType | The type of trigger that fired this run, e.g., a periodic schedule or a one-time run. |
creator_user_name | STRING | The creator user name. This field won’t be included in the response if the user has already been deleted. |
run_page_url | STRING | The URL to the detail page of the run. |
Runs Export
Endpoint | HTTP Method |
---|---|
2.0/jobs/runs/export | GET |
Export and retrieve the job run task.
Note
Only notebook runs can be exported in HTML format. Exporting runs of other types will fail.
An example request:
/jobs/runs/export?run_id=452
An example response:
{
"views": [ {
"content": "<!DOCTYPE html><html><head>Head</head><body>Body</body></html>",
"name": "my-notebook",
"type": "NOTEBOOK"
} ]
}
To extract the HTML notebook from the JSON response, download and run this Python script.
Note
The notebook body in the __DATABRICKS_NOTEBOOK_MODEL object is encoded.
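A sketch of exporting a notebook run from Python and writing each returned view to an HTML file (placeholders as above; the views array follows the Response Structure below).

```python
import os
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

resp = requests.get(f"{BASE_URL}/jobs/runs/export", headers=HEADERS,
                    params={"run_id": 452, "views_to_export": "CODE"})
resp.raise_for_status()
for view in resp.json().get("views", []):
    # Each view item is a complete HTML document.
    with open(f"{view['name']}.html", "w", encoding="utf-8") as f:
        f.write(view["content"])
```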
Request Structure
Retrieve the export of a job run task.
Field Name | Type | Description |
---|---|---|
run_id | INT64 | The canonical identifier for the run. This field is required. |
views_to_export | ViewsToExport | Which views to export (CODE, DASHBOARDS, or ALL). Defaults to CODE. |
Response Structure
Field Name | Type | Description |
---|---|---|
views | An array of ViewItem | The exported content in HTML format (one for every view item). |
Runs Cancel
Endpoint | HTTP Method |
---|---|
2.0/jobs/runs/cancel | POST |
Cancel a run. The run is canceled asynchronously, so when this request completes, the run may still be running. The run will be terminated shortly. If the run is already in a terminal life_cycle_state, this method is a no-op.
An example request:
{
"run_id": 453
}
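A corresponding sketch in Python (placeholders as above).

```python
import os
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

requests.post(f"{BASE_URL}/jobs/runs/cancel", headers=HEADERS,
              json={"run_id": 453}).raise_for_status()
```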
Request Structure
Cancel a run. The run is canceled asynchronously, so when this request completes the run may still be active. The run will be terminated as soon as possible.
Field Name | Type | Description |
---|---|---|
run_id | INT64 | This field is required. |
Runs Get Output
Endpoint | HTTP Method |
---|---|
2.0/jobs/runs/get-output | GET |
Retrieve the output of a run. When a notebook task returns a value through the dbutils.notebook.exit() call, you can use this endpoint to retrieve that value. Azure Databricks restricts this API to return the first 5 MB of the output. To return a larger result, you can store job results in a cloud storage service.
Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs Export.
An example request:
/jobs/runs/get-output?run_id=453
And response:
{
"metadata": {
"job_id": 1,
"run_id": 452,
"number_in_job": 5,
"state": {
"life_cycle_state": "TERMINATED",
"result_state": "SUCCESS",
"state_message": ""
},
"task": {
"notebook_task": {
"notebook_path": "/Users/donald@duck.com/my-notebook"
}
},
"cluster_spec": {
"existing_cluster_id": "1201-my-cluster"
},
"cluster_instance": {
"cluster_id": "1201-my-cluster",
"spark_context_id": "1102398-spark-context-id"
},
"overriding_parameters": {
"jar_params": ["param1", "param2"]
},
"start_time": 1457570074236,
"setup_duration": 259754,
"execution_duration": 3589020,
"cleanup_duration": 31038,
"trigger": "PERIODIC"
},
"notebook_output": {
"result": "the maybe truncated string passed to dbutils.notebook.exit()"
}
}
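A sketch of retrieving the notebook output from Python, handling the case where output is not available (placeholders as above; the response fields follow the Response Structure below).

```python
import os
import requests

BASE_URL = "https://<databricks-instance>/api/2.0"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

resp = requests.get(f"{BASE_URL}/jobs/runs/get-output", headers=HEADERS,
                    params={"run_id": 453})
resp.raise_for_status()
body = resp.json()
if "error" in body:
    print("No output available:", body["error"])
else:
    # Value passed to dbutils.notebook.exit(), possibly truncated.
    print(body["notebook_output"].get("result"))
```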
Request Structure
Retrieves both the output and the metadata of a run.
Field Name | Type | Description |
---|---|---|
run_id | INT64 | The canonical identifier for the run. This field is required. |
Response Structure
Field Name | Type | Description |
---|---|---|
notebook_output OR error | NotebookOutput OR STRING | If notebook_output, the output of a notebook task, if available. A notebook task that terminates (either successfully or with a failure) without calling dbutils.notebook.exit() is considered to have an empty output. This field will be set but its result value will be empty. If error, an error message indicating why output is not available. The message is unstructured, and its exact format is subject to change. |
metadata | Run | All details of the run except for its output. |
Runs Delete
Endpoint | HTTP Method |
---|---|
2.0/jobs/runs/delete | POST |
Delete a non-active run. Returns an error if the run is active.
An example request:
{
"run_id": 42
}
Request Structure
Delete a non-active run.
Field Name | Type | Description |
---|---|---|
run_id | INT64 | The canonical identifier of the run to delete. |
Data Structures
ClusterInstance
Identifiers for the cluster and Spark context used by a run. These two values together identify an execution context across all time.
Field Name | Type | Description |
---|---|---|
cluster_id | STRING | The canonical identifier for the cluster used by a run. This field is always available for runs on existing clusters. For runs on new clusters, it becomes available once the cluster is created. This value can be used to view logs by browsing to /#setting/sparkui/$cluster_id/driver-logs. The logs will continue to be available after the run completes. If this identifier is not yet available, the response won’t include this field. |
spark_context_id | STRING | The canonical identifier for the Spark context used by a run. This field will be filled in once the run begins execution. This value can be used to view the Spark UI by browsing to /#setting/sparkui/$cluster_id/$spark_context_id. The Spark UI will continue to be available after the run has completed. If this identifier is not yet available, the response won’t include this field. |
ClusterSpec
Important
- When you run a job on a new automated cluster, the job is treated as a data engineering (automated) workload subject to automated workload pricing.
- When you run a job on an existing interactive cluster, it is treated as a data analytics (interactive) workload subject to interactive workload pricing.
Field Name | Type | Description |
---|---|---|
existing_cluster_id OR new_cluster | STRING OR NewCluster | If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new_cluster, a description of a cluster that will be created for each run. |
libraries | An array of Library | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |
CronSchedule
Field Name | Type | Description |
---|---|---|
quartz_cron_expression | STRING | A cron expression using Quartz syntax that describes the schedule for a job. See Quartz for details. This field is required. |
timezone_id | STRING | A Java timezone ID. The schedule for a job will be resolved with respect to this timezone. See Java TimeZone for details. This field is required. |
Job
Field Name | Type | Description |
---|---|---|
job_id | INT64 | The canonical identifier for this job. |
creator_user_name | STRING | The creator user name. This field won’t be included in the response if the user has already been deleted. |
settings | JobSettings | Settings for this job and all of its runs. These settings can be updated using the resetJob method. |
created_time | INT64 | The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC). |
JobEmailNotifications
Field Name | Type | Description |
---|---|---|
on_start | An array of STRING | A list of email addresses to be notified when a run begins. If not specified upon job creation or reset, the list will be empty, i.e., no address will be notified. |
on_success | An array of STRING | A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESSFUL result_state. If not specified upon job creation or reset, the list will be empty, i.e., no address will be notified. |
on_failure | An array of STRING | A list of email addresses to be notified when a run unsuccessfully completes. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a SKIPPED, FAILED, or TIMED_OUT result_state. If not specified upon job creation or reset, the list will be empty, i.e., no address will be notified. |
no_alert_for_skipped_runs | BOOL | If true, do not send email to recipients specified in on_failure if the run is skipped. |
JobSettings
Important
- When you run a job on a new automated cluster, the job is treated as a data engineering (automated) workload subject to automated workload pricing.
- When you run a job on an existing interactive cluster, it is treated as a data analytics (interactive) workload subject to interactive workload pricing.
Settings for a job. These settings can be updated using the resetJob method.
Field Name | Type | Description |
---|---|---|
existing_cluster_id OR new_cluster | STRING OR NewCluster | If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new_cluster, a description of a cluster that will be created for each run. |
notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task | NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask | If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task. If spark_jar_task, indicates that this job should run a JAR. If spark_python_task, indicates that this job should run a Python file. If spark_submit_task, indicates that this job should run the spark-submit script. |
name | STRING | An optional name for the job. The default value is Untitled. |
libraries | An array of Library | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |
email_notifications | JobEmailNotifications | An optional set of email addresses that will be notified when runs of this job begin or complete as well as when this job is deleted. The default behavior is to not send any emails. |
timeout_seconds | INT32 | An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
max_retries | INT32 | An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with a FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry. |
min_retry_interval_millis | INT32 | An optional minimal interval in milliseconds between attempts. The default behavior is that unsuccessful runs are immediately retried. |
retry_on_timeout | BOOL | An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. |
schedule | CronSchedule | An optional periodic schedule for this job. The default behavior is that the job will only run when triggered by clicking Run Now in the Jobs UI or sending an API request to runNow. |
max_concurrent_runs | INT32 | An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful, for example, if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs that differ by their input parameters. This setting affects only new runs. For example, suppose the job’s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won’t kill any of the active runs. However, from then on, new runs will be skipped unless there are fewer than 3 active runs. This value cannot exceed 150. Setting this value to 0 will cause all new runs to be skipped. The default behavior is to allow only 1 concurrent run. |
JobTask
Field Name | Type | Description |
---|---|---|
notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task | NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask | If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task. If spark_jar_task, indicates that this job should run a JAR. If spark_python_task, indicates that this job should run a Python file. If spark_submit_task, indicates that this job should run the spark-submit script. |
NewCluster
Field Name | Type | Description |
---|---|---|
num_workers OR autoscale | INT32 OR AutoScale | If num_workers, the number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes. Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in spark_info will gradually increase from 5 to 10 as the new nodes are provisioned. If autoscale, the parameters needed in order to automatically scale clusters up and down based on load. Note: Autoscaling works best with Databricks Runtime 3.0 or above. |
cluster_name | STRING | Cluster name. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string. |
spark_version | STRING | The Spark version of the cluster. A list of available Spark versions can be retrieved by using the Runtime Versions API call. This field is required. |
spark_conf | SparkConfPair | An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively. Example Spark confs: {"spark.speculation": true, "spark.streaming.ui.retainedBatches": 5} or {"spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails"} |
node_type_id | STRING | This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved by using the List Node Types API call. This field is required. |
driver_node_type_id | STRING | The node type of the Spark driver. This field is optional; if unset, the driver node type is set to the same value as node_type_id defined above. |
custom_tags | An array of ClusterTag | Additional tags for cluster resources. Databricks will tag all cluster resources with these tags in addition to default_tags. Notes: Tags are not supported on legacy node types such as compute-optimized and memory-optimized. Databricks allows at most 45 custom tags. |
cluster_log_conf | ClusterLogConf | The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 minutes. The destination of driver logs is <destination>/<cluster-id>/driver, while the destination of executor logs is <destination>/<cluster-id>/executor. |
init_scripts | An array of InitScriptInfo | The configuration for storing init scripts. Any number of scripts can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-id>/init_scripts. |
spark_env_vars | SparkEnvPair | An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (i.e., export X='Y') while launching the driver and workers. In order to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the example below. This ensures that all default Databricks-managed environment variables are included as well. Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"} |
enable_elastic_disk | BOOL | Autoscaling Local Storage: when enabled, this cluster dynamically acquires additional disk space when its Spark workers are running low on disk space. |
instance_pool_id | STRING | The optional ID of the instance pool to which the cluster belongs. Refer to the Instance Pools API for details. |
NotebookOutput
Field Name | Type | Description |
---|---|---|
result | STRING | The value passed to dbutils.notebook.exit(). Azure Databricks restricts this API to return the first 1 MB of the value. For a larger result, your job can store the results in a cloud storage service. This field will be absent if dbutils.notebook.exit() was never called. |
truncated | BOOLEAN | Whether or not the result was truncated. |
NotebookTask
The output of every cell is subject to an 8 MB size limit. If the output of a cell exceeds this limit, the rest of the run is cancelled and the run is marked as failed. In that case, some of the content output from other cells may also be missing. If you need help finding the cell that is beyond the limit, run the notebook against an interactive cluster and use this notebook autosave technique.
Field Name | Type | Description |
---|---|---|
notebook_path | STRING | The absolute path of the notebook to be run in the Azure Databricks workspace. This path must begin with a slash. This field is required. |
base_parameters | A map of ParamPair | Base parameters to be used for each run of this job. If the run is initiated by a call to run-now with parameters specified, the two parameter maps will be merged. If the same key is specified in base_parameters and in run-now, the value from run-now will be used. If the notebook takes a parameter that is not specified in the job’s base_parameters or the run-now override parameters, the default value from the notebook will be used. These parameters can be retrieved in a notebook by using dbutils.widgets.get(). |
ParamPair
Name-based parameters for jobs running notebook tasks.
Field Name | Type | Description |
---|---|---|
key | STRING | Named parameter; can be passed to dbutils.widgets.get() to retrieve the corresponding value. |
value | STRING | Value of the named parameter, returned by calls to dbutils.widgets.get() in notebooks. |
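As an illustration of how these name-based parameters reach a notebook, a notebook cell can read them with dbutils.widgets.get. The parameter names below mirror the run-now example earlier and are otherwise arbitrary; dbutils is only available inside a Databricks notebook.

```python
# Inside a Databricks notebook run by a job:
# notebook_params such as {"dry-run": "true"} are surfaced as widgets.
dry_run = dbutils.widgets.get("dry-run")                  # "true"
oldest = dbutils.widgets.get("oldest-time-to-consider")   # "1457570074236"
print(dry_run, oldest)
```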
Run
All the information about a run except for its output. The output can be retrieved separately with the getRunOutput method.
Field Name | Type | Description |
---|---|---|
job_id | INT64 | The canonical identifier of the job that contains this run. |
run_id | INT64 | The canonical identifier of the run. This ID is unique across all runs of all jobs. |
creator_user_name | STRING | The creator user name. This field won’t be included in the response if the user has already been deleted. |
number_in_job | INT64 | The sequence number of this run among all runs of the job. This value starts at 1. |
original_attempt_run_id | INT64 | If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id. |
state | RunState | The result and lifecycle states of the run. |
schedule | CronSchedule | The cron schedule that triggered this run if it was triggered by the periodic scheduler. |
task | JobTask | The task performed by the run, if any. |
cluster_spec | ClusterSpec | A snapshot of the job’s cluster specification when this run was created. |
cluster_instance | ClusterInstance | The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run. |
overriding_parameters | RunParameters | The parameters used for this run. |
start_time | INT64 | The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing, for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |
setup_duration | INT64 | The time it took to set up the cluster in milliseconds. For runs that run on new clusters this is the cluster creation time; for runs that run on existing clusters this time should be very short. |
execution_duration | INT64 | The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. |
cleanup_duration | INT64 | The time in milliseconds it took to terminate the cluster and clean up any intermediary results, etc. The total duration of the run is the sum of the setup_duration, the execution_duration, and the cleanup_duration. |
trigger | TriggerType | The type of trigger that fired this run, e.g., a periodic schedule or a one-time run. |
RunParameters
Parameters for this run. Only one of jar_params, python_params, or notebook_params should be specified in the run-now request, depending on the type of job task. Jobs with a Spark JAR task or Python task take a list of position-based parameters, and jobs with notebook tasks take a key-value map.
Field Name | Type | Description |
---|---|---|
jar_params | An array of STRING | A list of parameters for jobs with Spark JAR tasks, e.g. "jar_params": ["john doe", "35"]. The parameters will be used to invoke the main function of the main class specified in the Spark JAR task. If not specified upon run-now, it will default to an empty list. jar_params cannot be specified in conjunction with notebook_params. The JSON representation of this field (i.e. {"jar_params":["john doe","35"]}) cannot exceed 10,000 bytes. |
notebook_params | A map of ParamPair | A map from keys to values for jobs with notebook tasks, e.g. "notebook_params": {"name": "john doe", "age": "35"}. The map is passed to the notebook and is accessible through the dbutils.widgets.get function. See Widgets for more information. If not specified upon run-now, the triggered run uses the job’s base parameters. notebook_params cannot be specified in conjunction with jar_params. The JSON representation of this field (i.e. {"notebook_params":{"name":"john doe","age":"35"}}) cannot exceed 10,000 bytes. |
python_params | An array of STRING | A list of parameters for jobs with Python tasks, e.g. "python_params": ["john doe", "35"]. The parameters will be passed to the Python file as command-line parameters. If specified upon run-now, they overwrite the parameters specified in the job setting. The JSON representation of this field (i.e. {"python_params":["john doe","35"]}) cannot exceed 10,000 bytes. |
spark_submit_params | An array of STRING | A list of parameters for jobs with spark submit tasks, e.g. "spark_submit_params": ["--class", "org.apache.spark.examples.SparkPi"]. The parameters will be passed to the spark-submit script as command-line parameters. If specified upon run-now, they overwrite the parameters specified in the job setting. The JSON representation of this field cannot exceed 10,000 bytes. |
RunState
Field Name | Type | Description |
---|---|---|
life_cycle_state | RunLifeCycleState | A description of a run’s current location in the run lifecycle. This field is always available in the response. |
result_state | RunResultState | The result state of a run. If it is not available, the response won’t include this field. See RunResultState for details about the availability of result_state. |
state_message | STRING | A descriptive message for the current state. This field is unstructured, and its exact format is subject to change. |
SparkJarTask
Field Name | Type | Description |
---|---|---|
jar_uri | STRING | Deprecated since 04/2016. Provide a jar through the libraries field instead. For an example, see Create. |
main_class_name | STRING | The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. The code should use SparkContext.getOrCreate to obtain a Spark context; otherwise, runs of the job will fail. |
parameters | An array of STRING | Parameters that will be passed to the main method. |
SparkPythonTask
Field Name | Type | Description |
---|---|---|
python_file | STRING | The URI of the Python file to be executed. DBFS paths are supported. This field is required. |
parameters | An array of STRING | Command-line parameters that will be passed to the Python file. |
SparkSubmitTask
Important
- You can run Spark submit tasks only on new clusters.
- In the new_cluster specification, libraries and spark_conf are not supported. Instead, use --jars and --py-files to add Java and Python libraries and use --conf to set the Spark configuration.
- master, deploy-mode, and executor-cores are automatically configured by Azure Databricks; you cannot specify them in parameters.
- By default, the Spark submit job uses all available memory (excluding reserved memory for Azure Databricks services). You can set --driver-memory and --executor-memory to a smaller value to leave some room for off-heap usage.
- The --jars, --py-files, and --files arguments support DBFS paths.
For example, assuming the JAR is uploaded to DBFS, you can run SparkPi by setting the following parameters.
{
"parameters": [
"--class",
"org.apache.spark.examples.SparkPi",
"dbfs:/path/to/examples.jar",
"10"
]
}
Field Name | Type | Description |
---|---|---|
parameters | An array of STRING | Command-line parameters passed to spark submit. |
ViewItem
The exported content is in HTML format. For example, if the view to export is dashboards, one HTML string is returned for every dashboard.
Field Name | Type | Description |
---|---|---|
content | STRING | Content of the view. |
name | STRING | Name of the view item. In the case of code view, it would be the notebook’s name. In the case of dashboard view, it would be the dashboard’s name. |
type | ViewType | Type of the view item (e.g., NOTEBOOK, DASHBOARD). |
RunLifeCycleState
The life cycle state of a run. Allowed state transitions are:
- PENDING -> RUNNING -> TERMINATING -> TERMINATED
- PENDING -> SKIPPED
- PENDING -> INTERNAL_ERROR
- RUNNING -> INTERNAL_ERROR
- TERMINATING -> INTERNAL_ERROR
State | Description |
---|---|
PENDING | The run has been triggered. If there is not already an active run of the same job, the cluster and execution context are being prepared. If there is already an active run of the same job, the run will immediately transition into a SKIPPED state without preparing any resources. |
RUNNING | The task of this run is being executed. |
TERMINATING | The task of this run has completed, and the cluster and execution context are being cleaned up. |
TERMINATED | The task of this run has completed, and the cluster and execution context have been cleaned up. This state is terminal. |
SKIPPED | This run was aborted because a previous run of the same job was already active. This state is terminal. |
INTERNAL_ERROR | An exceptional state that indicates a failure in the Jobs service, such as network failure over a long period. If a run on a new cluster ends in an INTERNAL_ERROR state, the Jobs service terminates the cluster as soon as possible. This state is terminal. |
RunResultState
The result state of the run.
- If life_cycle_state = TERMINATED: if the run had a task, the result is guaranteed to be available, and it indicates the result of the task.
- If life_cycle_state = PENDING, RUNNING, or SKIPPED, the result state is not available.
- If life_cycle_state = TERMINATING or life_cycle_state = INTERNAL_ERROR: the result state is available if the run had a task and managed to start it.
Once available, the result state never changes.
State | Description |
---|---|
SUCCESS | The task completed successfully. |
FAILED | The task completed with an error. |
TIMEDOUT | The run was stopped after reaching the timeout. |
CANCELED | The run was canceled at user request. |
TriggerType
These are the types of triggers that can fire a run.
Type | Description |
---|---|
PERIODIC | These are schedules that periodically trigger runs, such as a cron scheduler. |
ONE_TIME | These are one time triggers that only fire a single run. This means the user triggered a single run on demand through the UI or the API. |
RETRY | This indicates a run that is triggered as a retry of a previously failed run. This occurs when the user requests to re-run the job in case of failures. |
ViewType
Type | Description |
---|---|
NOTEBOOK | Notebook view item |
DASHBOARD | Dashboard view item |
ViewsToExport
View to export: either code, all dashboards, or all.
Type | Description |
---|---|
CODE | Code view of the notebook |
DASHBOARDS | All dashboard views of the notebook |
ALL | All views of the notebook |