REST API 1.2

The Azure Databricks REST API allows you to programmatically access Azure Databricks instead of going through the web UI.

This article covers REST API 1.2. For most use cases, we recommend using the REST API 2.0. It supports most of the functionality of the 1.2 API, as well as additional functionality.

Important

To access Databricks REST APIs, you must authenticate.

REST API use cases

  • Start Apache Spark jobs triggered from your existing production systems or from workflow systems.
  • Programmatically bring up a cluster of a certain size at a fixed time of day and then shut it down at night.

API categories

  • Execution context: create unique variable namespaces where Spark commands can be called.
  • Command execution: run commands within a specific execution context.

Details

  • This REST API runs over HTTPS.
  • For retrieving information, use HTTP GET.
  • For modifying state, use HTTP POST.
  • For file upload, use multipart/form-data. Otherwise use application/json.
  • The response content type is JSON.
  • Basic authentication is used to authenticate the user for every API call.
  • User credentials are base64 encoded and are in the HTTP header for every API call. For example, Authorization: Basic YWRtaW46YWRtaW4=.
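The Authorization value shown above is nothing more than base64 of `username:password`. A minimal sketch of how that header is built; the `admin:admin` credentials are only the placeholder pair from the example above, not real credentials:

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value: 'Basic ' + base64(user:pass)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# Placeholder credentials from the example above.
print(basic_auth_header("admin", "admin"))  # Basic YWRtaW46YWRtaW4=
```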

Get started

In the following examples, replace <databricks-instance> with the workspace URL of your Azure Databricks deployment.

Test your connection

> telnet <databricks-instance> 443

Trying 52.11.163.202...
Connected to <databricks-instance>.
Escape character is '^]'.

> nc -v -z <databricks-instance> 443
found 1 connections:
     1: flags=82<CONNECTED,PREFERRED>
    outif utun0
    src x.x.x.x port 59063
    dst y.y.y.y port 443
    rank info not available
    TCP aux info available

Connection to <databricks-instance> port 443 [TCP/HTTPS] succeeded!

You can use either tool above to test the connection. Port 443 is the default HTTPS port, and you can run the REST API on this port. If you cannot connect to port 443, contact help@databricks.com with your account URL.
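If telnet or nc are not available, the same reachability check can be sketched in a few lines of Python; `port_open` is a hypothetical helper for illustration, not part of any Databricks tooling:

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace the placeholder with your workspace hostname before running:
# port_open("<databricks-instance>", 443)
```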

Sample API calls

The following examples provide some cURL commands, but you can also use an HTTP library in your programming language of choice.

GET request

Note

If your URL has the & character in it, you must quote that URL so UNIX doesn’t interpret it as a command separator:

curl -n 'https://<databricks-instance>/api/1.2/commands/status?clusterId=batVenom&contextId=35585555555555&commandId=45382422555555555'
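If you build the request in code rather than in a shell, the quoting problem disappears: assemble the query string programmatically and the & characters never reach a command interpreter. A sketch using the placeholder IDs from the cURL example above:

```python
from urllib.parse import urlencode

# Placeholder IDs copied from the cURL example above.
params = {
    "clusterId": "batVenom",
    "contextId": "35585555555555",
    "commandId": "45382422555555555",
}
base = "https://<databricks-instance>/api/1.2/commands/status"
url = f"{base}?{urlencode(params)}"
print(url)
```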

POST request with application/json

curl -X POST -n https://<databricks-instance>/api/1.2/contexts/create -d "language=scala&clusterId=batVenom"
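The -d payload above is ordinary form encoding. A short sketch of building the same body programmatically, using the placeholder values from the command above:

```python
from urllib.parse import urlencode

# Same key/value pairs that curl -d sends in the example above.
body = urlencode({"language": "scala", "clusterId": "batVenom"})
print(body)  # language=scala&clusterId=batVenom
```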

API endpoints by category

Execution context

  • https://<databricks-instance>/api/1.2/contexts/create – create an execution context on a specified cluster for a given programming language

    • POST request with application/json:
      • data

        {"language": "scala", "clusterId": "peaceJam"}
        
  • https://<databricks-instance>/api/1.2/contexts/status – show the status of an existing execution context

    • GET request:
      • Example arguments: clusterId=peaceJam&contextId=179365396413324
      • status: ["Pending", "Running", "Error"]
  • https://<databricks-instance>/api/1.2/contexts/destroy – destroy an execution context

    • POST request with application/json:
      • data

        {"contextId" : "1793653964133248955", "clusterId" : "peaceJam"}
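The three context endpoints take small, similar payloads. As a hedged sketch, hypothetical helper functions can pair each endpoint with its payload before handing both to whatever HTTP client you use; the cluster name and IDs below are the samples from the list above:

```python
# <databricks-instance> stays a placeholder; peaceJam and the context ID are
# the sample values from the endpoint list above.
BASE = "https://<databricks-instance>/api/1.2"

def create_context(language, cluster_id):
    # POST with application/json
    return f"{BASE}/contexts/create", {"language": language, "clusterId": cluster_id}

def context_status(cluster_id, context_id):
    # GET with query arguments
    return f"{BASE}/contexts/status", {"clusterId": cluster_id, "contextId": context_id}

def destroy_context(cluster_id, context_id):
    # POST with application/json
    return f"{BASE}/contexts/destroy", {"clusterId": cluster_id, "contextId": context_id}

url, data = create_context("scala", "peaceJam")
print(url, data)
```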
        

Command execution

Known limitations: command execution does not support %run.

  • https://<databricks-instance>/api/1.2/commands/execute – run a command or file.

    • POST request with application/json:

      • data

        {"language": "scala", "clusterId": "peaceJam", "contextId" : "5456852751451433082", "command": "sc.parallelize(1 to 10).collect"}
        
    • POST request with multipart/form-data:

      • data

        {"language": "python", "clusterId": "peaceJam", "contextId" : "5456852751451433082"}
        
      • files

        {"command": "./myfile.py"}
        
  • https://<databricks-instance>/api/1.2/commands/status – show one command’s status or result

    • GET request
      • Example arguments: clusterId=peaceJam&contextId=5456852751451433082&commandId=5220029674192230006
      • status: ["Queued", "Running", "Cancelling", "Finished", "Cancelled", "Error"]
  • https://<databricks-instance>/api/1.2/commands/cancel – cancel one command

    • POST request with application/json:
      • data

        {"clusterId": "peaceJam", "contextId" : "5456852751451433082", "commandId" : "2245426871786618466"}
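Because a command can sit in Queued or Running for a while, callers typically poll /commands/status until it reaches a terminal state. A minimal polling sketch, with the HTTP call injected as a callable so it works with any client; `wait_for_command` is a hypothetical helper, not part of the API:

```python
import time

# Terminal states reported by /commands/status, per the list above.
TERMINAL_STATES = {"Finished", "Cancelled", "Error"}

def wait_for_command(fetch_status, interval=1.0, max_polls=60):
    """Poll until the command reaches a terminal state.

    fetch_status is any zero-argument callable that GETs /commands/status and
    returns the parsed JSON dict; injecting it keeps this sketch client-agnostic.
    """
    for _ in range(max_polls):
        result = fetch_status()
        if result.get("status") in TERMINAL_STATES:
            return result
        time.sleep(interval)
    raise TimeoutError("command did not reach a terminal state")
```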
        

Example: Upload and run a Spark JAR

Upload a JAR

  1. Use the REST API 2.0 to upload a JAR and attach it to a cluster.

Run a JAR

  1. Create an execution context.

    curl -X POST -n  https://<databricks-instance>/api/1.2/contexts/create -d "language=scala&clusterId=batVenom"
    
    {
      "id": "3558513128163162828"
    }
    
  2. Execute a command that uses your JAR.

    curl -X POST -n https://<databricks-instance>/api/1.2/commands/execute \
    -d 'language=scala&clusterId=batVenom&contextId=3558513128163162828&command=println(com.databricks.apps.logs.chapter1.LogAnalyzer.processLogFile(sc,null,"dbfs:/somefile.log"))'
    
    {
      "id": "4538242203822083978"
    }
    
  3. Check on the status of your command. It may not return immediately if you are running a lengthy Spark job.

    curl -n 'https://<databricks-instance>/api/1.2/commands/status?clusterId=batVenom&contextId=3558513128163162828&commandId=4538242203822083978'
    
    {
       "id": "4538242203822083978",
       "results": {
         "data": "Content Size Avg: 1234, Min: 1234, Max: 1234",
         "resultType": "text"
       },
       "status": "Finished"
    }
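The body returned in step 3 is plain JSON, so extracting the result is ordinary response parsing. A small sketch that parses the sample response above and pulls out the result text:

```python
import json

# The sample /commands/status response from step 3 above.
response_text = '''
{
   "id": "4538242203822083978",
   "results": {
     "data": "Content Size Avg: 1234, Min: 1234, Max: 1234",
     "resultType": "text"
   },
   "status": "Finished"
}
'''
payload = json.loads(response_text)
if payload["status"] == "Finished" and payload["results"]["resultType"] == "text":
    print(payload["results"]["data"])
```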