
HDInsight SDK for Java

Overview

The HDInsight SDK for Java provides classes and methods that allow you to manage your HDInsight clusters. It includes operations to create, delete, update, list, resize, execute script actions, monitor, get properties of HDInsight clusters, and more.

Prerequisites

SDK Installation

The HDInsight SDK for Java is available through Maven here. Add the following dependency to your pom.xml:

<dependency>
    <groupId>com.microsoft.azure.hdinsight.v2018_06_01_preview</groupId>
    <artifactId>azure-mgmt-hdinsight</artifactId>
    <version>1.0.0-beta-1</version>
</dependency>

You will also need to add the following dependencies to your pom.xml:
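The exact dependency list was not preserved in this copy of the page. As a sketch, the authentication example below uses ApplicationTokenCredentials and AzureEnvironment, which live in the Azure client authentication and runtime packages; the artifact IDs below are real, but the versions shown are assumptions — check Maven Central for current releases:

```xml
<!-- Versions are assumptions; verify against Maven Central -->
<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-client-authentication</artifactId>
    <version>1.6.2</version>
</dependency>
<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-client-runtime</artifactId>
    <version>1.6.2</version>
</dependency>
```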

Authentication

The SDK first needs to be authenticated with your Azure subscription. Follow the example below to create a service principal and use it to authenticate. After this is done, you will have an instance of an HDInsightManagementClientImpl, which contains many methods (outlined in the sections below) that can be used to perform management operations.

Note

There are other ways to authenticate besides the example below that could potentially be better suited for your needs. All methods are outlined here: Authenticate with the Azure management libraries for Java

Authentication Example Using a Service Principal

First, log in to Azure Cloud Shell. Verify you are currently using the subscription in which you want the service principal created.

az account show

Your subscription information is displayed as JSON.

{
  "environmentName": "AzureCloud",
  "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "isDefault": true,
  "name": "XXXXXXX",
  "state": "Enabled",
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "user": {
    "cloudShellID": true,
    "name": "XXX@XXX.XXX",
    "type": "user"
  }
}

If you're not logged into the correct subscription, select the correct one by running:

az account set -s <name or ID of subscription>

Important

If you have not already registered the HDInsight Resource Provider by another method (such as by creating an HDInsight cluster through the Azure portal), you need to do this once before you can authenticate. This can be done from the Azure Cloud Shell by running the following command:

az provider register --namespace Microsoft.HDInsight

Next, choose a name for your service principal and create it with the following command:

az ad sp create-for-rbac --name <Service Principal Name> --sdk-auth

The service principal information is displayed as JSON.

{
  "clientId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "clientSecret": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "subscriptionId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

Copy the below snippet and fill in TENANT_ID, CLIENT_ID, CLIENT_SECRET, and SUBSCRIPTION_ID with the strings from the JSON that was returned after running the command to create the service principal.

import com.microsoft.azure.AzureEnvironment;
import com.microsoft.azure.credentials.ApplicationTokenCredentials;
import com.microsoft.azure.management.hdinsight.v2018_06_01_preview.*;
import com.microsoft.azure.management.hdinsight.v2018_06_01_preview.implementation.HDInsightManagementClientImpl;

public class Main {
    public static void main (String[] args) {

        // Tenant ID for your Azure Subscription
        String TENANT_ID = "";
        // Your Service Principal App Client ID
        String CLIENT_ID = "";
        // Your Service Principal Client Secret
        String CLIENT_SECRET = "";
        // Azure Subscription ID
        String SUBSCRIPTION_ID = "";

        ApplicationTokenCredentials credentials = new ApplicationTokenCredentials(
                CLIENT_ID,
                TENANT_ID,
                CLIENT_SECRET,
                AzureEnvironment.AZURE);

        HDInsightManagementClientImpl client = new HDInsightManagementClientImpl(credentials)
                .withSubscriptionId(SUBSCRIPTION_ID);
    }
}

Cluster Management

Note

This section assumes you have already authenticated and constructed an HDInsightManagementClientImpl instance, and have stored it in a variable called client. Instructions for authenticating and obtaining an HDInsightManagementClientImpl can be found in the Authentication section above.

Create a Cluster

A new cluster can be created by calling client.clusters().create().

Samples

Code samples for creating several common types of HDInsight clusters are available: HDInsight Java Samples.

Example

This example demonstrates how to create a Spark cluster with 2 head nodes and 1 worker node.

Note

You first need to create a Resource Group and Storage Account, as explained below. If you have already created these, you can skip these steps.

Creating a Resource Group

You can create a resource group using the Azure Cloud Shell by running:

az group create -l <Region Name (i.e. eastus)> -n <Resource Group Name>

Creating a Storage Account

You can create a storage account using the Azure Cloud Shell by running:

az storage account create -n <Storage Account Name> -g <Existing Resource Group Name> -l <Region Name (i.e. eastus)> --sku <SKU i.e. Standard_LRS>

Now run the following command to get the key for your storage account (you will need this to create a cluster):

az storage account keys list -n <Storage Account Name>

The below Java snippet creates a Spark cluster with 2 head nodes and 1 worker node. Fill in the blank variables as explained in the comments and feel free to change other parameters to suit your specific needs.

// The name for the cluster you are creating
String clusterName = "";
// The name of your existing Resource Group
String resourceGroupName = "";
// Choose a username
String username = "";
// Choose a password
String password = "";
// Replace <> with the name of your storage account
String storageAccount = "<>.blob.core.windows.net";
// Storage account key you obtained above
String storageAccountKey = "";
// Choose a region
String location = "";
String container = "default";

// Requires java.util.HashMap, java.util.Collections, and java.util.List
// in addition to the SDK imports shown in the Authentication section.
HashMap<String, HashMap<String, String>> configurations = new HashMap<String, HashMap<String, String>>();
HashMap<String, String> gateway = new HashMap<String, String>();
gateway.put("restAuthCredential.isEnabled", "true");
gateway.put("restAuthCredential.username", username);
gateway.put("restAuthCredential.password", password);
configurations.put("gateway", gateway);

ClusterCreateParametersExtended parameters = new ClusterCreateParametersExtended()
    .withLocation(location)
    .withTags(Collections.emptyMap())
    .withProperties(
        new ClusterCreateProperties()
            .withClusterVersion("3.6")
            .withOsType(OSType.LINUX)
            .withClusterDefinition(new ClusterDefinition()
                .withKind("spark")
                .withConfigurations(configurations)
            )
            .withTier(Tier.STANDARD)
            .withComputeProfile(new ComputeProfile()
                .withRoles(List.of(
                    new Role()
                        .withName("headnode")
                        .withTargetInstanceCount(2)
                        .withHardwareProfile(new HardwareProfile()
                            .withVmSize("Large")
                        )
                        .withOsProfile(new OsProfile()
                            .withLinuxOperatingSystemProfile(new LinuxOperatingSystemProfile()
                                .withUsername(username)
                                .withPassword(password)
                            )
                        ),
                    new Role()
                        .withName("workernode")
                        .withTargetInstanceCount(1)
                        .withHardwareProfile(new HardwareProfile()
                            .withVmSize("Large")
                        )
                        .withOsProfile(new OsProfile()
                            .withLinuxOperatingSystemProfile(new LinuxOperatingSystemProfile()
                                .withUsername(username)
                                .withPassword(password)
                            )
                        )
                ))
            )
            .withStorageProfile(new StorageProfile()
                .withStorageaccounts(List.of(
                    new StorageAccount()
                        .withName(storageAccount)
                        .withKey(storageAccountKey)
                        .withContainer(container)
                        .withIsDefault(true)
                ))
            )
    );
client.clusters().create(resourceGroupName, clusterName, parameters);

Get Cluster Details

To get properties for a given cluster:

client.clusters().getByResourceGroup("<Resource Group Name>", "<Cluster Name>");

Example

You can use getByResourceGroup to confirm that you have successfully created your cluster.

ClusterInner cluster = client.clusters().getByResourceGroup("<Resource Group Name>", "<Cluster Name>");
System.out.println(cluster.name()); //Prints the name of the cluster
System.out.println(cluster.id()); //Prints the resource Id of the cluster

The output should look like:

<Cluster Name>
/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/<Resource Group Name>/providers/Microsoft.HDInsight/clusters/<Cluster Name>

List Clusters

List Clusters Under The Subscription

client.clusters().list();

List Clusters By Resource Group

client.clusters().listByResourceGroup("<Resource Group Name>");

备注

Both list() and listByResourceGroup() return a PagedList<ClusterInner> object. Calling currentPage().items() returns the list of clusters on the current page, and loadNextPage() advances the PagedList to the next page. This can be repeated until hasNextPage() returns false, indicating that there are no more pages.

Example

The following example prints the properties of all clusters for the current subscription:

PagedList<ClusterInner> clusterPages = client.clusters().list();
while (true) {
    for (ClusterInner cluster : clusterPages.currentPage().items()) {
        System.out.println(cluster.name());
    }
    if (clusterPages.hasNextPage()) {
        clusterPages.loadNextPage();
    } else {
        break;
    }
}

Delete a Cluster

To delete a cluster:

client.clusters().delete("<Resource Group Name>", "<Cluster Name>");

Update Cluster Tags

You can update the tags of a given cluster like so:

client.clusters().update("<Resource Group Name>", "<Cluster Name>", <Map<String,String> of Tags>);
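The tags placeholder above is a Map<String, String>. A minimal sketch of building one (the tag keys and values here are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class TagsExample {
    public static void main(String[] args) {
        // Build the Map<String, String> passed as the tags argument to update()
        Map<String, String> tags = new HashMap<>();
        tags.put("department", "finance");   // hypothetical tag
        tags.put("environment", "test");     // hypothetical tag

        // With an authenticated client (see the Authentication section):
        // client.clusters().update("<Resource Group Name>", "<Cluster Name>", tags);

        System.out.println(tags.size()); // prints 2
    }
}
```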

Resize Cluster

You can resize a given cluster's number of worker nodes by specifying a new size like so:

client.clusters().resize("<Resource Group Name>", "<Cluster Name>", <Num of Worker Nodes (int)>);

Cluster Monitoring

The HDInsight Management SDK can also be used to manage monitoring on your clusters via the Operations Management Suite (OMS).

Enable OMS Monitoring

Note

To enable OMS monitoring, you must have an existing Log Analytics workspace. If you have not already created one, you can learn how to do that here: Create a Log Analytics workspace in the Azure portal.

To enable OMS monitoring on your cluster:

client.extensions().enableMonitoring("<Resource Group Name>", "<Cluster Name>", new ClusterMonitoringRequest().withWorkspaceId("<Workspace Id>"));

View Status Of OMS Monitoring

To get the status of OMS on your cluster:

client.extensions().getMonitoringStatus("<Resource Group Name>", "<Cluster Name>");

Disable OMS Monitoring

To disable OMS on your cluster:

client.extensions().disableMonitoring("<Resource Group Name>", "<Cluster Name>");

Script Actions

HDInsight provides a configuration method called script actions that invokes custom scripts to customize the cluster.

Note

More information on how to use script actions can be found here: Customize Linux-based HDInsight clusters using script actions

Execute Script Actions

You can execute script actions on a given cluster like so:

RuntimeScriptAction scriptAction1 = new RuntimeScriptAction()
    .withName("<Script Name>")
    .withUri("<URL To Script>")
    .withRoles(<List<String> of roles>);
client.clusters().executeScriptActions(
    resourceGroupName, 
    clusterName, 
    new ExecuteScriptActionParameters().withPersistOnSuccess(false).withScriptActions(new LinkedList<>(Arrays.asList(scriptAction1)))); //add more RuntimeScriptActions to the list to execute multiple scripts

Delete Script Action

To delete a specified persisted script action on a given cluster:

client.scriptActions().delete("<Resource Group Name>", "<Cluster Name>", "<Script Name>");

List Persisted Script Actions

Note

listByCluster() returns a PagedList<RuntimeScriptActionDetailInner> object. Calling currentPage().items() returns a list of RuntimeScriptActionDetailInner, and loadNextPage() advances to the next page. This can be repeated until hasNextPage() returns false, indicating that there are no more pages.

To list all persisted script actions for the specified cluster:

client.scriptActions().listByCluster("<Resource Group Name>", "<Cluster Name>");

Example

PagedList<RuntimeScriptActionDetailInner> scriptsPaged = client.scriptActions().listByCluster(resourceGroupName, clusterName);
while (true) {
    for (RuntimeScriptActionDetailInner script : scriptsPaged.currentPage().items()) {
        System.out.println(script.name()); //There are methods to get other properties of RuntimeScriptActionDetail besides name(), such as status(), operation(), startTime(), endTime(), etc. See reference documentation.
    }
    if (scriptsPaged.hasNextPage()) {
        scriptsPaged.loadNextPage();
    } else {
        break;
    }
}

List All Scripts' Execution History

To list all scripts' execution history for the specified cluster:

client.scriptExecutionHistorys().listByCluster("<Resource Group Name>", "<Cluster Name>");

Example

This example prints all the details for all past script executions.

PagedList<RuntimeScriptActionDetailInner> scriptExecutionsPaged = client.scriptExecutionHistorys().listByCluster(resourceGroupName, clusterName);
while (true) {
    for (RuntimeScriptActionDetailInner script : scriptExecutionsPaged.currentPage().items()) {
        System.out.println(script.name()); //There are methods to get other properties of RuntimeScriptActionDetail besides name(), such as status(), operation(), startTime(), endTime(), etc. See reference documentation.
    }
    if (scriptExecutionsPaged.hasNextPage()) {
        scriptExecutionsPaged.loadNextPage();
    } else {
        break;
    }
}