Azure HDInsight SDK for .NET

Azure HDInsight offers management and job SDKs for .NET. They provide classes for managing your HDInsight clusters and for submitting and monitoring Hadoop jobs.

Management

The HDInsight management SDK for .NET provides classes and methods for managing your HDInsight clusters. It includes operations to create, delete, update, list, and resize clusters, execute script actions, monitor clusters, get cluster properties, and more.

Prerequisites

SDK Installation

From your Visual Studio project, open the Package Manager Console by selecting Tools > NuGet Package Manager > Package Manager Console.

In the Package Manager Console, run the following commands:

  Install-Package Microsoft.Azure.Management.HDInsight
  Install-Package Microsoft.Azure.Management.Fluent
  Install-Package Microsoft.Azure.Management.ResourceManager.Fluent

Authentication

The SDK must first be authenticated with your Azure subscription. Follow the example below to create a service principal and use it to authenticate. When you are done, you will have an instance of HDInsightManagementClient. It contains many methods, outlined in the sections below, for performing management operations.

Note

There are other ways to authenticate besides the example below, and some may be better suited to your needs. All methods are outlined here: Authenticate with the Azure Libraries for .NET

Authentication Example Using a Service Principal

First, log in to Azure Cloud Shell. Verify that you are currently using the subscription in which you want to create the service principal.

az account show

Your subscription information is displayed as JSON.

{
  "environmentName": "AzureCloud",
  "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "isDefault": true,
  "name": "XXXXXXX",
  "state": "Enabled",
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "user": {
    "cloudShellID": true,
    "name": "XXX@XXX.XXX",
    "type": "user"
  }
}

If you're not logged in to the correct subscription, select the correct one by running:

az account set -s <name or ID of subscription>

Important

If you have not already registered the HDInsight resource provider by another method, such as by creating an HDInsight cluster through the Azure portal, you need to do this once before you can authenticate. You can do this from the Azure Cloud Shell by running the following command:

az provider register --namespace Microsoft.HDInsight

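Next, choose a name for your service principal and create it with the following command. Azure CLI 2.37.0 and later no longer assign a role to a new service principal by default, so the command grants the Contributor role on your subscription explicitly:

az ad sp create-for-rbac --name <Service Principal Name> --role Contributor --scopes /subscriptions/<Subscription ID> --sdk-auth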

The service principal information is displayed as JSON.

{
  "clientId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "clientSecret": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "subscriptionId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

Copy the snippet below and fill in TENANT_ID, CLIENT_ID, CLIENT_SECRET, and SUBSCRIPTION_ID with the strings from the JSON returned by the command that created the service principal.

using Microsoft.Azure.Management.HDInsight;
using Microsoft.Azure.Management.HDInsight.Models;
using Microsoft.Azure.Management.ResourceManager.Fluent;

namespace HDI_SDK_Test
{
    class Program
    {
        static void Main(string[] args)
        {
            // Tenant ID for your Azure Subscription
            var TENANT_ID = "";
            // Your Service Principal App Client ID
            var CLIENT_ID = "";
            // Your Service Principal Client Secret
            var CLIENT_SECRET = "";
            // Azure Subscription ID
            var SUBSCRIPTION_ID = "";

            var credentials = SdkContext.AzureCredentialsFactory
                .FromServicePrincipal(
                CLIENT_ID,
                CLIENT_SECRET,
                TENANT_ID,
                AzureEnvironment.AzureGlobalCloud);

            var client = new HDInsightManagementClient(credentials);
            client.SubscriptionId = SUBSCRIPTION_ID;
        }
    }
}
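If you would rather not paste the four values by hand, you could read them from the JSON that az ad sp create-for-rbac --sdk-auth prints. The sketch below is a minimal, hypothetical helper: SdkAuthFile and ServicePrincipalSettings are illustrative names, not SDK types. It uses System.Text.Json (built into .NET Core 3.0 and later, available as a NuGet package elsewhere) and assumes the JSON uses the field names shown in the output above.

```csharp
using System;
using System.Text.Json;

// Illustrative container for the four values the Authentication snippet needs.
public sealed class ServicePrincipalSettings
{
    public ServicePrincipalSettings(string tenantId, string clientId, string clientSecret, string subscriptionId)
    {
        TenantId = tenantId;
        ClientId = clientId;
        ClientSecret = clientSecret;
        SubscriptionId = subscriptionId;
    }

    public string TenantId { get; }
    public string ClientId { get; }
    public string ClientSecret { get; }
    public string SubscriptionId { get; }
}

public static class SdkAuthFile
{
    // Reads tenantId, clientId, clientSecret and subscriptionId from JSON shaped like
    // the output of "az ad sp create-for-rbac --sdk-auth". Other fields are ignored.
    public static ServicePrincipalSettings Parse(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            return new ServicePrincipalSettings(
                Read(root, "tenantId"),
                Read(root, "clientId"),
                Read(root, "clientSecret"),
                Read(root, "subscriptionId"));
        }
    }

    // GetProperty throws KeyNotFoundException when a field is missing, so a truncated
    // or hand-edited file fails here rather than later, at sign-in time.
    private static string Read(JsonElement root, string name)
    {
        string value = root.GetProperty(name).GetString();
        if (string.IsNullOrEmpty(value))
        {
            throw new InvalidOperationException($"'{name}' is empty in the auth JSON.");
        }
        return value;
    }
}
```

For example, you could save the command output to a file outside source control and pass the parsed ClientId, ClientSecret, and TenantId values to FromServicePrincipal in place of the string literals. The file contains a client secret, so protect it as you would a password.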

Cluster Management

Note

This section assumes you have already authenticated, constructed an HDInsightManagementClient instance, and stored it in a variable called client. Instructions for authenticating and obtaining an HDInsightManagementClient can be found in the Authentication section above.

Create a Cluster

A new cluster can be created by calling client.Clusters.Create().

Samples

Code samples for creating several common types of HDInsight clusters are available: HDInsight .NET Samples.

Example

This example demonstrates how to create a Spark cluster with 2 head nodes and 1 worker node.

Note

You first need to create a resource group and a storage account, as explained below. If you have already created these, you can skip these steps.

Creating a Resource Group

You can create a resource group using the Azure Cloud Shell by running:

az group create -l <Region Name (e.g. eastus)> -n <Resource Group Name>

Creating a Storage Account

You can create a storage account using the Azure Cloud Shell by running:

az storage account create -n <Storage Account Name> -g <Existing Resource Group Name> -l <Region Name (e.g. eastus)> --sku <SKU, e.g. Standard_LRS>

Now run the following command to get the key for your storage account (you will need it to create a cluster):

az storage account keys list -n <Storage Account Name>

The .NET snippet below creates a Spark cluster with 2 head nodes and 1 worker node. Fill in the blank variables as explained in the comments, and feel free to change other parameters to suit your specific needs.

// Requires: using System.Collections.Generic; using Microsoft.Azure.Management.HDInsight.Models;
// The name for the cluster you are creating
var clusterName = "";
// The name of your existing Resource Group
var resourceGroupName = "";
// Choose a username
var username = "";
// Choose a password
var password = "";
// Replace <> with the name of your storage account
var storageAccount = "<>.blob.core.windows.net";
// Storage account key you obtained above
var storageAccountKey = "";
// Choose a region
var location = "";
var container = "default";

var parameters = new ClusterCreateParametersExtended
{
    Location = location,
    Tags = new Dictionary<string, string>(),
    Properties = new ClusterCreateProperties
    {
        ClusterVersion = "3.6",
        OsType = OSType.Linux,
        ClusterDefinition = new ClusterDefinition
        {
            Kind = "Spark",
            Configurations = new Dictionary<string, Dictionary<string, string>>()
            {
                { "gateway", new Dictionary<string, string>
                    {
                        { "restAuthCredential.isEnabled", "true" },
                        { "restAuthCredential.username", username},
                        { "restAuthCredential.password", password}
                    }
                }
            }
        },
        Tier = Tier.Standard,
        ComputeProfile = new ComputeProfile
        {
            Roles = new List<Role>{
                new Role
                {
                    Name = "headnode",
                    TargetInstanceCount = 2,
                    HardwareProfile = new HardwareProfile
                    {
                        VmSize = "Large"
                    },
                    OsProfile = new OsProfile
                    {
                        LinuxOperatingSystemProfile = new LinuxOperatingSystemProfile
                        {
                            Username = username,
                            Password = password
                        }
                    }
                },
                new Role
                {
                    Name = "workernode",
                    TargetInstanceCount = 1,
                    HardwareProfile = new HardwareProfile
                    {
                        VmSize = "Large"
                    },
                    OsProfile = new OsProfile
                    {
                        LinuxOperatingSystemProfile = new LinuxOperatingSystemProfile
                        {
                            Username = username,
                            Password = password
                        }
                    }
                },
            }
        },
        StorageProfile = new StorageProfile
        {
            Storageaccounts = new[]
            {
                new StorageAccount
                {
                    Name = storageAccount,
                    Key = storageAccountKey,
                    Container = container,
                    IsDefault = true
                }
            }
        }
    }
};
client.Clusters.Create(
    resourceGroupName,
    clusterName,
    parameters
);
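The gateway configuration and the fully qualified storage account name are two values in the snippet above that are easy to get wrong. A hypothetical helper could build both; ClusterCreateHelpers is an illustrative name, not an SDK type. It assumes the global Azure cloud, where blob endpoints end in .blob.core.windows.net; sovereign clouds such as Azure China use a different suffix.

```csharp
using System;
using System.Collections.Generic;

public static class ClusterCreateHelpers
{
    // Builds the "gateway" entry used for ClusterDefinition.Configurations above:
    // enables REST (HTTP) basic authentication with the given cluster login.
    public static Dictionary<string, Dictionary<string, string>> GatewayConfiguration(
        string username, string password)
    {
        if (string.IsNullOrEmpty(username)) throw new ArgumentException("A username is required.", nameof(username));
        if (string.IsNullOrEmpty(password)) throw new ArgumentException("A password is required.", nameof(password));

        return new Dictionary<string, Dictionary<string, string>>
        {
            ["gateway"] = new Dictionary<string, string>
            {
                ["restAuthCredential.isEnabled"] = "true",
                ["restAuthCredential.username"] = username,
                ["restAuthCredential.password"] = password,
            },
        };
    }

    // Turns a bare storage account name into the "<name>.blob.core.windows.net" form
    // that the StorageAccount.Name field above uses; already-qualified names pass through.
    public static string BlobEndpoint(string accountName)
    {
        const string suffix = ".blob.core.windows.net";
        return accountName.EndsWith(suffix, StringComparison.OrdinalIgnoreCase)
            ? accountName
            : accountName + suffix;
    }
}
```

In the snippet above you could then write Configurations = ClusterCreateHelpers.GatewayConfiguration(username, password) and var storageAccount = ClusterCreateHelpers.BlobEndpoint("<Storage Account Name>").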

Get Cluster Details

To get properties for a given cluster:

client.Clusters.Get("<Resource Group Name>", "<Cluster Name>");

Example

You can use Get to confirm that you have successfully created your cluster.

// Debug requires: using System.Diagnostics;
var myCluster = client.Clusters.Get("<Resource Group Name>", "<Cluster Name>");
Debug.WriteLine(myCluster.Name); // Prints the name of the cluster
Debug.WriteLine(myCluster.Id); // Prints the resource ID of the cluster

The output should look like this:

<Cluster Name>
/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/<Resource Group Name>/providers/Microsoft.HDInsight/clusters/<Cluster Name>

Note

The return value of Get, stored in the variable myCluster, is of type Microsoft.Azure.Management.HDInsight.Models.Cluster. A full list of this object's properties can be found here.
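The Id printed above is an Azure Resource Manager resource ID. If you need the subscription, resource group, or cluster name back from it (for example, to pass them to Delete), a minimal sketch like the following could split it. ClusterResourceId is a hypothetical helper, and it only accepts IDs with exactly the HDInsight cluster shape shown in the output above.

```csharp
using System;

public static class ClusterResourceId
{
    // Splits an ID of the form
    //   /subscriptions/{subscriptionId}/resourceGroups/{group}/providers/Microsoft.HDInsight/clusters/{cluster}
    // into its three variable parts. Fixed segments are compared case-insensitively
    // because Azure Resource Manager treats resource IDs as case-insensitive.
    public static (string SubscriptionId, string ResourceGroup, string ClusterName) Parse(string id)
    {
        if (id == null) throw new ArgumentNullException(nameof(id));

        string[] parts = id.Trim('/').Split('/');
        bool shapeOk =
            parts.Length == 8 &&
            Is(parts[0], "subscriptions") &&
            Is(parts[2], "resourceGroups") &&
            Is(parts[4], "providers") &&
            Is(parts[5], "Microsoft.HDInsight") &&
            Is(parts[6], "clusters");
        if (!shapeOk)
        {
            throw new FormatException($"Not an HDInsight cluster resource ID: {id}");
        }
        return (parts[1], parts[3], parts[7]);
    }

    private static bool Is(string actual, string expected) =>
        string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase);
}
```

For example, var (_, group, name) = ClusterResourceId.Parse(myCluster.Id); gives you the two strings that Get, Delete, and the other cluster operations take.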

List Clusters

List Clusters Under the Subscription

client.Clusters.List();

List Clusters by Resource Group

client.Clusters.ListByResourceGroup("<Resource Group Name>");

Note

Both List() and ListByResourceGroup() return an IPage<Cluster> object. To get the next page, call client.Clusters.ListNext("Next Page Link"). Repeat this until NextPageLink is null, as shown in the example below.

Example

The following example prints the name of every cluster in the current subscription:

var clustersPaged = client.Clusters.List();
while (true)
{
    foreach (var cluster in clustersPaged)
    {
        Debug.WriteLine(cluster.Name);
    }
    if (clustersPaged.NextPageLink == null)
    {
        break;
    }
    clustersPaged = client.Clusters.ListNext(clustersPaged.NextPageLink);
}
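The while loop above is the general paging pattern. If you need it in several places, a hypothetical generic helper such as Paging.All (an illustrative name, not an SDK method) could wrap it in a lazy IEnumerable. It takes three delegates so that it does not depend on the SDK's paging types: one fetches the first page, one fetches the page at a link, and one reads a page's NextPageLink.

```csharp
using System;
using System.Collections.Generic;

public static class Paging
{
    // Lazily yields every item from a chain of pages. A page is fetched only after the
    // caller has consumed the previous one, mirroring the while loop above.
    public static IEnumerable<TItem> All<TPage, TItem>(
        Func<TPage> firstPage,
        Func<string, TPage> nextPage,
        Func<TPage, string> nextLink)
        where TPage : IEnumerable<TItem>
    {
        TPage page = firstPage();
        while (true)
        {
            foreach (TItem item in page)
            {
                yield return item;
            }

            // A null link means the last page has been reached.
            string link = nextLink(page);
            if (link == null)
            {
                yield break;
            }
            page = nextPage(link);
        }
    }
}
```

With the management SDK the call could look like Paging.All<IPage<Cluster>, Cluster>(() => client.Clusters.List(), link => client.Clusters.ListNext(link), page => page.NextPageLink), where IPage<T> comes from the Microsoft.Rest.Azure namespace. The type arguments must be written out because C# does not infer them from a generic constraint. The same shape fits ListPersistedScripts and ListPersistedScriptsNext, or ScriptExecutionHistory.List and ListNext.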

Delete a Cluster

To delete a cluster:

client.Clusters.Delete("<Resource Group Name>", "<Cluster Name>");

Update Cluster Tags

You can update the tags of a given cluster as follows:

client.Clusters.Update("<Resource Group Name>", "<Cluster Name>", new ClusterPatchParameters(<Dictionary of Tags>));

Example

client.Clusters.Update("<Resource Group Name>", "<Cluster Name>", new ClusterPatchParameters(new Dictionary<string, string> { { "tag1Name", "tag1Value" }, { "tag2Name", "tag2Value" } }));
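Depending on how the service applies the patch, the dictionary you pass may replace the cluster's whole tag set rather than adding to it; check this against your SDK version before relying on either behavior. To keep existing tags, one approach is to read them with Get, merge in your changes, and send the result. TagMerge is a hypothetical helper for the merge step.

```csharp
using System.Collections.Generic;

public static class TagMerge
{
    // Returns a new dictionary holding `existing` overlaid with `updates`; when a key
    // appears in both, the value from `updates` wins. Neither input is modified.
    // A null `existing` (a cluster with no tags) is treated as empty.
    public static Dictionary<string, string> Merge(
        IDictionary<string, string> existing, IDictionary<string, string> updates)
    {
        var merged = existing == null
            ? new Dictionary<string, string>()
            : new Dictionary<string, string>(existing);
        foreach (var pair in updates)
        {
            merged[pair.Key] = pair.Value;
        }
        return merged;
    }
}
```

For example: var current = client.Clusters.Get("<Resource Group Name>", "<Cluster Name>").Tags; then pass new ClusterPatchParameters(TagMerge.Merge(current, newTags)) to Update.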

Resize Cluster

You can resize a given cluster's number of worker nodes by specifying a new size, as follows:

client.Clusters.Resize("<Resource Group Name>", "<Cluster Name>", <Num of Worker Nodes (int)>);

Cluster Monitoring

The HDInsight management SDK can also be used to manage monitoring on your clusters via the Operations Management Suite (OMS).

Enable OMS Monitoring

Note

To enable OMS monitoring, you must have an existing Log Analytics workspace. If you have not already created one, you can learn how to do that here: Create a Log Analytics workspace in the Azure portal.

To enable OMS monitoring on your cluster:

client.Extension.EnableMonitoring("<Resource Group Name>", "<Cluster Name>", new ClusterMonitoringRequest(workspaceId: "<Workspace Id>"));

View Status of OMS Monitoring

To get the status of OMS on your cluster:

client.Extension.GetMonitoringStatus("<Resource Group Name>", "<Cluster Name>");

Disable OMS Monitoring

To disable OMS on your cluster:

client.Extension.DisableMonitoring("<Resource Group Name>", "<Cluster Name>");

Script Actions

HDInsight provides a configuration method called script actions, which invokes custom scripts to customize the cluster.

Note

More information on how to use script actions can be found here: Customize Linux-based HDInsight clusters using script actions

Execute Script Actions

You can execute script actions on a given cluster as follows:

var scriptAction1 = new RuntimeScriptAction("<Script Name>", "<URL To Script>", <List<string> of roles>); //valid roles are "headnode", "workernode", "zookeepernode", and "edgenode"

client.Clusters.ExecuteScriptActions("<Resource Group Name>", "<Cluster Name>", new List<RuntimeScriptAction> { scriptAction1 }, <persistOnSuccess (bool)>); //add more RuntimeScriptActions to the list to execute multiple scripts
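The comment in the snippet above lists the valid role names. A hypothetical helper such as ScriptActionRoles.Checked could catch a misspelled role before the request is sent. It matches names exactly as listed (an assumption; the service may be more lenient), drops duplicates, and treats an empty list as an error (also an assumption).

```csharp
using System;
using System.Collections.Generic;

public static class ScriptActionRoles
{
    // The role names listed as valid in the Execute Script Actions snippet.
    private static readonly HashSet<string> Valid = new HashSet<string>(StringComparer.Ordinal)
    {
        "headnode", "workernode", "zookeepernode", "edgenode"
    };

    // Returns the roles as a List<string>, in first-seen order without duplicates.
    // Throws ArgumentException for an empty list (assumed invalid) or an unknown name.
    public static List<string> Checked(params string[] roles)
    {
        if (roles == null || roles.Length == 0)
        {
            throw new ArgumentException("At least one role is required.", nameof(roles));
        }

        var result = new List<string>();
        foreach (string role in roles)
        {
            if (!Valid.Contains(role))
            {
                throw new ArgumentException($"Unknown role '{role}'.", nameof(roles));
            }
            if (!result.Contains(role))
            {
                result.Add(role);
            }
        }
        return result;
    }
}
```

For example: new RuntimeScriptAction("<Script Name>", "<URL To Script>", ScriptActionRoles.Checked("headnode", "workernode")).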

Delete Script Action

To delete a specified persisted script action on a given cluster:

client.ScriptActions.Delete("<Resource Group Name>", "<Cluster Name>", "<Script Name>");

List Persisted Script Actions

Note

ListPersistedScripts() and ScriptExecutionHistory.List() both return an IPage<RuntimeScriptActionDetail> object. To get the next page, call client.ScriptActions.ListPersistedScriptsNext("Next Page Link") or client.ScriptExecutionHistory.ListNext("Next Page Link"), respectively. Repeat this until NextPageLink is null, as shown in the examples below.

To list all persisted script actions for the specified cluster:

client.ScriptActions.ListPersistedScripts("<Resource Group Name>", "<Cluster Name>");

Example

var scriptsPaged = client.ScriptActions.ListPersistedScripts("<Resource Group Name>", "<Cluster Name>");
while (true)
{
    foreach (var script in scriptsPaged)
    {
        Debug.WriteLine(script.Name); //There are other properties of RuntimeScriptActionDetail besides Name, such as Status, Operation, StartTime, EndTime, etc. See reference documentation.
    }
    if (scriptsPaged.NextPageLink == null)
    {
        break;
    }
    scriptsPaged = client.ScriptActions.ListPersistedScriptsNext(scriptsPaged.NextPageLink);
}

List All Scripts' Execution History

To list the execution history of all scripts for the specified cluster:

client.ScriptExecutionHistory.List("<Resource Group Name>", "<Cluster Name>");

Example

This example prints the name of every past script execution. Each item also exposes other details, such as its status and start and end times.

var scriptExecutionsPaged = client.ScriptExecutionHistory.List("<Resource Group Name>", "<Cluster Name>");
while (true)
{
    foreach (var script in scriptExecutionsPaged)
    {
        Debug.WriteLine(script.Name); //There are other properties of RuntimeScriptActionDetail besides Name, such as Status, Operation, StartTime, EndTime, etc. See reference documentation.

    }
    if (scriptExecutionsPaged.NextPageLink == null)
    {
        break;
    }
    scriptExecutionsPaged = client.ScriptExecutionHistory.ListNext(scriptExecutionsPaged.NextPageLink);
}

Jobs

Use the Azure HDInsight job SDK for .NET to create, manage, and monitor jobs on a Hadoop cluster.

SDK Installation

Install the NuGet package directly from the Visual Studio Package Manager Console or with the .NET Core CLI.

Visual Studio Package Manager

Install-Package Microsoft.Azure.Management.HDInsight.Job

.NET Core CLI

dotnet add package Microsoft.Azure.Management.HDInsight.Job

Code Example

This example runs a Hive job on a Hadoop cluster.

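// Assumed: clusterUri holds the cluster endpoint (for example "<Cluster Name>.azurehdinsight.net")
// and credentials holds the cluster login (HTTP) user name and password.
HDInsightJobManagementClient managementClient = new HDInsightJobManagementClient(clusterUri, credentials);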

Dictionary<string, string> defines = new Dictionary<string, string> {
    { "hive.execution.engine", "tez" },
    { "hive.exec.reducers.max", "1" }
};
List<string> arguments = new List<string> { { "argA" }, { "argB" } };
HiveJobSubmissionParameters parameters = new HiveJobSubmissionParameters
{
    Query = "SHOW TABLES",
    Defines = defines,
    Arguments = arguments
};

JobSubmissionResponse jobResponse = managementClient.JobManagement.SubmitHiveJob(parameters);
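SubmitHiveJob only submits the job; monitoring it means polling the job's status until it reaches a terminal state. This section does not show the job SDK's status call, so the sketch below takes it as a delegate rather than guessing its name. JobPolling.WaitForTerminalStateAsync is a hypothetical helper, and the state values you check in isTerminal depend on what your status call returns.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class JobPolling
{
    // Calls getState repeatedly, waiting `interval` between calls, until isTerminal
    // returns true for the latest state, and returns that state. Throws
    // TimeoutException once `timeout` has elapsed without a terminal state.
    public static async Task<TState> WaitForTerminalStateAsync<TState>(
        Func<Task<TState>> getState,
        Func<TState, bool> isTerminal,
        TimeSpan interval,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            cancellationToken.ThrowIfCancellationRequested();
            TState state = await getState().ConfigureAwait(false);
            if (isTerminal(state))
            {
                return state;
            }
            if (elapsed.Elapsed >= timeout)
            {
                throw new TimeoutException($"No terminal state within {timeout}.");
            }
            await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

A polling interval of several seconds is usually enough for Hive jobs, which tend to run for much longer than a single status request takes.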

Samples