Azure HDInsight .NET SDK

Azure HDInsight libraries for .NET 2.X

Overview

The HDInsight Service .NET SDK provides classes that relate to the creation, configuration, submission, and monitoring of Hadoop jobs managed by an Azure HDInsight Service. In addition, it provides classes to manage Azure subscriptions using the HDInsight Service and to configure the clusters, storage accounts, and other assets associated with the HDInsight clusters that are managed by an Azure subscription.

Management libraries

Jobs

Use the Azure HDInsight client SDK to create, manage, and monitor jobs on a Hadoop cluster.

Install the NuGet package directly from the Visual Studio Package Manager console or with the .NET Core CLI.

Visual Studio Package Manager

Install-Package Microsoft.Azure.Management.HDInsight.Job
dotnet add package Microsoft.Azure.Management.HDInsight.Job

Code Example

This example runs a Hive job in a Hadoop cluster.

HDInsightJobManagementClient managementClient = new HDInsightJobManagementClient(clusterUri, credentials);

Dictionary<string, string> defines = new Dictionary<string, string> {
    { "hive.execution.engine", "tez" },
    { "hive.exec.reducers.max", "1" }
};
List<string> arguments = new List<string> { { "argA" }, { "argB" } };
HiveJobSubmissionParameters parameters = new HiveJobSubmissionParameters
{
    Query = "SHOW TABLES",
    Defines = defines,
    Arguments = arguments
};

JobSubmissionResponse jobResponse = managementClient.JobManagement.SubmitHiveJob(parameters);

HDInsight

Use the Azure HDInsight management SDK to create, manage, start, stop, and scale Hadoop clusters.

Install the NuGet package directly from the Visual Studio Package Manager console or with the .NET Core CLI.

Visual Studio Package Manager

Install-Package Microsoft.Azure.Management.HDInsight
dotnet add package Microsoft.Azure.Management.HDInsight

Code Example

This example creates an HDInsight two node Linux Hadoop cluster with an existing Azure Blob Storage.

HDInsightManagementClient managementClient = new HDInsightManagementClient(authToken);
// Set parameters for the new cluster
ClusterCreateParameters parameters = new ClusterCreateParameters
{
    ClusterSizeInNodes = 2,
    UserName = "admin",
    Password = "<Enter HTTP User Password>",
    ClusterType = "Hadoop",
    OSType = OSType.Linux,
    Version = "3.5",
    // Use an Azure storage account as the default storage
    DefaultStorageInfo = new AzureStorageInfo("<StorageAccount>", "<StorageKey>", "<BlobContainerName>"),
    Location = "EAST US 2",
    SshUserName = "sshuser",
    SshPassword = "<Enter SSH User Password>",
};

// Create the cluster
managementClient.Clusters.Create("<ExistingResourceGroupName>", "<NewClusterName>", parameters);

Samples

View the complete list of Azure SQL Database samples.

HDInsight .NET Management SDK 3.X Preview

Overview

The HDInsight .NET SDK provides classes and methods that allow you to manage your HDInsight clusters. It includes operations to create, delete, update, list, resize, execute script actions, monitor, get properties of HDInsight clusters, and more.

Prerequisites

SDK Installation

From your Visual Studio project, open the Package Manager Console by clicking Tools, NuGet Package Manager, and then click Package Manager Console.

In the Package Manager Console, execute the following commands:

  Install-Package Microsoft.Azure.Management.HDInsight -Version 3.1.0-preview
  Install-Package Microsoft.Azure.Management.Fluent
  Install-Package Microsoft.Azure.Management.ResourceManager.Fluent

Authentication

The SDK first needs to be authenticated with your Azure subscription. Follow the example below to create a service principal and use it to authenticate. After this is done, you will have an instance of an HDInsightManagementClient, which contains many methods (outlined in below sections) that can be used to perform management operations.

Note

There are other ways to authenticate besides the below example that could potentially be better suited for your needs. All methods are outlined here: Authenticate with the Azure Libraries for .NET

Authentication Example Using a Service Principal

First, login to Azure Cloud Shell. Verify you are currently using the subscription in which you want the service principal created.

az account show

Your subscription information is displayed as JSON.

{
  "environmentName": "AzureCloud",
  "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "isDefault": true,
  "name": "XXXXXXX",
  "state": "Enabled",
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "user": {
    "cloudShellID": true,
    "name": "XXX@XXX.XXX",
    "type": "user"
  }
}

If you're not logged into the correct subscription, select the correct one by running:

az account set -s <name or ID of subscription>

Important

If you have not already registered the HDInsight Resource Provider by another method (such as by creating an HDInsight Cluster through the Azure Portal), you need to do this once before you can authenticate. This can be done from the Azure Cloud Shell by running the following command:

az provider register --namespace Microsoft.HDInsight

Next, choose a name for your service principal and create it with the following command:

az ad sp create-for-rbac --name <Service Principal Name> --sdk-auth

The service principal information is displayed as JSON.

{
  "clientId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "clientSecret": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "subscriptionId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

Copy the below snippet and fill in TENANT_ID, CLIENT_ID, CLIENT_SECRET, and SUBSCRIPTION_ID with the strings from the JSON that was returned after running the command to create the service principal.

using Microsoft.Azure.Management.HDInsight;
using Microsoft.Azure.Management.HDInsight.Models;
using Microsoft.Azure.Management.ResourceManager.Fluent;

namespace HDI_SDK_Test
{
    class Program
    {
        static void Main(string[] args)
        {
            // Tenant ID for your Azure Subscription
            var TENANT_ID = "";
            // Your Service Principal App Client ID
            var CLIENT_ID = "";
            // Your Service Principal Client Secret
            var CLIENT_SECRET = "";
            // Azure Subscription ID
            var SUBSCRIPTION_ID = "";

            var credentials = SdkContext.AzureCredentialsFactory
                .FromServicePrincipal(
                CLIENT_ID,
                CLIENT_SECRET,
                TENANT_ID,
                AzureEnvironment.AzureGlobalCloud);

            var client = new HDInsightManagementClient(credentials);
            client.SubscriptionId = SUBSCRIPTION_ID;
        }
    }
}

Cluster Management

Note

This section assumes you have already authenticated and constructed an HDInsightManagementClient instance and store it in a variable called client. Instructions for authenticating and obtaining an HDInsightManagementClient can be found in the Authentication section above.

Create a Cluster

A new cluster can be created by calling client.Clusters.Create().

Example

This example demonstrates how to create a Spark cluster with 2 head nodes and 1 worker node.

Note

You first need to create a Resource Group and Storage Account, as explained below. If you have already created these, you can skip these steps.

Creating a Resource Group

You can create a resource group using the Azure Cloud Shell by running

az group create -l <Region Name (i.e. eastus)> --n <Resource Group Name>
Creating a Storage Account

You can create a storage account using the Azure Cloud Shell by running:

az storage account create -n <Storage Account Name> -g <Existing Resource Group Name> -l <Region Name (i.e. eastus)> --sku <SKU i.e. Standard_LRS>

Now run the following command to get the key for your storage account (you will need this to create a cluster):

az storage account keys list -n <Storage Account Name>

The below .NET snippet creates a Spark cluster with 2 head nodes and 1 worker node. Fill in the blank variables as explained in the comments and feel free to change other parameters to suit your specific needs.

// The name for the cluster you are creating
var clusterName = "";
// The name of your existing Resource Group
var resourceGroupName = "";
// Choose a username
var username = "";
// Choose a password
var password = "";
// Replace <> with the name of your storage account
var storageAccount = "<>.blob.core.windows.net";
// Storage account key you obtained above
var storageAccountKey = "";
// Choose a region
var location = "";
var container = "default";

var parameters = new ClusterCreateParametersExtended
{
    Location = location,
    Tags = new Dictionary<string, string>(),
    Properties = new ClusterCreateProperties
    {
        ClusterVersion = "3.6",
        OsType = OSType.Linux,
        ClusterDefinition = new ClusterDefinition
        {
            Kind = "Hadoop",            
            Configurations = new Dictionary<string, Dictionary<string, string>>()
            {                
                { "gateway", new Dictionary<string, string>
                    {
                        { "restAuthCredential.isEnabled", "true" },
                        { "restAuthCredential.username", username},
                        { "restAuthCredential.password", password}
                    }
                }
            }
        },
        Tier = Tier.Standard,
        ComputeProfile = new ComputeProfile
        {
            Roles = new List<Role>{
                new Role
                {
                    Name = "headnode",
                    TargetInstanceCount = 2,
                    HardwareProfile = new HardwareProfile
                    {
                        VmSize = "Large"
                    },
                    OsProfile = new OsProfile
                    {
                        LinuxOperatingSystemProfile = new LinuxOperatingSystemProfile
                        {
                            Username = username,
                            Password = password
                        }
                    }
                },
                new Role
                {
                    Name = "workernode",
                    TargetInstanceCount = 1,
                    HardwareProfile = new HardwareProfile
                    {
                        VmSize = "Large"
                    },
                    OsProfile = new OsProfile
                    {
                        LinuxOperatingSystemProfile = new LinuxOperatingSystemProfile
                        {
                            Username = username,
                            Password = password
                        }
                    }
                },
            }
        },
        StorageProfile = new StorageProfile
        {
            Storageaccounts = new[]
            {
                new StorageAccount
                {
                    Name = storageAccount,
                    Key = storageAccountKey,
                    Container = container,
                    IsDefault = true
                }
            }
        }
    }
};
client.Clusters.Create(
    resourceGroupName,
    clusterName,
    parameters
);

Get Cluster Details

To get properties for a given cluster:

client.Clusters.Get("<Resource Group Name>", "<Cluster Name>");

Example

You can use get to confirm that you have successfully created your cluster.

var myCluster = client.Clusters.Get("<Resource Group Name>", "<Cluster Name>");
Debug.WriteLine(myCluster.Name); //Prints the name of the cluster
Debug.WriteLine(myCluster.Id) //Prints the resource Id of the cluster

The output should look like:

<Cluster Name>
/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/<Resource Group Name>/providers/Microsoft.HDInsight/clusters/<Cluster Name>

Note

The return value of get, stored in variable myCluster, is of type Microsoft.Azure.Management.HDInsight.ModelsCluster. A full list of this object's properties can be found here.

List Clusters

List Clusters Under The Subscription

client.Clusters.List();

List Clusters By Resource Group

client.Clusters.ListByResourceGroup("<Resource Group Name>");

Note

Both List() and ListByResourceGroup() return an IPage<Cluster> object. To get the next page, you can call client.Clusters.ListNext("Next Page Link"). This can be repeated until NextPageLink is null, as shown in the example below.

Example

The following example prints the properties of all clusters for the current subscription:

var clustersPaged = client.Clusters.List();
while (true)
{
  foreach (var cluster in clustersPaged)
  {
    Debug.WriteLine(cluster.Name);
  
}  if (clustersPaged.NextPageLink == null)
  {
    break;
  }
  clustersPaged = client.Clusters.ListNext(clustersPaged.NextPageLink);
}

Delete a Cluster

To delete a cluster:

client.Clusters.Delete("<Resource Group Name>", "<Cluster Name>");

Update Cluster Tags

You can update the tags of a given cluster like so:

client.Clusters.Update("<Resource Group Name>", "<Cluster Name>", new ClusterPatchParameters(<Dictionary of Tags>));

Example

client.Clusters.Update("<Resource Group Name>", "<Cluster Name>", new ClusterPatchParameters(new Dictionary<string, string> { { "tag1Name", "tag1Value" }, { "tag2Name", "tag2Value" } }));

Resize Cluster

You can resize a given cluster's number of worker nodes by specifying a new size like so:

client.Clusters.Resize("<Resource Group Name>", "<Cluster Name>", <Num of Worker Nodes (int)>)

Cluster Monitoring

The HDInsight Management SDK can also be used to manage monitoring on your clusters via the Operations Management Suite (OMS).

Enable OMS Monitoring

Note

To enable OMS Monitoring, you must have an existing Log Analytics workspace. If you have not already created one, you can learn how to do that here: Create a Log Analytics workspace in the Azure portal.

To enable OMS Monitoring on your cluster:

client.Extension.EnableMonitoring("<Resource Group Name", "Cluster Name", new ClusterMonitoringRequest(workspaceId: "<Workspace Id>"));

View Status Of OMS Monitoring

To get the status of OMS on your cluster:

client.Extension.GetMonitoringStatus("<Resource Group Name", "Cluster Name");

Disable OMS Monitoring

To disable OMS on your cluster:

client.Extension.DisableMonitoring("<Resource Group Name>", "<Cluster Name>");

Script Actions

HDInsight provides a configuration method called script actions that invokes custom scripts to customize the cluster.

Note

More information on how to use script actions can be found here: Customize Linux-based HDInsight clusters using script actions

Execute Script Actions

You can execute script actions on a given cluster like so:

var scriptAction1 = new RuntimeScriptAction("<Script Name>", "<URL To Script>", <List<string> of roles>); //valid roles are "headnode", "workernode", "zookeepernode", and "edgenode"

client.Clusters.ExecuteScriptActions("<Resource Group Name>", "<Cluster Name>", new List<RuntimeScriptAction> { scriptAction1 }, <persistOnSuccess (bool)>); //add more RuntimeScriptActions to the list to execute multiple scripts

Delete Script Action

To delete a specified persisted script action on a given cluster:

client.ScriptActions.Delete("<Resource Group Name>", "<Cluster Name>", "<Script Name>");

List Persisted Script Actions

Note

ListPersistedScripts() and List() return an IPage<RuntimeScriptActionDetail> object. To get the next page, you can call client.ScriptActions.ListPersistedScriptsNext("Next Page Link") or client.ScriptExecutionHistory.ListNext("Next Page Link"). This can be repeated until NextPageLink is null, as shown in the examples below.

To list all persisted script actions for the specified cluster:

client.ScriptActions.ListPersistedScripts("<Resource Group Name>", "<Cluster Name>");

Example

var scriptsPaged = client.ScriptActions.ListPersistedScripts("<Resource Group Name>", "<Cluster Name>");
while (true)
{
    foreach (var script in scriptsPaged)
    {
        Debug.WriteLine(script.Name); //There are other properties of RuntimeScriptActionDetail besides Name, such as Status, Operation, StartTime, EndTime, etc. See reference documentation.
    }
    if (scriptsPaged.NextPageLink == null)
    {
        break;
    }
    scriptsPaged = client.ScriptActions.ListPersistedScriptsNext(scriptsPaged.NextPageLink);
}

List All Scripts' Execution History

To list all scripts' execution history for the specified cluster:

client.script_execution_history.list("<Resource Group Name>", "<Cluster Name>");

Example

This example prints all the details for all past script executions.

var scriptExecutionsPaged = client.ScriptExecutionHistory.List("<Resource Group Name>", "<Cluster Name>");
while (true)
{
    foreach (var script in scriptExecutionsPaged)
    {
        Debug.WriteLine(script.Name); //There are other properties of RuntimeScriptActionDetail besides Name, such as Status, Operation, StartTime, EndTime, etc. See reference documentation.

    }
    if (scriptExecutionsPaged.NextPageLink == null)
    {
        break;
    }
    scriptExecutionsPaged = client.ScriptExecutionHistory.ListNext(scriptExecutionsPaged.NextPageLink);
}