The new Microsoft Azure HDInsight Management SDK

Microsoft Azure HDInsight is now a part of Azure Resource Manager as we start to offer Linux-based clusters. Read the announcement that came along with the new portal experience. The preview versions of the new SDKs are now available as NuGet packages here:

I am an engineer who loves automation because once you do it right, you reap the benefits daily. That's why when someone tells me they deployed an HDInsight cluster from the Azure Portal and keep going back there to manage it, well, that does not impress me. As someone on the team who has written lots of test frameworks and automation tools in the past, I can't be productive if I have to type the same things over and over again when using the portal or PowerShell cmdlets.

In this post, I will cover how to start automating with the new Microsoft.Azure.Management.HDInsight SDK and give you a tool that handles some of the tasks related to cluster management (creation, deletion, etc.). I will not be covering the Microsoft.Azure.Management.HDInsight.Job SDK in this post.

 

Tools & Scripts:

Time to introduce my new GitHub repository, where I will post all the HDInsight tools and code that I talk about on this blog: https://github.com/rtandonmsft/hdinsight-helper-tools

First tool: HDInsightManagementCLI - https://github.com/rtandonmsft/hdinsight-helper-tools/tree/master/src/HDInsightManagementCLI - a command-line tool that lets you manage your HDInsight clusters using the new SDK, get your subscription capabilities, resize your cluster, etc. If you follow the link from the GitHub repository to AppVeyor, you should be able to download the compiled binaries from the latest build's artifacts: https://ci.appveyor.com/project/rtandonmsft/hdinsight-helper-tools

Automation scripts can also be created using Azure PowerShell. The first version of the scripts used by our Storm examples, which manage HDInsight clusters via Azure Service Management (ASM), can be found here: https://github.com/hdinsight/hdinsight-storm-examples/tree/master/scripts/azure/HDInsight. I will be working on publishing a newer version of the PowerShell scripts that is compatible with Azure Resource Manager (ARM).

 

Things to note:

  • Azure HDInsight has moved to the Azure Resource Manager model, starting with the launch of the Linux offerings of different cluster types: Hadoop, Storm, HBase, etc. This is partly because the Linux clusters are deployed as IaaS VMs, unlike the Windows-based PaaS VMs of the past. You will now treat an HDInsight cluster as a resource, managed through Azure Resource Manager.
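    // Creates (or updates) the resource group that will contain the cluster,
    // then registers the HDInsight resource provider with the subscription.
    // Uses the tokenCloudCredentials field set up by the authentication code further down.
    static void CreateResourceGroup(string azureResourceManagementUri, string resourceGroupName, string location)
    {
        var resourceClient = new ResourceManagementClient(tokenCloudCredentials, new Uri(azureResourceManagementUri));
        resourceClient.ResourceGroups.CreateOrUpdate(
            resourceGroupName,
            new ResourceGroup
            {
                Location = location,
            });

        resourceClient.Providers.Register("Microsoft.HDInsight");
    }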
  • Certificate-based credentials won't work anymore, due to the new emphasis on Azure Active Directory authentication for all services in the Azure Resource Manager framework. The move to Azure Active Directory for authentication was something I had to get used to. Don't worry, I now have code for you to use. I still don't like the pop-up for credentials. (Note to self: figure out a completely unattended mode.)
    // Acquire an Azure Active Directory token interactively (this is where the credentials pop-up appears).
    AuthenticationContext ac = new AuthenticationContext("https://login.windows.net/" + config.AzureActiveDirectoryTenantId, true);
    var token = ac.AcquireToken("https://management.core.windows.net/", config.AzureActiveDirectoryClientId, new Uri("urn:ietf:wg:oauth:2.0:oob"), PromptBehavior.Auto);
    // Wrap the token in credentials and create the HDInsight management client against the ARM endpoint.
    tokenCloudCredentials = new TokenCloudCredentials(config.SubscriptionId, token.AccessToken);
    hdInsightManagementClient = new HDInsightManagementClient(tokenCloudCredentials, new Uri("https://management.azure.com/"));
  • If you deploy any new clusters (including Linux clusters) through the new SDK, new PowerShell cmdlets, or new portal, you will also need to use the new SDK, new PowerShell cmdlets, or new portal to manage those clusters afterwards.

 

Workflow:

There are two parts to understanding this workflow. The first is how you, as the client/user, communicate with Azure to create your resources (clusters). The second is how HDInsight works with Azure to deploy the clusters.

You can use Azure Resource Manager (ARM / New) or Azure Service Management (ASM / Old) to communicate with the HDInsight cluster management APIs. HDInsight, in turn, communicates with Azure Compute to procure the right type of VMs for your cluster: PaaS for Windows (via RDFE), IaaS for Linux (via ARM, using a template deployment).

Compatibility Table:

Clusters created through                           | Manageable with ASM / Old                | Manageable with ARM / New
                                                   | (old SDK, old PowerShell, or old Portal) | (new SDK, new PowerShell, or new Portal)
---------------------------------------------------+------------------------------------------+-----------------------------------------
ASM / Old (old SDK, old PowerShell, or old Portal) | Yes                                      | Yes
ARM / New (new SDK, new PowerShell, or new Portal) | No                                       | Yes
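
If you want your own automation to guard against using the wrong toolset, the table above can be written as a small check. This is only a sketch: the Toolset enum and the ClusterCompatibility class are hypothetical names made up for illustration and are not part of either SDK.

```csharp
using System;

// Hypothetical type for illustration only; it does not exist in the HDInsight SDKs.
enum Toolset
{
    Asm, // old SDK, old PowerShell, or old Portal
    Arm  // new SDK, new PowerShell, or new Portal
}

static class ClusterCompatibility
{
    // Encodes the compatibility table: a cluster created through ASM can be
    // managed with either toolset, while a cluster created through ARM can
    // only be managed with ARM.
    public static bool CanManage(Toolset createdWith, Toolset managedWith)
    {
        return createdWith == Toolset.Asm || managedWith == Toolset.Arm;
    }
}
```

A tool could call ClusterCompatibility.CanManage(Toolset.Arm, Toolset.Asm) before issuing a request and fail early with a clear message when it returns false.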

 

To conclude, if you created your cluster with the old SDK, old PowerShell, or old portal, you can continue using the old SDK. If you are going to create new clusters, including Linux, through the new tools, you should invest your time in moving to the new SDK.

I also have another management tool, similar to HDInsightManagementCLI (which is now posted on GitHub), that uses the old SDK and talks to Azure Service Management to manage HDInsight clusters. If there is enough interest, I will publish that too.

Cheers
Ravi

Next post: HDInsight Cluster Logs Downloader

Read more about Azure Resource Manager (ARM) here: