Manage Hadoop clusters in HDInsight by using Azure PowerShell
Azure PowerShell is a powerful scripting environment that you can use to control and automate the deployment and management of your workloads in Azure. In this article, you will learn how to manage Hadoop clusters in Azure HDInsight by using a local Azure PowerShell console through the use of Windows PowerShell. For the list of the HDInsight PowerShell cmdlets, see HDInsight cmdlet reference.
Before you begin this article, you must have the following:
- An Azure subscription. See Get Azure free trial.
Install Azure PowerShell
Azure PowerShell support for managing HDInsight resources using Azure Service Manager is deprecated, and was removed on January 1, 2017. The steps in this document use the new HDInsight cmdlets that work with Azure Resource Manager.
Please follow the steps in Install and configure Azure PowerShell to install the latest version of Azure PowerShell. If you have scripts that need to be modified to use the new cmdlets that work with Azure Resource Manager, see Migrating to Azure Resource Manager-based development tools for HDInsight clusters for more information.
If you have installed Azure PowerShell version 0.9x, you must uninstall it before installing a newer version.
To check the version of the installed PowerShell:
To uninstall the older version, run Programs and Features in the control panel.
Use the following command to list all clusters in the current subscription:
Use the following command to show details of a specific cluster in the current subscription:
Get-AzureRmHDInsightCluster -ClusterName <Cluster Name>
Use the following command to delete a cluster:
Remove-AzureRmHDInsightCluster -ClusterName <Cluster Name>
You can also delete a cluster by removing the resource group that contains the cluster. Please note, this will delete all the resources in the group including the default storage account.
Remove-AzureRmResourceGroup -Name <Resource Group Name>
The cluster scaling feature allows you to change the number of worker nodes used by a cluster that is running in Azure HDInsight without having to re-create the cluster.
Only clusters with HDInsight version 3.1.3 or higher are supported. If you are unsure of the version of your cluster, you can check the Properties page. See List and show clusters.
The impact of changing the number of data nodes for each type of cluster supported by HDInsight:
You can seamlessly increase the number of worker nodes in a Hadoop cluster that is running without impacting any pending or running jobs. New jobs can also be submitted while the operation is in progress. Failures in a scaling operation are gracefully handled so that the cluster is always left in a functional state.
When a Hadoop cluster is scaled down by reducing the number of data nodes, some of the services in the cluster are restarted. This causes all running and pending jobs to fail at the completion of the scaling operation. You can, however, resubmit the jobs once the operation is complete.
You can seamlessly add or remove nodes to your HBase cluster while it is running. Regional Servers are automatically balanced within a few minutes of completing the scaling operation. However, you can also manually balance the regional servers by logging in to the headnode of cluster and running the following commands from a command prompt window:
>pushd %HBASE_HOME%\bin >hbase shell >balancer
You can seamlessly add or remove data nodes to your Storm cluster while it is running. But after a successful completion of the scaling operation, you will need to rebalance the topology.
Rebalancing can be accomplished in two ways:
- Storm web UI
Command-line interface (CLI) tool
Please refer to the Apache Storm documentation for more details.
The Storm web UI is available on the HDInsight cluster:
Here is an example how to use the CLI command to rebalance the Storm topology:
## Reconfigure the topology "mytopology" to use 5 worker processes,
## the spout "blue-spout" to use 3 executors, and ## the bolt "yellow-bolt" to use 10 executors $ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
To change the Hadoop cluster size by using Azure PowerShell, run the following command from a client machine:
Set-AzureRmHDInsightClusterSize -ClusterName <Cluster Name> -TargetInstanceCount <NewSize>
HDInsight clusters have the following HTTP web services (all of these services have RESTful endpoints):
By default, these services are granted for access. You can revoke/grant the access. To revoke:
Revoke-AzureRmHDInsightHttpServicesAccess -ClusterName <Cluster Name>
$clusterName = "<HDInsight Cluster Name>" # Credential option 1 $hadoopUserName = "admin" $hadoopUserPassword = "<Enter the Password>" $hadoopUserPW = ConvertTo-SecureString -String $hadoopUserPassword -AsPlainText -Force $credential = New-Object System.Management.Automation.PSCredential($hadoopUserName,$hadoopUserPW) # Credential option 2 #$credential = Get-Credential -Message "Enter the HTTP username and password:" -UserName "admin" Grant-AzureRmHDInsightHttpServicesAccess -ClusterName $clusterName -HttpCredential $credential
By granting/revoking the access, you will reset the cluster user name and password.
This can also be done via the Portal. See Administer HDInsight by using the Azure portal.
Update HTTP user credentials
It is the same procedure as Grant/revoke HTTP access.If the cluster has been granted the HTTP access, you must first revoke it. And then grant the access with new HTTP user credentials.
Find the default storage account
The following Powershell script demonstrates how to get the default storage account name and the default storage account key for a cluster.
$clusterName = "<HDInsight Cluster Name>" $cluster = Get-AzureRmHDInsightCluster -ClusterName $clusterName $resourceGroupName = $cluster.ResourceGroup $defaultStorageAccountName = ($cluster.DefaultStorageAccount).Replace(".blob.core.windows.net", "") $defaultBlobContainerName = $cluster.DefaultStorageContainer $defaultStorageAccountKey = (Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $defaultStorageAccountName).Value $defaultStorageAccountContext = New-AzureStorageContext -StorageAccountName $defaultStorageAccountName -StorageAccountKey $defaultStorageAccountKey
Find the resource group
In the Resource Manager mode, each HDInsight cluster belongs to an Azure resource group. To find the resource group:
$clusterName = "<HDInsight Cluster Name>" $cluster = Get-AzureRmHDInsightCluster -ClusterName $clusterName $resourceGroupName = $cluster.ResourceGroup
To submit MapReduce jobs
To submit Hive jobs
To submit Pig jobs
To submit Sqoop jobs
To submit Oozie jobs