Create Linux-based clusters in HDInsight using Azure PowerShell

Azure PowerShell is a powerful scripting environment that you can use to control and automate the deployment and management of your workloads in Microsoft Azure. This document provides information about how to create a Linux-based HDInsight cluster by using Azure PowerShell. It also includes an example script.

Note

Azure PowerShell is only available on Windows clients. If you are using a Linux, Unix, or Mac OS X client, see Create a Linux-based HDInsight cluster using Azure CLI for information about using the Azure CLI to create a cluster.

Prerequisites

You must have the following before starting this procedure:

Create cluster

Warning

Billing for HDInsight clusters is prorated per minute, whether you are using them or not. Be sure to delete your cluster after you have finished using it. For more information, see How to delete an HDInsight cluster.

To create an HDInsight cluster by using Azure PowerShell, you must complete the following procedures:

  • Create an Azure resource group
  • Create an Azure Storage account
  • Create an Azure Blob container
  • Create an HDInsight cluster

The following script demonstrates how to create a new cluster:

# Login to your Azure subscription
# Is there an active Azure subscription?
$sub = Get-AzureRmSubscription -ErrorAction SilentlyContinue
if(-not($sub))
{
    Add-AzureRmAccount
}

# If you have multiple subscriptions, set the one to use
# $subscriptionID = "<subscription ID to use>"
# Select-AzureRmSubscription -SubscriptionId $subscriptionID

# Get user input/default values
$resourceGroupName = Read-Host -Prompt "Enter the resource group name"
$location = Read-Host -Prompt "Enter the Azure region to create resources in"

# Create the resource group
New-AzureRmResourceGroup -Name $resourceGroupName -Location $location

$defaultStorageAccountName = Read-Host -Prompt "Enter the name of the storage account"

# Create an Azure storae account and container
New-AzureRmStorageAccount `
    -ResourceGroupName $resourceGroupName `
    -Name $defaultStorageAccountName `
    -Type Standard_LRS `
    -Location $location
$defaultStorageAccountKey = (Get-AzureRmStorageAccountKey `
                                -ResourceGroupName $resourceGroupName `
                                -Name $defaultStorageAccountName)[0].Value
$defaultStorageContext = New-AzureStorageContext `
                                -StorageAccountName $defaultStorageAccountName `
                                -StorageAccountKey $defaultStorageAccountKey

# Get information for the HDInsight cluster
$clusterName = Read-Host -Prompt "Enter the name of the HDInsight cluster"
# Cluster login is used to secure HTTPS services hosted on the cluster
$httpCredential = Get-Credential -Message "Enter Cluster login credentials" -UserName "admin"
# SSH user is used to remotely connect to the cluster using SSH clients
$sshCredentials = Get-Credential -Message "Enter SSH user credentials"

# Default cluster size (# of worker nodes), version, type, and OS
$clusterSizeInNodes = "4"
$clusterVersion = "3.5"
$clusterType = "Hadoop"
$clusterOS = "Linux"
# Set the storage container name to the cluster name
$defaultBlobContainerName = $clusterName

# Create a blob container. This holds the default data store for the cluster.
New-AzureStorageContainer `
    -Name $clusterName -Context $defaultStorageContext 

# Create the HDInsight cluster
New-AzureRmHDInsightCluster `
    -ResourceGroupName $resourceGroupName `
    -ClusterName $clusterName `
    -Location $location `
    -ClusterSizeInNodes $clusterSizeInNodes `
    -ClusterType $clusterType `
    -OSType $clusterOS `
    -Version $clusterVersion `
    -HttpCredential $httpCredential `
    -DefaultStorageAccountName "$defaultStorageAccountName.blob.core.windows.net" `
    -DefaultStorageAccountKey $defaultStorageAccountKey `
    -DefaultStorageContainer $clusterName `
    -SshCredential $sshCredentials

The values you specify for the cluster login are used to create the Hadoop user account for the cluster. Use this account to connect to services hosted on the cluster such as web UIs or REST APIs.

The values you specify for the SSH user are used to create the SSH user for the cluster. Use this account to start a remote SSH session on the cluster and run jobs. For more information, see the Use SSH with HDInsight document.

Important

If you plan to use more than 32 worker nodes (either at cluster creation or by scaling the cluster after creation), you must also specify a head node size with at least 8 cores and 14 GB of RAM.

For more information on node sizes and associated costs, see HDInsight pricing.

It can take up to 20 minutes to create a cluster.

Create cluster: Configuration object

You can also create an HDInsight configuration object using New-AzureRmHDInsightClusterConfig cmdlet. You can then modify this configuration object to enable additional configuration options for your cluster. Finally, use the -Config parameter of the New-AzureRmHDInsightCluster cmdlet to use the configuration.

The following script creates a configuration object to configure an R Server on HDInsight cluster type. The configuration enables an edge node, RStudio, and an additional storage account.

$additionalStorageAccountName = Read-Host -Prompt "Enter the name of the additional storage account"

# Create the additional storage account
New-AzureRmStorageAccount -ResourceGroupName $resourceGroupName `
    -StorageAccountName $additionalStorageAccountName `
    -Location $location `
    -Type Standard_LRS

# Get the additional storage account key
$additionalStorageAccountKey = (Get-AzureRmStorageAccountKey -Name $additionalStorageAccountName -ResourceGroupName $resourceGroupName)[0].Value

# Create a new configuration for RServer cluster type
# Use -EdgeNodeSize to set the size of the edge node for RServer clusters
# if you want a specific size. Otherwise, the default size is used.
$config = New-AzureRmHDInsightClusterConfig `
    -ClusterType "RServer" `
    -EdgeNodeSize "Standard_D12_v2"

# Add RStudio to the configuration
$rserverConfig = @{"RStudio"="true"}
$config = $config | Add-AzureRmHDInsightConfigValues `
    -RServer $rserverConfig

# Add an additional storage account
Add-AzureRmHDInsightStorage -Config $config -StorageAccountName "$additionalStorageAccountName.blob.core.windows.net" -StorageAccountKey $additionalStorageAccountKey

# Create a new HDInsight cluster using -Config
New-AzureRmHDInsightCluster `
    -ClusterName $clusterName `
    -ResourceGroupName $resourceGroupName `
    -HttpCredential $httpCredential `
    -Location $location `
    -DefaultStorageAccountName "$defaultStorageAccountName.blob.core.windows.net" `
    -DefaultStorageAccountKey $defaultStorageAccountKey `
    -DefaultStorageContainer $defaultStorageContainerName  `
    -ClusterSizeInNodes $clusterSizeInNodes `
    -OSType $clusterOS `
    -Version $clusterVersion `
    -SshCredential $sshCredentials `
    -Config $config

Warning

Using a storage account in a different location than the HDInsight cluster is not supported. When using this example, create the additional storage account in the same location as the server.

Customize clusters

Delete the cluster

Warning

Billing for HDInsight clusters is prorated per minute, whether you are using them or not. Be sure to delete your cluster after you have finished using it. For more information, see How to delete an HDInsight cluster.

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

Now that you have successfully created an HDInsight cluster, use the following resources to learn how to work with your cluster.

Hadoop clusters

HBase clusters

Storm clusters

Spark clusters