Create a cluster with Data Lake Storage Gen2 using the Azure portal

The Azure portal is a web-based management tool for services and resources hosted in the Microsoft Azure cloud. In this article, you learn how to use the portal to create Linux-based Azure HDInsight clusters that use Azure Data Lake Storage Gen2 for storage. For more details, see Create HDInsight clusters.

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.
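
Because billing is per minute whether or not the cluster does any work, an idle cluster costs as much as a busy one. The following Python sketch illustrates the arithmetic only. The hourly rate is a made-up placeholder rather than an actual HDInsight price, and `prorated_cost` is a hypothetical helper, not part of any Azure SDK.

```python
def prorated_cost(hourly_rate: float, minutes_running: int) -> float:
    """Estimate the charge for a cluster billed per minute of existence."""
    if hourly_rate < 0 or minutes_running < 0:
        raise ValueError("rate and minutes must be non-negative")
    return round(hourly_rate * minutes_running / 60, 2)


# Hypothetical rate: 3.00 currency units per hour for the whole cluster.
HYPOTHETICAL_HOURLY_RATE = 3.00

# A cluster left running, but unused, from Friday 18:00 to Monday 09:00 (63 hours).
idle_minutes = 63 * 60
idle_weekend_cost = prorated_cost(HYPOTHETICAL_HOURLY_RATE, idle_minutes)
```

Check the current HDInsight pricing for your region and node sizes before estimating real costs.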

If you don't have an Azure subscription, create a free account before you begin.

To create an HDInsight cluster that uses Data Lake Storage Gen2 for storage, follow these steps to configure a storage account that has a hierarchical namespace.

Create a user-assigned managed identity

Create a user-assigned managed identity, if you don’t already have one.

  1. Sign in to the Azure portal.
  2. In the upper-left corner, click Create a resource.
  3. In the search box, type user assigned and click User Assigned Managed Identity.
  4. Click Create.
  5. Enter a name for your managed identity, and then select the correct subscription, resource group, and location.
  6. Click Create.

For more information on how managed identities work in Azure HDInsight, see Managed identities in Azure HDInsight.
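
The subscription, resource group, and name chosen in the steps above together determine the identity's Azure Resource Manager resource ID, which later steps use to refer to the identity. As a minimal sketch, the Python below builds and parses that ID. The helper names and the sample values are hypothetical; only the ID layout reflects the standard format for user-assigned managed identities.

```python
import re


def user_assigned_identity_id(subscription_id: str, resource_group: str, name: str) -> str:
    """Build the ARM resource ID of a user-assigned managed identity."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{name}"
    )


# Resource IDs are treated as case-insensitive here, hence IGNORECASE.
_IDENTITY_ID = re.compile(
    r"^/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.ManagedIdentity/userAssignedIdentities/(?P<name>[^/]+)$",
    re.IGNORECASE,
)


def parse_identity_id(resource_id: str) -> dict:
    """Split an identity resource ID back into its parts, or raise ValueError."""
    match = _IDENTITY_ID.match(resource_id)
    if match is None:
        raise ValueError(f"not a user-assigned identity ID: {resource_id!r}")
    return match.groupdict()


# Placeholder values for illustration only.
example_id = user_assigned_identity_id(
    "00000000-0000-0000-0000-000000000000", "my-resource-group", "my-hdi-identity"
)
```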

Create a storage account to use with Data Lake Storage Gen2

Create a storage account to use with Azure Data Lake Storage Gen2.

  1. Sign in to the Azure portal.
  2. In the upper-left corner, click Create a resource.
  3. In the search box, type storage and click Storage account.
  4. Click Create.
  5. On the Create storage account screen:
    1. Select the correct subscription and resource group.
    2. Enter a name for your storage account with Data Lake Storage Gen2.
    3. Click on the Advanced tab.
    4. Click Enabled next to Hierarchical namespace under Data Lake Storage Gen2.
    5. Click Review + create.
    6. Click Create.

For more information on other options during storage account creation, see Quickstart: Create a storage account for Azure Data Lake Storage Gen2.

Screenshot showing storage account creation in the Azure portal
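
The portal rejects storage account names that break the naming rules (3 to 24 characters, lowercase letters and digits only), and the Hierarchical namespace option corresponds to the `isHnsEnabled` property of the storage account resource. The Python sketch below checks a name and assembles a resource description with that property set. It is a partial fragment for illustration: the function names are hypothetical, the `Standard_LRS` SKU is an assumed choice rather than something this article prescribes, and the API version is deliberately left out.

```python
import re


def is_valid_storage_account_name(name: str) -> bool:
    """Storage account names are 3-24 characters, lowercase letters and digits only."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None


def gen2_storage_account_resource(name: str, location: str) -> dict:
    """Describe a general-purpose v2 account with the hierarchical namespace enabled."""
    if not is_valid_storage_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "name": name,
        "location": location,
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},  # assumed SKU, pick what fits your needs
        "properties": {"isHnsEnabled": True},  # this is what makes it Data Lake Storage Gen2
    }


# Placeholder name and region for illustration only.
example_account = gen2_storage_account_resource("mydatalake01", "eastus")
```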

Set up permissions for the managed identity on the Data Lake Storage Gen2

Assign the Storage Blob Data Owner role on the storage account to the managed identity.

  1. In the Azure portal, go to your storage account.

  2. Select your storage account, then select Access control (IAM) to display the access control settings for the account. Select the Role assignments tab to see the list of role assignments.

    Screenshot showing storage access control settings

  3. Select the + Add role assignment button to add a new role.

  4. In the Add role assignment window, select the Storage Blob Data Owner role. Then, select the subscription that has the managed identity and storage account. Next, search to locate the user-assigned managed identity that you created previously. Finally, select the managed identity, and it will be listed under Selected members.

    Screenshot showing how to assign an Azure role

  5. Select Save. The user-assigned identity that you selected is now listed under the selected role.

  6. After this initial setup is complete, you can create a cluster through the portal. The cluster must be in the same Azure region as the storage account. In the Storage tab of the cluster creation menu, select the following options:

    • For Primary storage type, select Azure Data Lake Storage Gen2.

    • Under Primary Storage account, search for and select the newly created storage account with Data Lake Storage Gen2 storage.

    • Under Identity, select the newly created user-assigned managed identity.

      Storage settings for using Data Lake Storage Gen2 with Azure HDInsight

    Note

    • To add a secondary storage account with Data Lake Storage Gen2, assign the managed identity that you created earlier to the new storage account, at the storage account level. Adding a secondary storage account with Data Lake Storage Gen2 through the "Additional storage accounts" blade in HDInsight isn't supported.
    • You can enable RA-GRS or RA-ZRS on the Azure Blob storage account that HDInsight uses. However, creating a cluster against the RA-GRS or RA-ZRS secondary endpoint isn't supported.
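
The permission and storage settings chosen in the steps above can also be written down as data, which helps when reviewing a setup or moving it to a template later. The Python sketch below is illustrative only and makes several assumptions: the function names are hypothetical, the role definition GUID is what is believed to be the built-in Storage Blob Data Owner ID and should be confirmed against the built-in roles reference, the region comparison is a simple normalization of display names, and the storage profile keys follow the layout used by HDInsight Resource Manager templates.

```python
import uuid

# Believed to be the built-in role definition GUID for Storage Blob Data Owner;
# confirm it against the Azure built-in roles reference before relying on it.
STORAGE_BLOB_DATA_OWNER = "b7e6dc6d-f1e8-4753-8033-0f276bb0955b"


def storage_account_id(subscription_id: str, resource_group: str, account: str) -> str:
    """ARM resource ID of the storage account, used as the role assignment scope."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def role_assignment(subscription_id: str, scope: str, principal_id: str) -> dict:
    """Describe a Storage Blob Data Owner assignment for the identity's principal."""
    return {
        "name": str(uuid.uuid4()),  # role assignment names are GUIDs
        "scope": scope,  # the storage account, not the whole subscription
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}/providers"
                f"/Microsoft.Authorization/roleDefinitions/{STORAGE_BLOB_DATA_OWNER}"
            ),
            "principalId": principal_id,
            "principalType": "ServicePrincipal",  # managed identities are service principals
        },
    }


def same_region(cluster_location: str, storage_location: str) -> bool:
    """Compare regions loosely, so that 'East US' and 'eastus' count as equal."""
    def normalize(location: str) -> str:
        return location.replace(" ", "").lower()
    return normalize(cluster_location) == normalize(storage_location)


def gen2_storage_profile(account: str, file_system: str,
                         storage_id: str, identity_id: str) -> dict:
    """Primary storage settings: Gen2 account, file system, and managed identity."""
    return {
        "storageaccounts": [
            {
                "name": f"{account}.dfs.core.windows.net",
                "isDefault": True,  # primary storage for the cluster
                "fileSystem": file_system,
                "resourceId": storage_id,
                "msiResourceId": identity_id,
            }
        ]
    }
```

A usage pass would build the storage account ID first, reuse it as the role assignment scope and as `resourceId` in the storage profile, and call `same_region` before cluster creation to catch a region mismatch early.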

Delete the cluster

See Delete an HDInsight cluster using your browser, PowerShell, or the Azure CLI.

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

You've successfully created an HDInsight cluster. Now learn how to work with your cluster.

Apache Spark clusters

Apache Hadoop clusters

Apache HBase clusters