Create Linux-based clusters in HDInsight by using the Azure portal

The Azure portal is a web-based management tool for services and resources hosted in the Microsoft Azure cloud. In this article, you learn how to create Linux-based Azure HDInsight clusters by using the portal.

Prerequisites

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.
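As a rough illustration of per-minute proration, the following Python sketch computes what a cluster costs for a given running time. The hourly rate of 3.00 is a made-up figure, not an actual HDInsight price, and `prorated_cost` is a hypothetical helper name.

```python
def prorated_cost(hourly_rate: float, minutes_running: int) -> float:
    """Return the cost of a cluster that is billed per minute.

    hourly_rate is an assumed price per hour; look up real prices
    on the Azure HDInsight pricing page.
    """
    if minutes_running < 0:
        raise ValueError("minutes_running must be non-negative")
    return hourly_rate * minutes_running / 60


# A hypothetical cluster priced at 3.00 per hour, left running for 90 minutes.
print(prorated_cost(3.00, 90))  # prints 4.5
```

At that assumed rate, a cluster left idle over a 48-hour weekend would accrue 144.00, because the charge applies whether the cluster does any work or not.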

Create clusters

The Azure portal exposes most of the cluster properties. By using Azure Resource Manager templates, you can hide many details. For more information, see Create Apache Hadoop clusters in HDInsight by using Resource Manager templates.

Note

The Secure transfer required feature forces all requests to your storage account to go through a secure connection. Only HDInsight cluster version 3.6 or newer supports this feature. For more information, see Create Apache Hadoop cluster with secure transfer storage accounts in Azure HDInsight.
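The version requirement in the note above can be expressed as a small check. This is only a sketch: `supports_secure_transfer` is a hypothetical helper, and it assumes the cluster version is a dotted string such as "3.6" or "4.0".

```python
def supports_secure_transfer(cluster_version: str) -> bool:
    """Return True if the version is 3.6 or newer.

    Assumes a dotted version string like "3.6"; only the major and
    minor parts are compared.
    """
    parts = [int(part) for part in cluster_version.split(".")]
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else 0
    # Compare as a tuple so that "3.10" counts as newer than "3.6".
    return (major, minor) >= (3, 6)


for version in ("3.5", "3.6", "4.0"):
    print(version, supports_secure_transfer(version))
```

Comparing numeric tuples instead of raw strings avoids the pitfall where "3.10" sorts before "3.6" alphabetically.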

  1. Sign in to the Azure portal.

  2. From the left menu, select + Create a resource.

  3. Under Azure Marketplace, select Analytics.

  4. Under Featured, select HDInsight.

    [Screenshot: Create a new cluster in the Azure portal]

  5. On the HDInsight page, select Custom (size, settings, apps).

  6. Select 1 Basics. Then enter the following information.

    [Screenshot: Configure basic settings]

    • Enter the Cluster Name. This name must be globally unique.

    • From the Subscription drop-down list, select the Azure subscription that's used for the cluster.

    • Select Cluster type. Then select the type of cluster you want to create. Examples are Hadoop and Apache Spark. The Operating system will be Linux. Next, select a cluster type version. Use the default version if you don't know what to choose. For more information, see HDInsight cluster versions.

      Important

      HDInsight clusters come in a variety of types. Each type corresponds to the workload or technology that the cluster is tuned for. There's no supported method to create a cluster that combines multiple types, such as Storm and HBase on one cluster.

    • For Cluster login username and Cluster login password, provide the username and password for the admin user.

    • Enter an SSH Username. If you want the same SSH password as the admin password you specified earlier, select the Use same password as cluster login check box. If not, provide either a PASSWORD or PUBLIC KEY to authenticate the SSH user. A public key is the approach we recommend. Choose Select at the bottom to save the credentials configuration.

      For more information, see Connect to HDInsight (Apache Hadoop) by using SSH.

    • For Resource group, specify whether you want to create a new resource group or use an existing one.

    • Specify a datacenter location where the cluster is created.

    • Select Next to move to the next page.

  7. From 2 Security + networking, you can connect your cluster to a virtual network by using the provided drop-down menu. Select an Azure virtual network and the subnet if you want to place the cluster into a virtual network. For information on using HDInsight with a virtual network, see Plan a virtual network deployment for Azure HDInsight clusters. The article includes specific configuration requirements for the virtual network.

    If you want to use the Enterprise Security Package, follow these instructions: Configure a HDInsight cluster with Enterprise Security Package by using Azure Active Directory Domain Services.

    Select Next to move to the next page.

  8. From 3 Storage, specify whether you want Azure Storage or Azure Data Lake Storage as your default storage. For more information, see the following table.

    [Screenshot: Set storage settings]

    Storage: Azure Storage blobs as the default storage
    Description:
    • For Primary Storage type, select Azure Storage. For Selection method, choose My subscriptions if you want to specify a storage account that's part of your Azure subscription. Then select the storage account. Otherwise, select Access key. Then provide the information for the storage account that you want to choose from outside your Azure subscription.
    • For Default container, choose the default container name suggested by the portal or specify your own.
    • If Azure Blob storage is your default storage, you can also select Additional Storage Accounts to specify additional storage accounts to associate with the cluster. For Azure Storage Keys, select Add a storage key. Then you can provide a storage account from your Azure subscriptions or from other subscriptions. Provide the storage account access key.
    • If Blob storage is your default storage, you can also select Data Lake Storage access to specify Azure Data Lake Storage as additional storage. For more information, see Quickstart: Set up clusters in HDInsight.

    Storage: Azure Data Lake Storage as the default storage
    Description: For Primary storage type, select Azure Data Lake Storage Gen1 or Azure Data Lake Storage Gen2. Then refer to the article Quickstart: Set up clusters in HDInsight for instructions.

    Storage: External metastores
    Description: As an option, specify a SQL database to save Apache Hive and Apache Oozie metadata associated with the cluster. For Select a SQL database for Hive, select a SQL database. Then provide the username and password for the database. Repeat these steps for Oozie metadata.

    Some considerations about using Azure SQL database for metastores are as follows:
    • The Azure SQL database that's used for the metastore must allow connectivity to other Azure services, including Azure HDInsight. On the right side of the Azure SQL database dashboard, select the server name. This server is the one that the SQL database instance runs on. After you're in server view, select Configure. Then for Azure Services, select Yes. Then select Save.
    • When you create a metastore, don't name a database with dashes or hyphens. These characters can cause the cluster creation process to fail.
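The naming consideration above lends itself to a quick check before you start cluster creation. The sketch below is illustrative only: `has_dash` is a hypothetical helper, and treating the listed Unicode dash characters as equivalent to the ASCII hyphen is an assumption, not documented HDInsight behavior.

```python
# ASCII hyphen plus several Unicode hyphen and dash characters
# (assumed to be equally problematic; the article only says
# "dashes or hyphens").
DASH_CHARACTERS = "-\u2010\u2011\u2012\u2013\u2014"


def has_dash(database_name: str) -> bool:
    """Return True if the metastore database name contains a dash or hyphen."""
    return any(character in DASH_CHARACTERS for character in database_name)


for name in ("hivemetastore", "hive-metastore"):
    verdict = "rename before use" if has_dash(name) else "ok"
    print(f"{name}: {verdict}")
```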

    Warning

    Using an additional storage account in a different location than the HDInsight cluster isn't supported.
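The location restriction in the warning above can be checked ahead of time with a comparison like the one below. This is a sketch under assumptions: `same_location` is a hypothetical helper, and the normalization (ignoring case and spaces so that "East US" matches "eastus") reflects how Azure region names commonly appear rather than any rule stated in this article.

```python
def same_location(cluster_location: str, storage_location: str) -> bool:
    """Return True if both resources are in the same Azure location.

    Names are compared after removing spaces and lowercasing, so the
    display name "East US" matches the short name "eastus" (assumed
    convention).
    """
    def normalize(location: str) -> str:
        return location.replace(" ", "").lower()

    return normalize(cluster_location) == normalize(storage_location)


print(same_location("East US", "eastus"))
print(same_location("East US", "West US"))
```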

    Select Next to move to the next page.

  9. From 4 Applications (optional), select any applications that you want. These applications can be developed by Microsoft, by independent software vendors (ISVs), or by you. For more information, see Install applications during cluster creation.

    Select Next to move to the next page.

  10. 5 Cluster size displays information about the nodes that are used for this cluster. Set the number of worker nodes that you need for the cluster. The estimated cost of running the cluster is also shown.

    [Screenshot: Specify node pricing tiers]

    Important

    If you plan on more than 32 worker nodes, whether at cluster creation or by scaling the cluster after creation, select a head node size with at least eight cores and 14 GB of RAM.
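The sizing rule above can be written as a simple check. This is a minimal sketch: `head_node_is_large_enough` is a hypothetical helper, and it returns True for 32 or fewer worker nodes only because this article states no head node requirement for that case.

```python
def head_node_is_large_enough(worker_nodes: int, head_cores: int, head_ram_gb: float) -> bool:
    """Check the head node size against the more-than-32-workers rule."""
    if worker_nodes <= 32:
        # The rule only applies above 32 worker nodes.
        return True
    return head_cores >= 8 and head_ram_gb >= 14


# Planning to scale to 40 worker nodes with a 4-core, 28 GB head node.
print(head_node_is_large_enough(40, head_cores=4, head_ram_gb=28))  # prints False
```

Pass the largest worker count you expect to reach, including later scale-out, because the rule applies to both cluster creation and scaling.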

    For more information on node sizes and associated costs, see Azure HDInsight pricing.

    Select Next to move to the next page.

  11. From 6 Script actions, you can run a custom script that installs additional components or otherwise customizes the cluster while it's being created. For more information about script actions, see Customize Linux-based HDInsight clusters by using script actions.

    Select Next to move to the next page.

  12. From 7 Summary, verify the information you entered earlier. Then select Create.

    [Screenshot: Confirm configurations]

    Note

    It takes some time for the cluster to be created, usually around 20 minutes. Monitor Notifications to check on the provisioning process.
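If you track provisioning from a script instead of watching Notifications, the waiting can be sketched as a polling loop with a timeout. Everything here is an assumption for illustration: `wait_for_cluster` is a hypothetical helper, `get_state` stands in for whatever call you use to read the cluster state, and the state names "InProgress", "Succeeded", and "Failed" are placeholders.

```python
import time
from typing import Callable


def wait_for_cluster(
    get_state: Callable[[], str],
    timeout_seconds: float = 30 * 60,
    poll_seconds: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state until it reports a final state or the timeout passes.

    get_state is a caller-supplied function (hypothetical here) that
    returns the current provisioning state as a string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in ("Succeeded", "Failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("cluster did not finish provisioning in time")
        sleep(poll_seconds)


# Simulated run: two in-progress polls, then success. The no-op sleep
# keeps the demonstration instant.
simulated_states = iter(["InProgress", "InProgress", "Succeeded"])
print(wait_for_cluster(lambda: next(simulated_states), sleep=lambda seconds: None))
```

The default timeout of 30 minutes leaves some margin over the typical creation time of about 20 minutes.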

  13. After the creation process finishes, select Go to Resource from the Deployment succeeded notification. The cluster window provides the following information.

    [Screenshot: Cluster interface]

    The icons in the window are explained as follows:

    • The Overview tab provides all the essential information about the cluster. Examples are the name, the resource group it belongs to, the location, the operating system, and the URL for the cluster dashboard.
    • Dashboard directs you to the Ambari portal associated with the cluster.
    • Secure Shell provides information needed to access the cluster by using SSH.
    • By using Scale cluster, you can increase the number of worker nodes associated with the cluster.
    • Delete deletes the HDInsight cluster.

Customize clusters

Delete the cluster

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

You've successfully created an HDInsight cluster. Now learn how to work with your cluster.

Apache Hadoop clusters

Apache HBase clusters

Apache Storm clusters

Apache Spark clusters