Create Linux-based clusters in HDInsight using the Azure portal

The Azure portal is a web-based management tool for services and resources hosted in the Microsoft Azure cloud. This article shows how to create Linux-based HDInsight clusters using the portal.

Prerequisites

Warning

Billing for HDInsight clusters is prorated per minute, whether you are using them or not. Be sure to delete your cluster after you have finished using it. For more information, see How to delete an HDInsight cluster.

  • An Azure subscription. See Get Azure free trial.
  • A modern web browser. The Azure portal uses HTML5 and JavaScript, and may not function correctly in older web browsers.

Create clusters

The Azure portal exposes most of the cluster properties. With an Azure Resource Manager template, you can hide many of these details. For more information, see Create Linux-based Hadoop clusters in HDInsight using Azure Resource Manager templates.

  1. Sign in to the Azure portal.
  2. Click +, click Intelligence + Analytics, and then click HDInsight.

    Creating a new cluster in the Azure portal

  3. In the HDInsight blade, click Custom (size, settings, apps), click Basics, and then enter the following information.

    • Cluster Name: Enter a name for the cluster. This name must be globally unique.

    • From the Subscription drop-down, select the Azure subscription that will be used for the cluster.

    • Click Cluster type, and then select:

      • Cluster Type: If you don't know what to choose, select Hadoop. It is the most popular cluster type.

        Important

        HDInsight clusters come in a variety of types, which correspond to the workload or technology that the cluster is tuned for. There is no supported method to create a cluster that combines multiple types, such as Storm and HBase on one cluster.

      • Operating System: Select Linux.

      • Version: Use the default version if you don't know what to choose. For more information, see HDInsight cluster versions.

      • Cluster Tier: Azure HDInsight offers its big data cloud services in two tiers: Standard and Premium. For more information, see Cluster tiers.

    • For Cluster login username and Cluster login password, provide the username and password for the admin user.

    • Enter an SSH Username. To use the same SSH password as the admin password you specified earlier, select the Use same password as cluster login check box. Otherwise, provide either a PASSWORD or a PUBLIC KEY to authenticate the SSH user. Using a public key is the recommended approach. Click Select at the bottom to save the credentials configuration.

      For more information, see Use SSH with HDInsight.

    • For Resource group, specify whether you want to create a new resource group or use an existing one.

    • Specify a data center location where the cluster will be created.

    • Click Next.

  4. On the Storage blade, specify whether you want Azure Storage (WASB) or Data Lake Store as your default storage. Look at the table below for more information.

    Storage: Azure Storage Blobs as default storage
    Description:
    • For Primary Storage type, select Azure Storage. For Selection method, choose My subscriptions to use a storage account that is part of your Azure subscription, and then select the storage account. To use a storage account from outside your Azure subscription, click Access key and provide the information for that storage account.
    • For Default container, either accept the default container name suggested by the portal or specify your own.
    • If you are using WASB as default storage, you can optionally click Additional Storage Accounts to associate additional storage accounts with the cluster. In the Azure Storage Keys blade, click Add a storage key, and then provide a storage account from your Azure subscriptions or from other subscriptions (by providing the storage account access key).
    • If you are using WASB as default storage, you can optionally click Data Lake Store access to specify Azure Data Lake Store as additional storage. For more information, see Create an HDInsight cluster with Data Lake Store using Azure Portal.

    Storage: Azure Data Lake Store as default storage
    Description: For Primary storage type, select Data Lake Store, and then see the article Create an HDInsight cluster with Data Lake Store using Azure Portal for instructions.

    Storage: External metastores
    Description: Optionally, you can specify a SQL database to save Hive and Oozie metadata associated with the cluster. For Select a SQL database for Hive, select a SQL database, and then provide the username and password for the database. Repeat these steps for Oozie metadata.

    Keep the following in mind when using an Azure SQL database as a metastore:
    • The Azure SQL database used for the metastore must allow connectivity to other Azure services, including Azure HDInsight. On the Azure SQL database dashboard, on the right side, click the server name. This is the server on which the SQL database instance is running. In the server view, click Configure, click Yes for Azure Services, and then click Save.
    • When creating a metastore, do not use a database name that contains hyphens or dashes, because this can cause the cluster creation process to fail.
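The naming restriction above is easy to check before you fill in the portal form. The following is a minimal sketch, not part of any Azure SDK; the function name `is_safe_metastore_name` and the exact set of dash characters it rejects are assumptions made for illustration.

```python
# Hypothetical helper: the name and the set of rejected characters are
# illustrative assumptions, not an official Azure validation rule.
FORBIDDEN_DASHES = {"-", "\u2010", "\u2011", "\u2012", "\u2013", "\u2014"}


def is_safe_metastore_name(database_name: str) -> bool:
    """Return True if the name is non-empty and contains no hyphen or dash."""
    if not database_name:
        return False
    return not any(ch in FORBIDDEN_DASHES for ch in database_name)


if __name__ == "__main__":
    for candidate in ("hivemetastore01", "hive-metastore"):
        verdict = "ok" if is_safe_metastore_name(candidate) else "rename before use"
        print(f"{candidate}: {verdict}")
```

This only covers the hyphen restriction stated above; it does not validate any other SQL database naming rules.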

    Click Next.

    Warning

    Using an additional storage account in a different location than the HDInsight cluster is not supported.
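When Azure Storage (WASB) is the default storage, the default container and storage account chosen in this step determine how data is addressed from the cluster. The sketch below assembles a `wasb://` URI from those two values; the helper name `wasb_uri` and the sample account and container names are hypothetical.

```python
def wasb_uri(container: str, account: str, path: str = "") -> str:
    """Build a wasb:// URI for a blob path in an Azure Storage account.

    container: the default (or additional) container name
    account:   the storage account name, without the domain suffix
    path:      optional path inside the container
    """
    if not container or not account:
        raise ValueError("container and account are both required")
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    print(wasb_uri("mycluster-container", "mystorageacct", "/example/data"))
```

The function only formats a string; it does not check that the account exists or that it is in the same location as the cluster.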

  5. Optionally, click Applications to install applications that work with HDInsight clusters. These applications can be developed by Microsoft, by independent software vendors (ISVs), or by you. For more information, see Install HDInsight applications.

  6. Click Cluster size to display information about the nodes that will be created for this cluster. Set the number of worker nodes that you need for the cluster. The estimated cost of the cluster will be shown within the blade.

    Node pricing tiers blade

    Important

    If you plan to use more than 32 worker nodes, either at cluster creation or by scaling the cluster after creation, you must select a head node size with at least 8 cores and 14 GB of RAM.

    For more information on node sizes and associated costs, see HDInsight pricing.

    Click Next to save the node pricing configuration.
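The head node requirement in this step can be expressed as a simple check. This is a minimal sketch of the rule as stated above (more than 32 worker nodes requires a head node with at least 8 cores and 14 GB of RAM); the `NodeSize` type and the sample sizes are hypothetical and do not correspond to real Azure VM size names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSize:
    """Hypothetical description of a node size (not an Azure SDK type)."""
    name: str
    cores: int
    ram_gb: float


def head_node_is_sufficient(head: NodeSize, worker_count: int) -> bool:
    """Apply the rule from this step to a planned cluster size.

    Pass the largest worker count you expect to reach, including any
    scaling you plan to do after the cluster is created.
    """
    if worker_count <= 32:
        return True
    return head.cores >= 8 and head.ram_gb >= 14


if __name__ == "__main__":
    small = NodeSize("example-small", cores=4, ram_gb=7)
    large = NodeSize("example-large", cores=8, ram_gb=14)
    print(head_node_is_sufficient(small, 40))  # too small for 40 workers
    print(head_node_is_sufficient(large, 40))  # meets the stated minimum
```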

  7. Click Advanced settings to configure other optional settings, such as using Script Actions to install custom components on the cluster or joining a Virtual Network. Look at the table below for more information.

    Option: Script Actions
    Description: Use this option if you want to use a custom script to customize a cluster as the cluster is being created. For more information about script actions, see Customize HDInsight clusters using Script Action.

    Option: Virtual Network
    Description: Select an Azure virtual network and the subnet if you want to place the cluster into a virtual network. For information on using HDInsight with a Virtual Network, including specific configuration requirements for the Virtual Network, see Extend HDInsight capabilities by using an Azure Virtual Network.

    Click Next.

  8. On the Summary blade, verify the information you entered earlier and then click Create.

    Note

    It will take some time for the cluster to be created, usually around 15 minutes. Use the tile on the Startboard, or the Notifications entry on the left of the page to check on the provisioning process.

  9. Once the creation process completes, click the tile for the cluster from the Startboard to launch the cluster blade. The cluster blade provides the following information.

    Cluster blade

    The icons at the top of this blade do the following:

    • Overview: Provides the essential information about the cluster, such as the name, the resource group it belongs to, the location, the operating system, and the URL for the cluster dashboard.
    • Dashboard: Directs you to the Ambari portal associated with the cluster.
    • Secure Shell: Provides the information needed to access the cluster using SSH.
    • Scale cluster: Lets you increase the number of worker nodes associated with the cluster.
    • Delete: Deletes the HDInsight cluster.
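The dashboard URL and the SSH details shown on the cluster blade follow a predictable pattern based on the cluster name. The sketch below builds those strings for a Linux-based cluster; the helper name `cluster_endpoints` and the sample cluster and user names are hypothetical, and the values shown on your own cluster blade remain the authoritative source.

```python
def cluster_endpoints(cluster_name: str, ssh_user: str) -> dict:
    """Return the commonly used endpoints for a Linux-based HDInsight cluster."""
    base = f"https://{cluster_name}.azurehdinsight.net"
    return {
        # Cluster dashboard (Ambari web UI), protected by the cluster login.
        "dashboard": base,
        # Ambari REST API root for this cluster.
        "ambari_rest": f"{base}/api/v1/clusters/{cluster_name}",
        # SSH command for the SSH user chosen during cluster creation.
        "ssh": f"ssh {ssh_user}@{cluster_name}-ssh.azurehdinsight.net",
    }


if __name__ == "__main__":
    # "mycluster" and "sshuser" are placeholder values.
    for label, value in cluster_endpoints("mycluster", "sshuser").items():
        print(f"{label}: {value}")
```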

Customize clusters

Delete the cluster

Warning

Billing for HDInsight clusters is prorated per minute, whether you are using them or not. Be sure to delete your cluster after you have finished using it. For more information, see How to delete an HDInsight cluster.
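Because billing is prorated per minute, the cost of leaving a cluster running is simple arithmetic. The following is a rough sketch only: the hourly rate is a made-up placeholder rather than an actual HDInsight price, and `estimated_cost` is a hypothetical helper that ignores storage and any other charges.

```python
def estimated_cost(hourly_rate_per_node: float, node_count: int, minutes_running: int) -> float:
    """Estimate compute cost for a cluster billed per minute.

    hourly_rate_per_node: placeholder price per node per hour
    node_count:           total billed nodes (head nodes and worker nodes)
    minutes_running:      minutes the cluster has existed, whether used or idle
    """
    if hourly_rate_per_node < 0 or node_count < 0 or minutes_running < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_rate_per_node * node_count * minutes_running / 60


if __name__ == "__main__":
    rate = 0.50  # hypothetical rate, not a real price
    print(estimated_cost(rate, node_count=6, minutes_running=90))       # a 90-minute job
    print(estimated_cost(rate, node_count=6, minutes_running=24 * 60))  # left running for a day
```

Comparing the two results shows why deleting an idle cluster matters: the charge depends on how long the cluster exists, not on how much work it does.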

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

Now that you have successfully created an HDInsight cluster, use the following to learn how to work with your cluster:

Hadoop clusters

HBase clusters

Storm clusters

Spark clusters