Manage Apache Hadoop clusters in HDInsight by using the Azure portal

Using the Azure portal, you can manage Apache Hadoop clusters in Azure HDInsight.

Prerequisites

An existing Apache Hadoop cluster in HDInsight.

Getting started

Sign in to https://portal.azure.com.

List and show clusters

The HDInsight clusters page lists your existing clusters. To open it from the portal:

  1. Select All services from the left menu.
  2. Select HDInsight clusters under ANALYTICS.
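If you prefer to script this step, the following is a minimal sketch using the Azure CLI `az hdinsight list` command. It assumes the Azure CLI is installed and you have signed in with `az login`. The function name `list_hdinsight_clusters` is hypothetical, and the query fields (`properties.clusterState`, `properties.clusterVersion`) are assumed from the cluster's Resource Manager representation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list HDInsight clusters, optionally in one resource group.
# Assumes the Azure CLI is installed and `az login` has been run.
list_hdinsight_clusters() {
  local resource_group="${1:-}"
  local args=(hdinsight list --output table
    --query "[].{Name:name, State:properties.clusterState, Version:properties.clusterVersion}")
  # Narrow the listing to a single resource group when one is given.
  if [[ -n "$resource_group" ]]; then
    args+=(--resource-group "$resource_group")
  fi
  az "${args[@]}"
}
```

For example, `list_hdinsight_clusters my-resource-group` lists only the clusters in that (hypothetical) resource group.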

Cluster home page

Select your cluster name from the HDInsight clusters page. This opens the Overview view.

Top menu:

  • Move: Moves the cluster to another resource group or to another subscription.
  • Delete: Deletes the cluster.
  • Refresh: Refreshes the view.

Left menu:

  • Top-left menu

    • Overview: Provides general information for your cluster.
    • Activity log: Show and query activity logs.
    • Access control (IAM): Use role assignments. See Use role assignments to manage access to your Azure subscription resources.
    • Tags: Allows you to set key/value pairs to define a custom taxonomy of your cloud services. For example, you may create a key named project, and then use a common value for all services associated with a specific project.
    • Diagnose and solve problems: Display troubleshooting information.
    • Quick Start: Displays information that helps you get started using HDInsight.
    • Tools: Help information for HDInsight-related tools.
  • Settings menu

  • Monitoring menu

    • Alerts: Manage the alerts and actions.
    • Metrics: Monitor the cluster metrics in Azure Monitor logs.
    • Diagnostic settings: Configure where diagnostic metrics are stored.
    • Operations Management Suite: Monitor your cluster in Azure Operations Management Suite (OMS) and Azure Monitor logs.
  • Support + troubleshooting menu

Cluster Properties

From the cluster home page, under Settings, select Properties.

  • Hostname: Cluster name.
  • Cluster URL: The URL for the Ambari web interface.
  • Secure shell (SSH): The username and host name to use when accessing the cluster via SSH.
  • Status: One of: Aborted, Accepted, ClusterStorageProvisioned, AzureVMConfiguration, HDInsightConfiguration, Operational, Running, Error, Deleting, Deleted, Timedout, DeleteQueued, DeleteTimedout, DeleteError, PatchQueued, CertRolloverQueued, ResizeQueued, or ClusterCustomization.
  • Region: Azure location. For a list of supported Azure locations, see the Region drop-down list box on HDInsight pricing.
  • Date created: The date the cluster was deployed.
  • Operating system: Either Windows or Linux.
  • Type: The cluster type, for example Hadoop, HBase, Storm, or Spark.
  • Version: The HDInsight version. See HDInsight versions.
  • Subscription: Subscription name.
  • Default data source: The default cluster file system.
  • Worker nodes size: The selected VM size of the worker nodes.
  • Head node size: The selected VM size of the head nodes.
  • Virtual network: The name of the virtual network in which the cluster is deployed, if one was selected at deployment time.
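The same properties can be read with `az hdinsight show`. This sketch assumes the Azure CLI is installed and signed in, and that the returned cluster object exposes the fields named in the `--query` expression (`properties.clusterState`, `properties.clusterDefinition.kind`, `properties.clusterVersion`, `properties.osType`). `show_cluster_properties` and `wait_for_cluster_state` are hypothetical helpers; the second polls the Status value, which can be useful after creating or scaling a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mirror the portal's Properties page with the Azure CLI.
# Assumes the Azure CLI is installed and `az login` has been run.

show_cluster_properties() {
  local cluster="$1" resource_group="$2"
  az hdinsight show --name "$cluster" --resource-group "$resource_group" --output table \
    --query "{Status:properties.clusterState, Type:properties.clusterDefinition.kind, Version:properties.clusterVersion, OS:properties.osType, Region:location}"
}

# Poll the cluster status until it equals the target value (default: Running).
# Returns 0 on success, 1 if the target is not reached within the given attempts.
wait_for_cluster_state() {
  local cluster="$1" resource_group="$2" target="${3:-Running}"
  local attempts="${4:-60}" interval="${5:-30}" state i
  for ((i = 1; i <= attempts; i++)); do
    state="$(az hdinsight show --name "$cluster" --resource-group "$resource_group" \
      --query properties.clusterState --output tsv)" || return
    echo "attempt $i: status is $state" >&2
    [[ "$state" == "$target" ]] && return 0
    sleep "$interval"
  done
  return 1
}
```

For example, `wait_for_cluster_state mycluster my-resource-group Running 20 60` checks up to 20 times, once a minute.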

Move clusters

You can move an HDInsight cluster to another Azure resource group or another subscription.

From the cluster home page:

  1. Select Move from the top menu.
  2. Select Move to another resource group or Move to another subscription.
  3. Follow the instructions from the new page.
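A scripted alternative uses the Azure CLI `az resource move` command with the cluster's resource ID. This is a sketch assuming the Azure CLI is installed and signed in; `move_cluster` is a hypothetical helper, and its optional fourth argument is the destination subscription ID for cross-subscription moves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move an HDInsight cluster to another resource group and,
# optionally, another subscription. Assumes the Azure CLI is installed and signed in.
move_cluster() {
  local cluster="$1" source_rg="$2" dest_rg="$3" dest_subscription="${4:-}"
  local cluster_id
  # Look up the full Resource Manager ID of the cluster.
  cluster_id="$(az hdinsight show --name "$cluster" --resource-group "$source_rg" \
    --query id --output tsv)" || return
  local args=(resource move --destination-group "$dest_rg" --ids "$cluster_id")
  # Add the target subscription only when moving across subscriptions.
  if [[ -n "$dest_subscription" ]]; then
    args+=(--destination-subscription-id "$dest_subscription")
  fi
  az "${args[@]}"
}
```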

Delete clusters

Deleting a cluster does not delete the default storage account or any linked storage accounts. You can re-create the cluster by using the same storage accounts and the same metastores. We recommend using a new default Blob container when you re-create the cluster.

From the cluster home page:

  1. Select Delete from the top menu.
  2. Follow the instructions from the new page.
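To delete a cluster from a script, a minimal sketch with the Azure CLI `az hdinsight delete` command follows. It assumes the Azure CLI is installed and signed in; `delete_cluster` is a hypothetical helper. As with the portal, this removes only the cluster, not its storage accounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete an HDInsight cluster with the Azure CLI.
# Only the cluster is deleted; the default and linked storage accounts remain.
delete_cluster() {
  if [[ $# -ne 2 ]]; then
    echo "usage: delete_cluster <cluster-name> <resource-group>" >&2
    return 2
  fi
  az hdinsight delete --name "$1" --resource-group "$2"
}
```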

See also Pause/shut down clusters.

Add additional storage accounts

You can add additional Azure Storage accounts and Azure Data Lake Storage accounts after a cluster is created. For more information, see Add additional storage accounts to HDInsight.

Scale clusters

The cluster scaling feature allows you to change the number of worker nodes used by an Azure HDInsight cluster, without having to re-create the cluster.

Note

Only clusters with HDInsight version 3.1.3 or higher are supported. If you are unsure of your cluster's version, check the Properties page. See Cluster Properties.

From the cluster home page:

  1. Under Settings, select Cluster size.

  2. In the Number of Worker nodes box, enter the new number of worker nodes. The limit on the number of cluster nodes varies among Azure subscriptions; you can contact billing support to increase the limit. The displayed cost information updates to reflect the change in the number of nodes.

  3. Select Save.

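The same resize can be scripted with the Azure CLI `az hdinsight resize` command. The sketch below assumes the Azure CLI is installed and signed in; `resize_cluster` is a hypothetical helper that rejects a worker node count that is not a positive integer before calling Azure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: change the number of worker nodes of a running cluster.
# Assumes the Azure CLI is installed and `az login` has been run.
resize_cluster() {
  local cluster="$1" resource_group="$2" worker_count="$3"
  # Reject anything that is not a positive integer before calling Azure.
  if ! [[ "$worker_count" =~ ^[1-9][0-9]*$ ]]; then
    echo "worker node count must be a positive integer: '$worker_count'" >&2
    return 2
  fi
  az hdinsight resize --name "$cluster" --resource-group "$resource_group" \
    --workernode-count "$worker_count"
}
```

For example, `resize_cluster mycluster my-resource-group 5`. The subscription's node limit still applies.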

The impact of changing the number of data nodes varies for each type of cluster supported by HDInsight:

  • Apache Hadoop

    You can increase the number of worker nodes in a running Hadoop cluster without affecting any pending or running jobs. New jobs can also be submitted while the operation is in progress. Failures in a scaling operation are handled gracefully, so the cluster is always left in a functional state.

    When a Hadoop cluster is scaled down by reducing the number of data nodes, some of the services in the cluster are restarted. This behavior causes all running and pending jobs to fail at the completion of the scaling operation. You can, however, resubmit the jobs once the operation is complete.

  • Apache HBase

    You can add nodes to or remove nodes from your HBase cluster while it is running. Region servers are automatically balanced within a few minutes of completing the scaling operation. You can also balance the region servers manually by signing in to a head node of the cluster and running the following commands from a command prompt window:

    pushd %HBASE_HOME%\bin
    hbase shell
    balancer
    

    For more information on using the HBase shell, see Get started with an Apache HBase example in HDInsight.

  • Apache Storm

    You can add or remove data nodes in your Storm cluster while it is running. However, after the scaling operation completes successfully, you need to rebalance the topology.

    Rebalancing can be accomplished in two ways:

    • Storm web UI

    • Command-line interface (CLI) tool

      Refer to the Apache Storm documentation for more details.

      The Storm web UI is available on the HDInsight cluster.

      Here is an example CLI command to rebalance the Storm topology:

      ## Reconfigure the topology "mytopology" to use 5 worker processes,
      ## the spout "blue-spout" to use 3 executors, and
      ## the bolt "yellow-bolt" to use 10 executors
      $ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
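On a Linux-based cluster, the manual HBase balancing and the Storm rebalance described above can be wrapped in small shell functions. This is a minimal sketch assuming you are connected over SSH to a head node where `hbase` and `storm` are on the PATH; `hbase_rebalance` and `storm_rebalance` are hypothetical names, and the 30-second default wait is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for rebalancing after a scaling operation.
# Assumes a head node of a Linux-based cluster with `hbase` and `storm` on the PATH.

# HBase: feed the `balancer` command to the HBase shell non-interactively.
hbase_rebalance() {
  echo "balancer" | hbase shell
}

# Storm: redistribute a topology across a new number of worker processes.
# -w sets how many seconds the topology stays deactivated before workers
# are redistributed (default here: 30).
storm_rebalance() {
  local topology="$1" workers="$2" wait_secs="${3:-30}"
  storm rebalance "$topology" -w "$wait_secs" -n "$workers"
}
```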
      

Pause/shut down clusters

Most Hadoop jobs are batch jobs that run only occasionally, so most Hadoop clusters sit idle for long periods. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are charged for an HDInsight cluster even when it is not in use, and because the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use.

You can automate this process programmatically in several ways.

For pricing information, see HDInsight pricing. To delete a cluster from the portal, see Delete clusters.

Upgrade clusters

See Upgrade HDInsight cluster to a newer version.

Open the Apache Ambari web UI

Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs. Ambari enables system administrators to manage and monitor Hadoop clusters.

From the cluster home page:

  1. Select Cluster dashboards.


  2. Select Ambari home from the new page.

  3. Enter the cluster username and password. The default cluster username is admin.

For more information, see Manage HDInsight clusters by using the Apache Ambari Web UI.
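Because the Ambari web UI is backed by its REST API, you can also query the cluster from a script. This sketch assumes the HDInsight gateway URL pattern `https://CLUSTERNAME.azurehdinsight.net` and the default `admin` cluster login; `ambari_get_cluster` is a hypothetical helper, and `curl` prompts for the password because only the user name is passed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: query the Ambari REST API through the cluster's HTTPS gateway.
# Uses the cluster login (HTTP) user; curl prompts for the password.
ambari_get_cluster() {
  local cluster="$1" user="${2:-admin}"
  curl --silent --show-error --fail --user "$user" \
    "https://${cluster}.azurehdinsight.net/api/v1/clusters/${cluster}"
}
```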

Change passwords

An HDInsight cluster can have two user accounts: the HDInsight cluster user account (also known as the HTTP user account) and the SSH user account. Both are created during cluster creation. You can use the portal to change the cluster user account password, and script actions to change the SSH user account password.

Change the cluster user password

Note

Changing the cluster user (admin) password may cause script actions run against this cluster to fail. If you have any persisted script actions that target worker nodes, these scripts may fail when you add nodes to the cluster through resize operations. For more information on script actions, see Customize HDInsight clusters using script actions.

From the cluster home page:

  1. Select SSH + Cluster login under Settings.
  2. Select Reset credential.
  3. Enter and confirm the new password in the text boxes.
  4. Select OK.

The password is changed on all nodes in the cluster.

Change the SSH user password

  1. Using a text editor, save the following text as a file named changepassword.sh.

    Important

    You must use an editor that uses LF as the line ending. If the editor uses CRLF, then the script does not work.

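    #!/bin/bash
    [ "$#" -eq 2 ] || { echo "usage: $0 <ssh-user-name> <new-password>" >&2; exit 1; }
    SSH_USER="$1"
    SSH_PASS="$2"
    # Hash the new password with MD5-crypt and set it for the user.
    usermod --password "$(printf '%s\n' "$SSH_PASS" | openssl passwd -1 -stdin)" "$SSH_USER"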
    
  2. Upload the file to a storage location that HDInsight can access through an HTTP or HTTPS address, for example a public file store such as OneDrive or Azure Blob storage. Save the URI (HTTP or HTTPS address) of the file, because you need it in the next step.

  3. From the cluster home page, select Script actions under Settings.

  4. From the Script Actions blade, select Submit New.

  5. From the Submit script action blade, enter the following information:

    • Script type: Select Custom from the drop-down list.
    • Name: "Change ssh password"
    • Bash script URI: The URI to the changepassword.sh file.
    • Node type(s) (Head, Worker, Nimbus, Supervisor, Zookeeper, and so on): Select all node types listed.
    • Parameters: Enter the SSH user name and then the new password, separated by a single space.
    • Persist this script action ...: Leave this field unchecked.
  6. Select Create to apply the script. When the script finishes, you can connect to the cluster using SSH with the new password.
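The LF requirement and the script action submission above can also be handled from a shell. In this sketch, `to_lf` and `run_change_ssh_password` are hypothetical helpers; the role names passed to `az hdinsight script-action execute --roles` (`headnode`, `workernode`, `zookeepernode`) are assumed to correspond to the portal's node types. Passing a password on the command line can leave it in your shell history.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the second assumes the Azure CLI is installed and signed in.

# Strip carriage returns so the script has LF line endings (keeps file permissions).
to_lf() {
  local file="$1" tmp status=0
  tmp="$(mktemp)" || return
  tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file" || status=$?
  rm -f "$tmp"
  return "$status"
}

# Run the uploaded changepassword.sh on head, worker, and ZooKeeper nodes.
# No --persist-on-success flag, matching "Leave this field unchecked" above.
run_change_ssh_password() {
  local cluster="$1" resource_group="$2" script_uri="$3" ssh_user="$4" new_password="$5"
  az hdinsight script-action execute \
    --cluster-name "$cluster" --resource-group "$resource_group" \
    --name "Change ssh password" --script-uri "$script_uri" \
    --roles headnode workernode zookeepernode \
    --script-parameters "$ssh_user $new_password"
}
```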

Grant/revoke access

HDInsight clusters have the following HTTP web services (all of these services have RESTful endpoints):

  • ODBC
  • JDBC
  • Ambari
  • Oozie
  • Templeton

By default, access to these services is granted. You can grant or revoke access by using the Azure Classic CLI or Azure PowerShell.

Find the subscription ID

Each cluster is tied to an Azure subscription. The Azure subscription ID is visible from the cluster home page.

Find the resource group

In Azure Resource Manager mode, each HDInsight cluster is created in an Azure resource group. The resource group is visible from the cluster home page.
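The subscription ID and resource group are both embedded in the cluster's Resource Manager ID. This sketch assumes the standard ID layout shown in the comment; `cluster_id_by_name` and `parse_cluster_id` are hypothetical helpers, and the first one assumes the Azure CLI is installed and signed in.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. A Resource Manager ID has the form
# /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.HDInsight/clusters/<name>

# Print the resource ID of the cluster with the given name.
cluster_id_by_name() {
  az hdinsight list --query "[?name=='$1'].id" --output tsv
}

# Split a resource ID into its subscription ID and resource group.
parse_cluster_id() {
  local id="$1" _empty _subs subscription _rgs resource_group _rest
  IFS=/ read -r _empty _subs subscription _rgs resource_group _rest <<< "$id"
  printf 'subscription=%s\nresource_group=%s\n' "$subscription" "$resource_group"
}
```

For example, `parse_cluster_id "$(cluster_id_by_name mycluster)"` prints both values for a cluster named mycluster.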

Find the storage accounts

HDInsight clusters use either an Azure Storage account or Azure Data Lake Storage to store data. Each HDInsight cluster can have one default storage account and a number of linked storage accounts. To list the storage accounts, from the cluster home page under Settings, select Storage accounts.
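To list the storage accounts from a script instead, the hypothetical helper below queries the cluster with `az hdinsight show`. It assumes the Azure CLI is installed and signed in, and that the returned cluster object exposes `properties.storageProfile.storageaccounts` with `name` and `isDefault` fields; verify the field names against your own CLI output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list a cluster's storage accounts with the Azure CLI.
# The storageProfile field names are assumptions; check them against `az hdinsight show`.
list_cluster_storage() {
  local cluster="$1" resource_group="$2"
  az hdinsight show --name "$cluster" --resource-group "$resource_group" --output table \
    --query "properties.storageProfile.storageaccounts[].{Account:name, Default:isDefault}"
}
```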

Monitor jobs

See Manage HDInsight clusters by using the Apache Ambari Web UI.

Monitor cluster usage

The Usage section of the HDInsight cluster blade displays information about the number of cores available to your subscription for use with HDInsight, as well as the number of cores allocated to this cluster and how they are allocated for the nodes within this cluster. See List and show clusters.

Important

To monitor the services provided by the HDInsight cluster, you must use Ambari Web or the Ambari REST API. For more information on using Ambari, see Manage HDInsight clusters using Apache Ambari.
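As a sketch of monitoring through the Ambari REST API, the hypothetical helper below requests the state of every Ambari-managed service. It assumes the gateway URL pattern `https://CLUSTERNAME.azurehdinsight.net` and the default `admin` cluster login; the `ServiceInfo/state` field reports values such as STARTED or INSTALLED (stopped).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each Ambari-managed service with its state.
# Uses the cluster login (HTTP) user; curl prompts for the password.
ambari_service_states() {
  local cluster="$1" user="${2:-admin}"
  curl --silent --show-error --fail --user "$user" \
    "https://${cluster}.azurehdinsight.net/api/v1/clusters/${cluster}/services?fields=ServiceInfo/state"
}
```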

Connect to a cluster

Next steps

In this article, you have learned some basic administrative functions. To learn more, see the following articles: