Use empty edge nodes on Apache Hadoop clusters in HDInsight

Learn how to add an empty edge node to an HDInsight cluster. An empty edge node is a Linux virtual machine with the same client tools installed and configured as on the head nodes, but with no Apache Hadoop services running. You can use the edge node to access the cluster and to test and host your client applications.

You can add an empty edge node to an existing HDInsight cluster, or to a new cluster when you create it. You add an empty edge node by using an Azure Resource Manager template. The following sample shows the template resource that defines the edge node:

"resources": [
    {
        "name": "[concat(parameters('clusterName'),'/', variables('applicationName'))]",
        "type": "Microsoft.HDInsight/clusters/applications",
        "apiVersion": "2015-03-01-preview",
        "dependsOn": [ "[concat('Microsoft.HDInsight/clusters/',parameters('clusterName'))]" ],
        "properties": {
            "marketPlaceIdentifier": "EmptyNode",
            "computeProfile": {
                "roles": [{
                    "name": "edgenode",
                    "targetInstanceCount": 1,
                    "hardwareProfile": {
                        "vmSize": "{}"
                    }
                }]
            },
            "installScriptActions": [{
                "name": "[concat('emptynode','-' ,uniquestring(variables('applicationName')))]",
                "uri": "[parameters('installScriptAction')]",
                "roles": ["edgenode"]
            }],
            "uninstallScriptActions": [],
            "httpsEndpoints": [],
            "applicationType": "CustomApplication"
        }
    }
],

As shown in the sample, you can optionally call a script action to do additional configuration, such as installing Apache Hue on the edge node. The script action script must be publicly accessible on the web. For example, if the script is stored in Azure Storage, use either public containers or public blobs.
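
One way to check the public-access requirement before deploying is to request the script URL anonymously. The following is a minimal sketch, not part of HDInsight tooling: the function name `is_publicly_readable` and the injectable `opener` parameter are hypothetical, and the check assumes the hosting service answers unauthenticated HTTP HEAD requests.

```python
import urllib.error
import urllib.request


def is_publicly_readable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if an anonymous HTTP HEAD request to url gets a 2xx status.

    `opener` is injectable so the check can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (for example 403 or 404) is a subclass of URLError,
        # so a private or missing script ends up here.
        return False
```

A result of `False` for a script stored in Azure Storage usually suggests that the container or blob is not configured for public read access, although a network failure produces the same result.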

The edge node virtual machine size must meet the HDInsight cluster worker node VM size requirements. For the recommended worker node VM sizes, see Create Apache Hadoop clusters in HDInsight.

After you've created an edge node, you can connect to it using SSH and run client tools to access the Hadoop cluster in HDInsight.

Warning

Custom components that are installed on the edge node receive commercially reasonable support from Microsoft. This support might resolve the problems you encounter, or you might be referred to community resources for further assistance. The following are some of the most active sites for getting help from the community:

If you are using an Apache technology, you may be able to find assistance through the Apache project sites on https://apache.org, such as the Apache Hadoop site.

Important

Ubuntu images become available for new HDInsight cluster creation within 3 months of being published. As of January 2019, running clusters (including edge nodes) are not auto-patched. Customers must use script actions or other mechanisms to patch a running cluster. For more information, see OS patching for HDInsight.

Add an edge node to an existing cluster

In this section, you use a Resource Manager template to add an edge node to an existing HDInsight cluster. The Resource Manager template can be found on GitHub. The template calls a script action located at https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/quickstarts/microsoft.hdinsight/hdinsight-linux-add-edge-node/scripts/EmptyNodeSetup.sh. The script doesn't perform any actions; it only demonstrates how to call a script action from a Resource Manager template.

  1. Select the following image to sign in to Azure and open the Azure Resource Manager template in the Azure portal.

    Deploy to Azure button for new cluster

  2. Configure the following properties:

    Property        Description
    Subscription    Select an Azure subscription used for creating the cluster.
    Resource group  Select the resource group used for the existing HDInsight cluster.
    Location        Select the location of the existing HDInsight cluster.
    Cluster Name    Enter the name of an existing HDInsight cluster.
  3. Check I agree to the terms and conditions stated above, and then select Purchase to create the edge node.

Important

Make sure to select the Azure resource group for the existing HDInsight cluster. Otherwise, you get the error message "Can not perform requested operation on nested resource. Parent resource '<ClusterName>' not found."
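
If you deploy the template with a tool instead of the portal, the same values are typically supplied through a deployment parameters file. The following sketch builds one in Python. It assumes the template exposes a parameter named `clusterName`, as the sample at the beginning of this article does; the helper name `build_parameters` and the cluster name `myexistingcluster` are hypothetical, and the actual template might require additional parameters.

```python
import json


def build_parameters(cluster_name):
    """Build an ARM deployment parameters document that sets clusterName."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        # Assumption: the template declares a parameter named "clusterName".
        "parameters": {"clusterName": {"value": cluster_name}},
    }


if __name__ == "__main__":
    # Print the document so it can be saved as a parameters file.
    print(json.dumps(build_parameters("myexistingcluster"), indent=2))
```

As with the portal, the deployment must target the resource group that contains the existing cluster, or the parent resource isn't found.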

Add an edge node when creating a cluster

In this section, you use a Resource Manager template to create an HDInsight cluster with an edge node. The Resource Manager template can be found in the Azure quickstart templates gallery. The template calls a script action located at https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/quickstarts/microsoft.hdinsight/hdinsight-linux-with-edge-node/scripts/EmptyNodeSetup.sh. The script doesn't perform any actions; it only demonstrates how to call a script action from a Resource Manager template.

  1. Create an HDInsight cluster if you don't have one yet. See Get started using Hadoop in HDInsight.

  2. Select the following image to sign in to Azure and open the Azure Resource Manager template in the Azure portal.

    Deploy to Azure button for new cluster

  3. Configure the following properties:

    Property                 Description
    Subscription             Select an Azure subscription used for creating the cluster.
    Resource group           Create a new resource group used for the cluster.
    Location                 Select a location for the resource group.
    Cluster Name             Enter a name for the new cluster to create.
    Cluster Login User Name  Enter the Hadoop HTTP user name. The default name is admin.
    Cluster Login Password   Enter the Hadoop HTTP user password.
    Ssh User Name            Enter the SSH user name. The default name is sshuser.
    Ssh Password             Enter the SSH user password.
    Install Script Action    Keep the default value for going through this article.

    Some properties have been hardcoded in the template: Cluster type, Cluster worker node count, Edge node size, and Edge node name.

  4. Check I agree to the terms and conditions stated above, and then select Purchase to create the cluster with the edge node.

Add multiple edge nodes

You can add multiple edge nodes to an HDInsight cluster. You can configure multiple edge nodes only by using Azure Resource Manager templates. See the template sample at the beginning of this article. Update targetInstanceCount to the number of edge nodes you want to create.
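
If you keep the template as a JSON file, the count can also be changed programmatically. The following is a minimal sketch that assumes the template has the same shape as the sample at the beginning of this article: a top-level `resources` list containing a `Microsoft.HDInsight/clusters/applications` resource with a role named `edgenode`. The function name `set_edge_node_count` is hypothetical.

```python
import copy


def set_edge_node_count(template, count):
    """Return a copy of template with targetInstanceCount set on edgenode roles."""
    if count < 1:
        raise ValueError("count must be at least 1")
    updated = copy.deepcopy(template)  # leave the caller's template untouched
    changed = 0
    for resource in updated.get("resources", []):
        if resource.get("type") != "Microsoft.HDInsight/clusters/applications":
            continue
        roles = (
            resource.get("properties", {})
            .get("computeProfile", {})
            .get("roles", [])
        )
        for role in roles:
            if role.get("name") == "edgenode":
                role["targetInstanceCount"] = count
                changed += 1
    if changed == 0:
        raise ValueError("no edgenode role found in template")
    return updated
```

Load the template with `json.load`, pass the resulting dictionary to the function, and write the result back with `json.dump` before deploying.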

Access an edge node

The edge node SSH endpoint is <EdgeNodeName>.<ClusterName>-ssh.azurehdinsight.net:22. For example, new-edgenode.myedgenode0914-ssh.azurehdinsight.net:22.
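
The endpoint pattern can be expressed as a small helper, which is convenient when you script connections to several edge nodes. This is only a sketch of the naming pattern described above; the function name `edge_node_ssh_endpoint` is hypothetical.

```python
def edge_node_ssh_endpoint(edge_node_name, cluster_name, port=22):
    """Build the SSH endpoint <EdgeNodeName>.<ClusterName>-ssh.azurehdinsight.net:<port>."""
    return f"{edge_node_name}.{cluster_name}-ssh.azurehdinsight.net:{port}"


if __name__ == "__main__":
    # Uses the example names from this article.
    print(edge_node_ssh_endpoint("new-edgenode", "myedgenode0914"))
```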

The edge node appears as an application on the Azure portal. The portal gives you the information to access the edge node using SSH.

To verify the edge node SSH endpoint

  1. Sign in to the Azure portal.
  2. Open the HDInsight cluster with an edge node.
  3. Select Applications. You'll see the edge node. The default name is new-edgenode.
  4. Select the edge node. You'll see the SSH endpoint.

To use Hive on the edge node

  1. Use SSH to connect to the edge node. For information, see Use SSH with HDInsight.

  2. After you've connected to the edge node using SSH, use the following command to open the Hive console:

    hive
    
  3. Run the following command to show Hive tables in the cluster:

    show tables;
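
The same query can be run without an interactive session by passing it to the Hive command line over SSH. The following sketch only builds the command; it assumes the `hive` client on the edge node accepts the `-e` option for inline queries and that an `ssh` client is available locally. The function name `remote_hive_command` is hypothetical, and `sshuser` is the default SSH user name mentioned earlier in this article.

```python
import shlex


def remote_hive_command(host, query, user="sshuser"):
    """Build an ssh argument list that runs one Hive query on the edge node."""
    # shlex.quote protects the query from the remote shell, for example the
    # trailing semicolon in "show tables;".
    return ["ssh", f"{user}@{host}", "hive -e " + shlex.quote(query)]


if __name__ == "__main__":
    command = remote_hive_command(
        "new-edgenode.myedgenode0914-ssh.azurehdinsight.net", "show tables;"
    )
    print(command)
```

To execute the command, pass the list to `subprocess.run`; SSH then prompts for the password or uses your configured key.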
    

Delete an edge node

You can delete an edge node from the Azure portal.

  1. Sign in to the Azure portal.
  2. Open the HDInsight cluster with an edge node.
  3. Select Applications. You'll see a list of edge nodes.
  4. Right-click the edge node you want to delete, and then select Delete.
  5. Select Yes to confirm.

Next steps

In this article, you've learned how to add an edge node and how to access the edge node.