Create HBase clusters on HDInsight in Azure Virtual Network

Learn how to create Azure HDInsight HBase clusters in an Azure Virtual Network.

With virtual network integration, HBase clusters can be deployed to the same virtual network as your applications so that applications can communicate with HBase directly. The benefits include:

  • Direct connectivity of the web application to the nodes of the HBase cluster, which enables communication via HBase Java remote procedure call (RPC) APIs.
  • Improved performance by not having your traffic go over multiple gateways and load-balancers.
  • The ability to process sensitive information in a more secure manner without exposing a public endpoint.

Prerequisites

Before you begin this tutorial, you must have the following items:

Create an HBase cluster in a virtual network

In this section, you create a Linux-based HBase cluster with the dependent Azure Storage account in an Azure virtual network by using an Azure Resource Manager template. For other cluster creation methods and an explanation of the settings, see Create HDInsight clusters. For more information about using a template to create Hadoop clusters in HDInsight, see Create Hadoop clusters in HDInsight using Azure Resource Manager templates.

Note

Some properties are hard-coded into the template. For example:

  • Location: East US 2
  • Cluster version: 3.6
  • Cluster worker node count: 2
  • Default storage account: a unique string
  • Virtual network name: <Cluster Name>-vnet
  • Virtual network address space: 10.0.0.0/16
  • Subnet name: subnet1
  • Subnet address range: 10.0.0.0/24

<Cluster Name> is replaced with the cluster name you provide when using the template.
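The hard-coded network values can be sanity-checked with a few lines of Python. This is only a sketch based on the values listed in the note; the helper name vnet_name is hypothetical and is not part of the template.

```python
import ipaddress

# Values copied from the note above (hard-coded in the template).
vnet = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.0.0/24")


def vnet_name(cluster_name: str) -> str:
    """Hypothetical helper: build the virtual network name the note describes."""
    return f"{cluster_name}-vnet"


# The subnet range must fall inside the virtual network address space.
print(subnet.subnet_of(vnet))   # True
print(vnet.num_addresses)       # 65536
print(subnet.num_addresses)     # 256
print(vnet_name("mycluster"))   # mycluster-vnet
```

The address counts are raw block sizes; the number of addresses actually usable by virtual machines in the subnet is smaller.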

  1. Click the following image to open the template in the Azure portal. The template is located in Azure QuickStart Templates.

    Deploy to Azure

  2. From the Custom deployment blade, enter the following properties:

    • Subscription: Select an Azure subscription used to create the HDInsight cluster, the dependent Storage account and the Azure virtual network.
    • Resource group: Select Create new, and specify a new resource group name.
    • Location: Select a location for the resource group.
    • ClusterName: Enter a name for the HBase cluster to be created.
    • Cluster login name and password: The default login name is admin.
    • SSH username and password: The default username is sshuser. You can rename it.
    • I agree to the terms and the conditions stated above: (Select)
  3. Click Purchase. It takes about 20 minutes to create a cluster. Once the cluster is created, you can click the cluster blade in the portal to open it.

After you complete the tutorial, you might want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are charged for an HDInsight cluster even when it is not in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they are not in use. For instructions on deleting a cluster, see Manage Hadoop clusters in HDInsight by using the Azure portal.

To begin working with your new HBase cluster, you can use the procedures found in Get started using HBase with Hadoop in HDInsight.

Connect to the HBase cluster using HBase Java RPC APIs

  1. Create an infrastructure as a service (IaaS) virtual machine in the same Azure virtual network and the same subnet as the cluster. For instructions on creating a new IaaS virtual machine, see Create a Virtual Machine Running Windows Server. When following the steps in that document, you must use the following values for the Network configuration:

    • Virtual network: <Cluster name>-vnet
    • Subnet: subnet1

    Important

    Replace <Cluster name> with the name you used when creating the HDInsight cluster in previous steps.

    Using these values, the virtual machine is placed in the same virtual network and subnet as the HDInsight cluster. This configuration allows them to communicate with each other directly. Alternatively, you can create an HDInsight cluster with an empty edge node and use the edge node to manage the cluster. For more information, see Use empty edge nodes in HDInsight.

  2. When using a Java application to connect to HBase remotely, you must use the fully qualified domain name (FQDN). To determine this, you must get the connection-specific DNS suffix of the HBase cluster. To do that, you can use one of the following methods:

    • Use a Web browser to make an Ambari call:

      Browse to https://<ClusterName>.azurehdinsight.net/api/v1/clusters/<ClusterName>/hosts?minimal_response=true. It returns a JSON file with the DNS suffixes.

    • Use the Ambari website:

      1. Browse to https://<ClusterName>.azurehdinsight.net.
      2. Click Hosts from the top menu.
    • Use curl to make REST calls:

         curl -u <username>:<password> -k https://<clustername>.azurehdinsight.net/ambari/api/v1/clusters/<clustername>.azurehdinsight.net/services/hbase/components/hbrest
      

      In the JavaScript Object Notation (JSON) data returned, find the "host_name" entry. It contains the FQDN for the nodes in the cluster. For example:

      ...
      "host_name": "workernode0.<clustername>.b1.cloudapp.net"
      ...
      

      The portion of the domain name beginning with the cluster name is the DNS suffix. For example, mycluster.b1.cloudapp.net.

    • Use Azure PowerShell:

      Use the following Azure PowerShell script to register the Get-ClusterDetail function, which can be used to return the DNS suffix:

         function Get-ClusterDetail(
             [String]
             [Parameter( Position=0, Mandatory=$true )]
             $ClusterDnsName,
             [String]
             [Parameter( Position=1, Mandatory=$true )]
             $Username,
             [String]
             [Parameter( Position=2, Mandatory=$true )]
             $Password,
             [String]
             [Parameter( Position=3, Mandatory=$true )]
             $PropertyName
             )
         {
         <#
             .SYNOPSIS
             Displays information to facilitate an HDInsight cluster-to-cluster scenario within the same virtual network.
             .Description
             This command shows the following 4 properties of an HDInsight cluster:
             1. ZookeeperQuorum (supports only HBase type cluster)
                 Shows the value of HBase property "hbase.zookeeper.quorum".
             2. ZookeeperClientPort (supports only HBase type cluster)
                 Shows the value of HBase property "hbase.zookeeper.property.clientPort".
             3. HBaseRestServers (supports only HBase type cluster)
                 Shows a list of host FQDNs that run the HBase REST server.
             4. FQDNSuffix (supports all cluster types)
                 Shows the FQDN suffix of hosts in the cluster.
             .EXAMPLE
             Get-ClusterDetail -ClusterDnsName {clusterDnsName} -Username {username} -Password {password} -PropertyName ZookeeperQuorum
             This command shows the value of HBase property "hbase.zookeeper.quorum".
             .EXAMPLE
             Get-ClusterDetail -ClusterDnsName {clusterDnsName} -Username {username} -Password {password} -PropertyName ZookeeperClientPort
             This command shows the value of HBase property "hbase.zookeeper.property.clientPort".
             .EXAMPLE
             Get-ClusterDetail -ClusterDnsName {clusterDnsName} -Username {username} -Password {password} -PropertyName HBaseRestServers
             This command shows a list of host FQDNs that run the HBase REST server.
             .EXAMPLE
             Get-ClusterDetail -ClusterDnsName {clusterDnsName} -Username {username} -Password {password} -PropertyName FQDNSuffix
             This command shows the FQDN suffix of hosts in the cluster.
         #>
      
             $DnsSuffix = ".azurehdinsight.net"
      
             $ClusterFQDN = $ClusterDnsName + $DnsSuffix
             $webclient = new-object System.Net.WebClient
             $webclient.Credentials = new-object System.Net.NetworkCredential($Username, $Password)
      
             if($PropertyName -eq "ZookeeperQuorum")
             {
                 $Url = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/configurations?type=hbase-site&tag=default&fields=items/properties/hbase.zookeeper.quorum"
                 $Response = $webclient.DownloadString($Url)
                 $JsonObject = $Response | ConvertFrom-Json
                 Write-host $JsonObject.items[0].properties.'hbase.zookeeper.quorum'
             }
             if($PropertyName -eq "ZookeeperClientPort")
             {
                 $Url = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/configurations?type=hbase-site&tag=default&fields=items/properties/hbase.zookeeper.property.clientPort"
                 $Response = $webclient.DownloadString($Url)
                 $JsonObject = $Response | ConvertFrom-Json
                 Write-host $JsonObject.items[0].properties.'hbase.zookeeper.property.clientPort'
             }
             if($PropertyName -eq "HBaseRestServers")
             {
                 $Url1 = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/configurations?type=hbase-site&tag=default&fields=items/properties/hbase.rest.port"
                 $Response1 = $webclient.DownloadString($Url1)
                 $JsonObject1 = $Response1 | ConvertFrom-Json
                 $PortNumber = $JsonObject1.items[0].properties.'hbase.rest.port'
      
                 $Url2 = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/services/hbase/components/hbrest"
                 $Response2 = $webclient.DownloadString($Url2)
                 $JsonObject2 = $Response2 | ConvertFrom-Json
                 foreach ($host_component in $JsonObject2.host_components)
                 {
                     $ConnectionString = $host_component.HostRoles.host_name + ":" + $PortNumber
                     Write-host $ConnectionString
                 }
             }
             if($PropertyName -eq "FQDNSuffix")
             {
                 $Url = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/services/YARN/components/RESOURCEMANAGER"
                 $Response = $webclient.DownloadString($Url)
                 $JsonObject = $Response | ConvertFrom-Json
                 $FQDN = $JsonObject.host_components[0].HostRoles.host_name
                 $pos = $FQDN.IndexOf(".")
                 $Suffix = $FQDN.Substring($pos + 1)
                 Write-host $Suffix
             }
         }
      

      After running the Azure PowerShell script, use the following command to return the DNS suffix by using the Get-ClusterDetail function. Specify your HDInsight HBase cluster name, admin name, and admin password when using this command.

         Get-ClusterDetail -ClusterDnsName <yourclustername> -PropertyName FQDNSuffix -Username <clusteradmin> -Password <clusteradminpassword>
      

      This command returns the DNS suffix. For example, yourclustername.b4.internal.cloudapp.net.
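Whichever method you use, the suffix is derived the same way: drop the first label of a host FQDN and keep the rest. The following Python sketch illustrates that step. The function names are hypothetical, and the sample JSON is a made-up fragment that only mirrors the host_components, HostRoles, and host_name fields read by the PowerShell function above; a real Ambari response contains more fields.

```python
import json


def dns_suffix(fqdn: str) -> str:
    """Return everything after the first label of a host FQDN."""
    _, sep, suffix = fqdn.partition(".")
    if not sep or not suffix:
        raise ValueError(f"not a fully qualified name: {fqdn!r}")
    return suffix


def find_host_names(node):
    """Recursively collect every "host_name" string in parsed JSON data."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "host_name" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_host_names(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_host_names(item))
    return found


# Hypothetical response fragment, for illustration only.
sample = json.loads(
    '{"host_components": [{"HostRoles": '
    '{"host_name": "workernode0.mycluster.b1.cloudapp.net"}}]}'
)

suffixes = {dns_suffix(name) for name in find_host_names(sample)}
print(suffixes)  # {'mycluster.b1.cloudapp.net'}
```

Searching for host_name recursively avoids depending on the exact shape of the response, which differs between the Ambari endpoints listed above.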

To verify that the virtual machine can communicate with the HBase cluster, use the command ping headnode0.<dns suffix> from the virtual machine. For example, ping headnode0.mycluster.b1.cloudapp.net.
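If ping is not available on the virtual machine, a name-resolution check can serve as a rough first test. The Python sketch below is an assumption-laden illustration: the helper names are hypothetical, and it only confirms that the head node name resolves, not that the node answers ICMP or accepts connections.

```python
import socket


def head_node_fqdn(dns_suffix: str, index: int = 0) -> str:
    """Build the head node host name used in the ping example above."""
    return f"headnode{index}.{dns_suffix}"


def resolves(hostname: str) -> bool:
    """Return True if the name resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return True


host = head_node_fqdn("mycluster.b1.cloudapp.net")
print(host)  # headnode0.mycluster.b1.cloudapp.net

# On the virtual machine, call resolves(host) with your own DNS suffix.
```

A False result from a machine inside the virtual network usually points to a wrong suffix or to the machine being placed in a different virtual network.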

To use this information in a Java application, you can follow the steps in Use Maven to build Java applications that use HBase with HDInsight (Hadoop) to create an application. To have the application connect to a remote HBase server, modify the hbase-site.xml file in this example to use the FQDNs of the ZooKeeper nodes. For example:

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>zookeeper0.<dns suffix>,zookeeper1.<dns suffix>,zookeeper2.<dns suffix></value>
</property>
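The quorum value is a comma-separated list of ZooKeeper host names built from the DNS suffix. As a sketch, the following Python generates the property shown above. The function names are hypothetical, and the zookeeper0 through zookeeper2 names and the count of three simply follow the example; confirm the actual host names for your cluster, for instance with the ZookeeperQuorum option of Get-ClusterDetail.

```python
import xml.etree.ElementTree as ET


def zookeeper_quorum(dns_suffix: str, count: int = 3) -> str:
    """Join ZooKeeper host FQDNs into an hbase.zookeeper.quorum value."""
    return ",".join(f"zookeeper{i}.{dns_suffix}" for i in range(count))


def quorum_property(dns_suffix: str) -> str:
    """Render the <property> element for hbase-site.xml as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.zookeeper.quorum"
    ET.SubElement(prop, "value").text = zookeeper_quorum(dns_suffix)
    return ET.tostring(prop, encoding="unicode")


print(zookeeper_quorum("mycluster.b1.cloudapp.net"))
print(quorum_property("mycluster.b1.cloudapp.net"))
```

Generating the value from the suffix keeps the three host names consistent and avoids copy-and-paste mistakes when the suffix changes.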

Note

For more information about name resolution in Azure virtual networks, including how to use your own DNS server, see Name Resolution (DNS).

Next steps

In this tutorial, you learned how to create an HBase cluster. To learn more, see: