Create Apache HBase clusters on HDInsight in Azure Virtual Network

Learn how to create Azure HDInsight Apache HBase clusters in an Azure Virtual Network.

With virtual network integration, Apache HBase clusters can be deployed to the same virtual network as your applications so that applications can communicate with HBase directly. The benefits include:

  • Direct connectivity of the web application to the nodes of the HBase cluster, which enables communication via HBase Java remote procedure call (RPC) APIs.
  • Improved performance by not having your traffic go over multiple gateways and load-balancers.
  • The ability to process sensitive information in a more secure manner without exposing a public endpoint.

If you don't have an Azure subscription, create a free account before you begin.

Create Apache HBase cluster into virtual network

In this section, you create a Linux-based Apache HBase cluster with the dependent Azure Storage account in an Azure virtual network using an Azure Resource Manager template. For other cluster creation methods and understanding the settings, see Create HDInsight clusters. For more information about using a template to create Apache Hadoop clusters in HDInsight, see Create Apache Hadoop clusters in HDInsight using Azure Resource Manager templates

Note

Some properties are hard-coded into the template. For example:

  • Location: East US 2
  • Cluster version: 3.6
  • Cluster worker node count: 2
  • Default storage account: a unique string
  • Virtual network name: CLUSTERNAME-vnet
  • Virtual network address space: 10.0.0.0/16
  • Subnet name: subnet1
  • Subnet address range: 10.0.0.0/24

CLUSTERNAME is replaced with the cluster name you provide when using the template.

  1. Select the following image to open the template in the Azure portal. The template is located in Azure quickstart templates.

    Deploy to Azure button for new cluster

  2. From the Custom deployment dialog, select Edit template.

  3. On line 165, change value Standard_A3 to Standard_A4_V2. Then select Save.

  4. Complete the remaining template with the following information:

    Property Value
    Subscription Select an Azure subscription used to create the HDInsight cluster, the dependent Storage account and the Azure virtual network.
    Resource group Select Create new, and specify a new resource group name.
    Location Select a location for the resource group.
    Cluster Name Enter a name for the Hadoop cluster to be created.
    Cluster Login User Name and Password The default User Name is admin. Provide a password.
    Ssh User Name and Password The default User Name is sshuser. Provide a password.

    Select I agree to the terms and the conditions stated above.

  5. Select Purchase. It takes about around 20 minutes to create a cluster. Once the cluster is created, you can select the cluster in the portal to open it.

After you complete the article, you might want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are also charged for an HDInsight cluster, even when it is not in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use. For the instructions of deleting a cluster, see Manage Apache Hadoop clusters in HDInsight by using the Azure portal.

To begin working with your new HBase cluster, you can use the procedures found in Get started using Apache HBase with Apache Hadoop in HDInsight.

Connect to the Apache HBase cluster using Apache HBase Java RPC APIs

Create a virtual machine

Create an infrastructure as a service (IaaS) virtual machine into the same Azure virtual network and the same subnet. For instructions on creating a new IaaS virtual machine, see Create a Virtual Machine Running Windows Server. When following the steps in this document, you must use the following values for the Network configuration:

  • Virtual network: CLUSTERNAME-vnet
  • Subnet: subnet1

Important

Replace CLUSTERNAME with the name you used when creating the HDInsight cluster in previous steps.

Using these values, the virtual machine is placed in the same virtual network and subnet as the HDInsight cluster. This configuration allows them to directly communicate with each other. There is a way to create an HDInsight cluster with an empty edge node. The edge node can be used to manage the cluster. For more information, see Use empty edge nodes in HDInsight.

Obtain fully qualified domain name

When using a Java application to connect to HBase remotely, you must use the fully qualified domain name (FQDN). To determine this, you must get the connection-specific DNS suffix of the HBase cluster. To do that, you can use one of the following methods:

  • Use a Web browser to make an Apache Ambari call:

    Browse to https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/hosts?minimal_response=true. It returns a JSON file with the DNS suffixes.

  • Use the Ambari website:

    1. Browse to https://CLUSTERNAME.azurehdinsight.net.
    2. Select Hosts from the top menu.
  • Use Curl to make REST calls:

    curl -u <username>:<password> -k https://CLUSTERNAME.azurehdinsight.net/ambari/api/v1/clusters/CLUSTERNAME.azurehdinsight.net/services/hbase/components/hbrest
    

In the JavaScript Object Notation (JSON) data returned, find the "host_name" entry. It contains the FQDN for the nodes in the cluster. For example:

"host_name" : "hn0-hbaseg.hjfrnszlumfuhfk4pi1guh410c.bx.internal.cloudapp.net"

The portion of the domain name beginning with the cluster name is the DNS suffix. For example, hjfrnszlumfuhfk4pi1guh410c.bx.internal.cloudapp.net.

Verify communication inside virtual network

To verify that the virtual machine can communicate with the HBase cluster, use the command ping headnode0.<dns suffix> from the virtual machine. For example, ping hn0-hbaseg.hjfrnszlumfuhfk4pi1guh410c.bx.internal.cloudapp.net.

To use this information in a Java application, you can follow the steps in Use Apache Maven to build Java applications that use Apache HBase with HDInsight (Hadoop) to create an application. To have the application connect to a remote HBase server, modify the hbase-site.xml file in this example to use the FQDN for Zookeeper. For example:

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>zookeeper0.<dns suffix>,zookeeper1.<dns suffix>,zookeeper2.<dns suffix></value>
</property>

Note

For more information about name resolution in Azure virtual networks, including how to use your own DNS server, see Name Resolution (DNS).

Next steps

In this article, you learned how to create an Apache HBase cluster. To learn more, see: