
Create Apache HBase clusters on HDInsight in Azure Virtual Network

Learn how to create Azure HDInsight Apache HBase clusters in an Azure Virtual Network.

With virtual network integration, Apache HBase clusters can be deployed to the same virtual network as your applications, so that the applications can communicate with HBase directly. The benefits include:

  • Direct connectivity from the web application to the nodes of the HBase cluster, which enables communication through the HBase Java remote procedure call (RPC) APIs.
  • Improved performance, because your traffic doesn't pass through multiple gateways and load balancers.
  • The ability to process sensitive information more securely, without exposing a public endpoint.

Prerequisites

Before you begin this article, you must have the following items:

Create an Apache HBase cluster in a virtual network

In this section, you use an Azure Resource Manager template to create a Linux-based Apache HBase cluster, together with its dependent Azure Storage account, in an Azure virtual network. For other cluster creation methods and an explanation of the settings, see Create HDInsight clusters. For more information about using a template to create Apache Hadoop clusters in HDInsight, see Create Apache Hadoop clusters in HDInsight using Azure Resource Manager templates.

Note

Some properties are hard-coded into the template. For example:

  • Location: East US 2
  • Cluster version: 3.6
  • Cluster worker node count: 2
  • Default storage account: a unique string
  • Virtual network name: <Cluster Name>-vnet
  • Virtual network address space: 10.0.0.0/16
  • Subnet name: subnet1
  • Subnet address range: 10.0.0.0/24

<Cluster Name> is replaced with the cluster name you provide when using the template.
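To show how the hard-coded network values relate to each other, the following minimal sketch checks that the subnet1 range (10.0.0.0/24) falls inside the virtual network address space (10.0.0.0/16). It also checks that a single address, such as a virtual machine's private IP, falls inside the subnet. The TemplateNetwork class and its methods are hypothetical helpers for illustration. They are not part of any Azure SDK, and they handle IPv4 only.

```java
// Hypothetical helper (not an Azure SDK class): models the template's
// hard-coded network values and checks IPv4 CIDR containment.
final class TemplateNetwork {

    // Mirrors the template's "<Cluster Name>-vnet" naming rule.
    static String vnetName(String clusterName) {
        return clusterName + "-vnet";
    }

    // Converts dotted-quad IPv4 text ("10.0.0.0") to an unsigned 32-bit value held in a long.
    static long ipv4ToLong(String ip) {
        String[] octets = ip.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String octet : octets) {
            int o = Integer.parseInt(octet);
            if (o < 0 || o > 255) {
                throw new IllegalArgumentException("Bad octet in: " + ip);
            }
            value = (value << 8) | o;
        }
        return value;
    }

    // Network mask for a prefix length, for example 16 -> 0xFFFF0000.
    static long mask(int prefix) {
        return prefix == 0 ? 0L : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
    }

    // True if every address in the CIDR block `inner` also belongs to the CIDR block `outer`.
    static boolean cidrContains(String outer, String inner) {
        String[] o = outer.split("/");
        String[] i = inner.split("/");
        int outerPrefix = Integer.parseInt(o[1]);
        int innerPrefix = Integer.parseInt(i[1]);
        if (innerPrefix < outerPrefix) {
            return false; // the inner block is larger than the outer block
        }
        long m = mask(outerPrefix);
        return (ipv4ToLong(o[0]) & m) == (ipv4ToLong(i[0]) & m);
    }

    // Number of addresses in a CIDR block, for example /24 -> 256.
    static long addressCount(String cidr) {
        int prefix = Integer.parseInt(cidr.split("/")[1]);
        return 1L << (32 - prefix);
    }

    public static void main(String[] args) {
        String vnet = "10.0.0.0/16";
        String subnet = "10.0.0.0/24";
        System.out.println(vnetName("mycluster"));                // mycluster-vnet
        System.out.println(cidrContains(vnet, subnet));           // true
        System.out.println(cidrContains(subnet, "10.0.0.17/32")); // true
        System.out.println(addressCount(subnet));                 // 256
    }
}
```

A /24 subnet contains 256 addresses. Azure reserves five of them in every subnet, and the remaining addresses must accommodate the cluster nodes and any virtual machine you add to subnet1.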

  1. Click the following image to open the template in the Azure portal. The template is located in Azure quickstart templates.

    Deploy to Azure button for new cluster

  2. From the Custom deployment blade, enter the following properties:

    • Subscription: Select the Azure subscription used to create the HDInsight cluster, the dependent Storage account, and the Azure virtual network.
    • Resource group: Select Create new, and specify a new resource group name.
    • Location: Select a location for the resource group.
    • ClusterName: Enter a name for the HBase cluster to create.
    • Cluster login name and password: The default login name is admin.
    • SSH username and password: The default username is sshuser. You can rename it.
    • I agree to the terms and conditions stated above: (Select)
  3. Click Purchase. It takes about 20 minutes to create a cluster. Once the cluster is created, you can click the cluster blade in the portal to open it.

After you complete the article, you might want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are also charged for an HDInsight cluster even when it is not in use. Because the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use. For instructions on deleting a cluster, see Manage Apache Hadoop clusters in HDInsight by using the Azure portal.

To begin working with your new HBase cluster, you can use the procedures found in Get started using Apache HBase with Apache Hadoop in HDInsight.

Connect to the Apache HBase cluster using Apache HBase Java RPC APIs

  1. Create an infrastructure as a service (IaaS) virtual machine in the same Azure virtual network and subnet as the cluster. For instructions on creating a new IaaS virtual machine, see Create a Virtual Machine Running Windows Server. When following the steps in that document, you must use the following values for the network configuration:

    • Virtual network: <Cluster name>-vnet
    • Subnet: subnet1

    Important

    Replace <Cluster name> with the name you used when creating the HDInsight cluster in the previous steps.

    Using these values places the virtual machine in the same virtual network and subnet as the HDInsight cluster. This configuration allows them to communicate with each other directly. Alternatively, you can create the HDInsight cluster with an empty edge node, which can be used to manage the cluster. For more information, see Use empty edge nodes in HDInsight.

  2. When using a Java application to connect to HBase remotely, you must use the fully qualified domain name (FQDN). To determine the FQDN, you must get the connection-specific DNS suffix of the HBase cluster. To do that, you can use one of the following methods:

    • Use a web browser to make an Apache Ambari call:

      Browse to https://<ClusterName>.azurehdinsight.net/api/v1/clusters/<ClusterName>/hosts?minimal_response=true. It returns a JSON document that lists the host FQDNs, which include the DNS suffix.

    • Use the Ambari website:

      1. Browse to https://<ClusterName>.azurehdinsight.net.
      2. Click Hosts from the top menu.
    • Use curl to make REST calls:

         curl -u <username>:<password> -k https://<clustername>.azurehdinsight.net/ambari/api/v1/clusters/<clustername>.azurehdinsight.net/services/hbase/components/hbrest
      

      In the JavaScript Object Notation (JSON) data returned, find the "host_name" entry. It contains the FQDN for the nodes in the cluster. For example:

      ...
      "host_name": "workernode0.<clustername>.b1.cloudapp.net"
      ...
      

      The portion of the domain name beginning with the cluster name is the DNS suffix. For example, mycluster.b1.cloudapp.net.

    • Use Azure PowerShell:

      Use the following Azure PowerShell script to register the Get-ClusterDetail function, which can be used to return the DNS suffix:

         function Get-ClusterDetail(
             [String]
             [Parameter( Position=0, Mandatory=$true )]
             $ClusterDnsName,
             [String]
             [Parameter( Position=1, Mandatory=$true )]
             $Username,
             [String]
             [Parameter( Position=2, Mandatory=$true )]
             $Password,
             [String]
             [Parameter( Position=3, Mandatory=$true )]
             $PropertyName
             )
         {
         <#
             .SYNOPSIS
             Displays information to facilitate an HDInsight cluster-to-cluster scenario within the same virtual network.
             .Description
             This command shows the following 4 properties of an HDInsight cluster:
             1. ZookeeperQuorum (supports only HBase type cluster)
                 Shows the value of HBase property "hbase.zookeeper.quorum".
             2. ZookeeperClientPort (supports only HBase type cluster)
                 Shows the value of HBase property "hbase.zookeeper.property.clientPort".
             3. HBaseRestServers (supports only HBase type cluster)
                 Shows a list of host FQDNs that run the HBase REST server.
             4. FQDNSuffix (supports all cluster types)
                 Shows the FQDN suffix of hosts in the cluster.
             .EXAMPLE
             Get-ClusterDetail -ClusterDnsName {clusterDnsName} -Username {username} -Password {password} -PropertyName ZookeeperQuorum
             This command shows the value of HBase property "hbase.zookeeper.quorum".
             .EXAMPLE
             Get-ClusterDetail -ClusterDnsName {clusterDnsName} -Username {username} -Password {password} -PropertyName ZookeeperClientPort
             This command shows the value of HBase property "hbase.zookeeper.property.clientPort".
             .EXAMPLE
             Get-ClusterDetail -ClusterDnsName {clusterDnsName} -Username {username} -Password {password} -PropertyName HBaseRestServers
             This command shows a list of host FQDNs that run the HBase REST server.
             .EXAMPLE
             Get-ClusterDetail -ClusterDnsName {clusterDnsName} -Username {username} -Password {password} -PropertyName FQDNSuffix
             This command shows the FQDN suffix of hosts in the cluster.
         #>
      
             $DnsSuffix = ".azurehdinsight.net"
      
             $ClusterFQDN = $ClusterDnsName + $DnsSuffix
             $webclient = new-object System.Net.WebClient
             $webclient.Credentials = new-object System.Net.NetworkCredential($Username, $Password)
      
             if($PropertyName -eq "ZookeeperQuorum")
             {
                 $Url = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/configurations?type=hbase-site&tag=default&fields=items/properties/hbase.zookeeper.quorum"
                 $Response = $webclient.DownloadString($Url)
                 $JsonObject = $Response | ConvertFrom-Json
                 Write-host $JsonObject.items[0].properties.'hbase.zookeeper.quorum'
             }
             if($PropertyName -eq "ZookeeperClientPort")
             {
                 $Url = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/configurations?type=hbase-site&tag=default&fields=items/properties/hbase.zookeeper.property.clientPort"
                 $Response = $webclient.DownloadString($Url)
                 $JsonObject = $Response | ConvertFrom-Json
                 Write-host $JsonObject.items[0].properties.'hbase.zookeeper.property.clientPort'
             }
             if($PropertyName -eq "HBaseRestServers")
             {
                 $Url1 = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/configurations?type=hbase-site&tag=default&fields=items/properties/hbase.rest.port"
                 $Response1 = $webclient.DownloadString($Url1)
                 $JsonObject1 = $Response1 | ConvertFrom-Json
                 $PortNumber = $JsonObject1.items[0].properties.'hbase.rest.port'
      
                 $Url2 = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/services/hbase/components/hbrest"
                 $Response2 = $webclient.DownloadString($Url2)
                 $JsonObject2 = $Response2 | ConvertFrom-Json
                 foreach ($host_component in $JsonObject2.host_components)
                 {
                     $ConnectionString = $host_component.HostRoles.host_name + ":" + $PortNumber
                     Write-host $ConnectionString
                 }
             }
             if($PropertyName -eq "FQDNSuffix")
             {
                 $Url = "https://" + $ClusterFQDN + "/ambari/api/v1/clusters/" + $ClusterFQDN + "/services/YARN/components/RESOURCEMANAGER"
                 $Response = $webclient.DownloadString($Url)
                 $JsonObject = $Response | ConvertFrom-Json
                 $FQDN = $JsonObject.host_components[0].HostRoles.host_name
                 $pos = $FQDN.IndexOf(".")
                 $Suffix = $FQDN.Substring($pos + 1)
                 Write-host $Suffix
             }
         }
      

      After running the Azure PowerShell script, use the following command to return the DNS suffix by using the Get-ClusterDetail function. Specify your HDInsight HBase cluster name, admin name, and admin password when using this command.

         Get-ClusterDetail -ClusterDnsName <yourclustername> -PropertyName FQDNSuffix -Username <clusteradmin> -Password <clusteradminpassword>
      

      This command returns the DNS suffix. For example, yourclustername.b4.internal.cloudapp.net.
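The FQDNSuffix branch of Get-ClusterDetail takes a host name reported by Ambari and drops everything up to the first dot. The same lookup can be done from a Java application. The following is a minimal sketch under several assumptions, not an official client:

- AmbariDnsSuffix and its methods are hypothetical.
- It requires Java 11 or later for java.net.http.HttpClient.
- It calls the hosts URL from the web browser method with HTTP basic authentication.
- It assumes the response contains "host_name" fields like the JSON excerpt shown in the curl method.
- It uses a regular expression instead of a JSON library, so it has no dependencies. A production application would use a JSON parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the cluster's DNS suffix through the Ambari REST API.
final class AmbariDnsSuffix {

    // Builds the Ambari hosts URL used in the web browser method.
    static String hostsUrl(String clusterName) {
        return "https://" + clusterName + ".azurehdinsight.net/api/v1/clusters/"
                + clusterName + "/hosts?minimal_response=true";
    }

    // Same rule as Get-ClusterDetail -PropertyName FQDNSuffix: keep the text after the first dot.
    static String dnsSuffix(String fqdn) {
        int pos = fqdn.indexOf('.');
        if (pos < 0) {
            throw new IllegalArgumentException("Not a fully qualified name: " + fqdn);
        }
        return fqdn.substring(pos + 1);
    }

    // Extracts every "host_name" value from an Ambari JSON response (regex, not a full JSON parser).
    static List<String> hostNames(String json) {
        Pattern pattern = Pattern.compile("\"host_name\"\\s*:\\s*\"([^\"]+)\"");
        Matcher matcher = pattern.matcher(json);
        List<String> names = new ArrayList<>();
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    // Sends a GET request with HTTP basic authentication and returns the response body.
    static String fetch(String url, String user, String password) throws Exception {
        String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + token)
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Ambari returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: java AmbariDnsSuffix <clustername> <username> <password>");
            return;
        }
        List<String> hosts = hostNames(fetch(hostsUrl(args[0]), args[1], args[2]));
        if (hosts.isEmpty()) {
            throw new IllegalStateException("No host_name entries found in the response");
        }
        // All nodes in the cluster share the suffix, so the first host is enough.
        System.out.println(dnsSuffix(hosts.get(0)));
    }
}
```

Because every node shares the same suffix, the sketch reads only the first host name. The string-handling methods can be reused on the output of the curl call as well.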

To verify that the virtual machine can communicate with the HBase cluster, use the command ping headnode0.<dns suffix> from the virtual machine. For example, ping headnode0.mycluster.b1.cloudapp.net.
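The ping command checks ICMP reachability. From a Java application it can be more informative to check name resolution and a TCP connection to a service the client actually needs. The following minimal sketch has these assumptions:

- ClusterReachability is a hypothetical helper.
- It uses the headnode0 and zookeeper0 host naming shown in this article.
- It assumes ZooKeeper listens on its default client port, 2181. You can confirm the port with Get-ClusterDetail -PropertyName ZookeeperClientPort.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical helper: checks DNS resolution and TCP reachability of cluster nodes from the VM.
final class ClusterReachability {

    // Joins a node name such as "headnode0" with the cluster's DNS suffix.
    static String nodeFqdn(String node, String dnsSuffix) {
        return node + "." + dnsSuffix;
    }

    // True if the name resolves (inside the virtual network this uses the VNet's DNS).
    static boolean resolves(String fqdn) {
        try {
            InetAddress.getByName(fqdn);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // True if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) { // includes UnknownHostException and timeouts
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java ClusterReachability <dns suffix>");
            return;
        }
        String head = nodeFqdn("headnode0", args[0]);
        System.out.println(head + " resolves: " + resolves(head));
        String zk = nodeFqdn("zookeeper0", args[0]);
        // Assumption: 2181 is the ZooKeeper client port; verify it with ZookeeperClientPort.
        System.out.println(zk + ":2181 reachable: " + canConnect(zk, 2181, 3000));
    }
}
```

Run it on the virtual machine, passing the DNS suffix returned by Get-ClusterDetail. If the name resolves but the connection fails, check network security rules before changing the application configuration.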

To use this information in a Java application, you can follow the steps in Use Apache Maven to build Java applications that use Apache HBase with HDInsight (Hadoop) to create an application. To have the application connect to a remote HBase server, modify the hbase-site.xml file in that example to use the FQDNs of the ZooKeeper nodes. For example:

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>zookeeper0.<dns suffix>,zookeeper1.<dns suffix>,zookeeper2.<dns suffix></value>
</property>
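If you generate configuration programmatically, the quorum value follows the pattern in the property above. The following minimal sketch builds that value and the matching hbase-site.xml element from a DNS suffix. It assumes three ZooKeeper nodes named zookeeper0 through zookeeper2, as in the example, and ZookeeperQuorum is a hypothetical helper. The authoritative value is the one returned by Get-ClusterDetail -PropertyName ZookeeperQuorum, so prefer that value if your node names differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the hbase.zookeeper.quorum value and its hbase-site.xml element.
final class ZookeeperQuorum {

    // Produces "zookeeper0.<suffix>,zookeeper1.<suffix>,..." for the given node count.
    static String quorum(String dnsSuffix, int nodeCount) {
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            hosts.add("zookeeper" + i + "." + dnsSuffix);
        }
        return String.join(",", hosts);
    }

    // Renders the <property> element in the same layout as the hbase-site.xml example.
    static String propertyXml(String quorum) {
        return "<property>\n"
                + "    <name>hbase.zookeeper.quorum</name>\n"
                + "    <value>" + quorum + "</value>\n"
                + "</property>";
    }

    public static void main(String[] args) {
        String suffix = args.length > 0 ? args[0] : "mycluster.b1.cloudapp.net";
        System.out.println(propertyXml(quorum(suffix, 3)));
    }
}
```

Instead of editing hbase-site.xml, an application can pass the same string in code. To do so, call config.set("hbase.zookeeper.quorum", quorum) on a Configuration object obtained from HBaseConfiguration.create() before opening the connection.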

Note

For more information about name resolution in Azure virtual networks, including how to use your own DNS server, see Name Resolution (DNS).

Next steps

In this article, you learned how to create an Apache HBase cluster. To learn more, see: