
Migrate an Apache HBase cluster to a new version

This article discusses the steps required to update your Apache HBase cluster on Azure HDInsight to a newer version.

Downtime during the upgrade should be minimal, on the order of minutes. It comes from flushing all in-memory data and then configuring and restarting the services on the new cluster. Your results will vary with the number of nodes, the amount of data, and other variables.

Review Apache HBase compatibility

Before upgrading Apache HBase, ensure the HBase versions on the source and destination clusters are compatible. For more information, see Apache Hadoop components and versions available with HDInsight.

Note

We highly recommend that you review the version compatibility matrix in the HBase book. Any breaking incompatibilities should be described in the HBase version release notes.

Here is an example version compatibility matrix. Y indicates compatibility and N indicates a potential incompatibility:

| Compatibility type | Major version | Minor version | Patch |
|---|---|---|---|
| Client-Server wire compatibility | N | Y | Y |
| Server-Server compatibility | N | Y | Y |
| File format compatibility | N | Y | Y |
| Client API compatibility | N | Y | Y |
| Client binary compatibility | N | N | Y |
| Server-side limited API compatibility: Stable | N | Y | Y |
| Server-side limited API compatibility: Evolving | N | N | Y |
| Server-side limited API compatibility: Unstable | N | N | N |
| Dependency compatibility | N | Y | Y |
| Operational compatibility | N | N | Y |
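
The matrix shows N throughout the major-version column, so a useful first check is whether the source and destination versions share a major version. The sketch below compares the leading component of two dotted version strings. The function name `same_major_version` and the sample version numbers are assumptions made for illustration and are not part of HBase or HDInsight tooling; the check does not replace reading the release notes.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) when two dotted version
# strings share the same major component, for example 1.1.2 and 1.4.9.
same_major_version ()
{
    SOURCE_MAJOR="${1%%.*}"   # text before the first dot
    TARGET_MAJOR="${2%%.*}"
    [ -n "$SOURCE_MAJOR" ] && [ "$SOURCE_MAJOR" = "$TARGET_MAJOR" ]
}

# Illustrative version numbers only.
if same_major_version "1.1.2" "1.4.9"
then
    echo "Same major version"
else
    echo "Different major versions: review the release notes for breaking changes"
fi
```

A string comparison of the first component is enough here because only equality matters, not ordering.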

Upgrade with same Apache HBase major version

To upgrade your Apache HBase cluster on Azure HDInsight, complete the following steps:

  1. Make sure that your application is compatible with the new version, as shown in the HBase compatibility matrix and release notes. Test your application in a cluster running the target version of HDInsight and HBase.

  2. Set up a new destination HDInsight cluster using the same storage account, but with a different container name:

    [Image: Use the same storage account, but create a different container]

  3. Flush your source HBase cluster, which is the cluster you're upgrading. HBase writes incoming data to an in-memory store called a memstore. After the memstore reaches a certain size, HBase flushes it to disk for long-term storage in the cluster's storage account. When the old cluster is deleted, the memstores are recycled, and any data not yet flushed can be lost. To manually flush the memstore for each table to disk, run the following script. The latest version of this script is on Azure's GitHub.

    #!/bin/bash
    
    #-------------------------------------------------------------------------------#
    # SCRIPT TO FLUSH ALL HBASE TABLES.
    #-------------------------------------------------------------------------------#
    
    LIST_OF_TABLES=/tmp/tables.txt
    HBASE_SCRIPT=/tmp/hbase_script.txt
    TARGET_HOST=$1
    
    usage ()
    {
        if [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]
        then
            cat << ...
    
    Usage: 
    
    $0 [hostname]
    
    Providing hostname is optional and not required when the script is executed within HDInsight cluster with access to 'hbase shell'.
    
    However hostname should be provided when executing the script as a script-action from HDInsight portal.
    
    For Example:
    
        1.  Executing script inside HDInsight cluster (where 'hbase shell' is 
            accessible):
    
            $0 
    
            [No need to provide hostname]
    
        2.  Executing script from HDinsight Azure portal:
    
            Provide Script URL.
    
            Provide hostname as a parameter (i.e. hn0, hn1, hn2.. or wn2 etc.).
    ...
            exit
        fi
    }
    
    validate_machine ()
    {
        THIS_HOST=$(hostname)
    
        if [[ -n "$TARGET_HOST" ]] && [[ $THIS_HOST != "$TARGET_HOST"* ]]
        then
            echo "[INFO] This machine '$THIS_HOST' is not the right machine ($TARGET_HOST) to execute the script."
            exit 0
        fi
    }
    
    get_tables_list ()
    {
    hbase shell << ... > $LIST_OF_TABLES 2> /dev/null
        list
        exit
    ...
    }
    
    add_table_for_flush ()
    {
        TABLE_NAME=$1
        echo "[INFO] Adding table '$TABLE_NAME' to flush list..."
        cat << ... >> $HBASE_SCRIPT
            flush '$TABLE_NAME'
    ...
    }
    
    clean_up ()
    {
        rm -f $LIST_OF_TABLES
        rm -f $HBASE_SCRIPT
    }
    
    ########
    # MAIN #
    ########
    
    usage $1
    
    validate_machine
    
    clean_up
    
    get_tables_list
    
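    # Table names sit between the "TABLE" header line and the
    # "N row(s) in X seconds" summary line of the 'list' output.
    START=false
    
    while read -r LINE 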
    do 
        if [[ $LINE == TABLE ]] 
        then
            START=true
            continue
        elif [[ $LINE == *row*in*seconds ]]
        then
            break
        elif [[ $START == true ]]
        then
            add_table_for_flush $LINE
        fi
    
    done < $LIST_OF_TABLES
    
    cat $HBASE_SCRIPT
    
    hbase shell $HBASE_SCRIPT << ... 2> /dev/null
    exit
    ...
    
    
  4. Stop ingestion to the old HBase cluster.

  5. To ensure that any recent data in the memstore is flushed, run the previous script again.

  6. Sign in to Apache Ambari on the old cluster (https://OLDCLUSTERNAME.azurehdinsight.net) and stop the HBase services. When you're prompted to confirm that you'd like to stop the services, check the box to turn on maintenance mode for HBase. For more information on connecting to and using Ambari, see Manage HDInsight clusters by using the Ambari Web UI.

    [Image: In Ambari, under "Service Actions", click Services > HBase >]

    [Image: Select the "Turn On Maintenance Mode for HBase" checkbox, then confirm]

  7. Sign in to Ambari on the new HDInsight cluster. Change the fs.defaultFS HDFS setting to point to the container name used by the original cluster. This setting is under HDFS > Configs > Advanced > Advanced core-site.

    [Image: In Ambari, click Services > HDFS > Configs > Advanced]

    [Image: Change the container name in Ambari]

  8. If you aren't using HBase clusters with the Enhanced Writes feature, skip this step. It's needed only for HBase clusters with the Enhanced Writes feature.

    Change the hbase.rootdir path to point to the container of the original cluster.

    [Image: In Ambari, change the container name for the HBase rootdir]

  9. If you're upgrading HDInsight 3.6 to 4.0, follow the steps below; otherwise, skip to step 10:

    1. Restart all required services in Ambari by selecting Services > Restart All Required.
    2. Stop the HBase service.
    3. SSH to the Zookeeper node, and execute the zkCli command rmr /hbase-unsecure to remove the HBase root znode from Zookeeper.
    4. Restart HBase.
  10. If you're upgrading to any other HDInsight version besides 4.0, follow these steps:

    1. Save your changes.
    2. Restart all required services as indicated by Ambari.
  11. Point your application to the new cluster.

    Note

    The static DNS name for your application changes when you upgrade. Rather than hard-coding this DNS name, you can configure a CNAME in your domain name's DNS settings that points to the cluster's name. Another option is to use a configuration file for your application that you can update without redeploying.

  12. Start the ingestion to see if everything is functioning as expected.

  13. If the new cluster is satisfactory, delete the original cluster.
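
For step 7, the value entered for fs.defaultFS is a storage URI that names a container. The sketch below assembles such a value, assuming the clusters use Azure Blob storage addressed with the `wasbs://` scheme. The function name `build_default_fs` and the container and account names are placeholders made up for illustration. Clusters backed by other storage types use a different URI scheme, so compare the output against the value already shown in Ambari before changing anything.

```shell
#!/bin/bash
# Hypothetical helper: print an fs.defaultFS value for a given container.
# Assumes Azure Blob storage addressed as
#   wasbs://<container>@<account>.blob.core.windows.net
build_default_fs ()
{
    CONTAINER_NAME=$1
    STORAGE_ACCOUNT=$2

    # Both names are required; report usage on stderr if either is missing.
    if [ -z "$CONTAINER_NAME" ] || [ -z "$STORAGE_ACCOUNT" ]
    then
        echo "[ERROR] Usage: build_default_fs <container> <storage-account>" >&2
        return 1
    fi

    echo "wasbs://${CONTAINER_NAME}@${STORAGE_ACCOUNT}.blob.core.windows.net"
}

# Placeholder names: the old cluster's container and the shared storage account.
build_default_fs "old-hbase-container" "mystorageaccount"
```

Because both clusters share one storage account, only the container portion of the existing value should differ after the change.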

Next steps

To learn more about Apache HBase and upgrading HDInsight clusters, see the following articles: