Migrate an Apache HBase cluster to a new version

This article discusses the steps required to update your Apache HBase cluster on Azure HDInsight to a newer version.

The downtime while upgrading should be minimal, on the order of minutes. It comes from flushing all in-memory data, then configuring and restarting the services on the new cluster. Your results will vary depending on the number of nodes, the amount of data, and other variables.

Review Apache HBase compatibility

Before upgrading Apache HBase, ensure the HBase versions on the source and destination clusters are compatible. For more information, see Apache Hadoop components and versions available with HDInsight.

Note

We highly recommend that you review the version compatibility matrix in the HBase book. Any breaking incompatibilities should be described in the HBase version release notes.

Here is an example version compatibility matrix. Y indicates compatibility and N indicates a potential incompatibility:

| Compatibility type | Major version | Minor version | Patch |
|---|---|---|---|
| Client-Server wire compatibility | N | Y | Y |
| Server-Server compatibility | N | Y | Y |
| File format compatibility | N | Y | Y |
| Client API compatibility | N | Y | Y |
| Client binary compatibility | N | N | Y |
| Server-side limited API compatibility: Stable | N | Y | Y |
| Server-side limited API compatibility: Evolving | N | N | Y |
| Server-side limited API compatibility: Unstable | N | N | N |
| Dependency compatibility | N | Y | Y |
| Operational compatibility | N | N | Y |
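
To pick the right column of the matrix, it can help to classify how far apart the source and destination HBase versions are. The following is a minimal sketch, not part of any HDInsight tooling: the function name `version_change_level` and the sample version strings are hypothetical, and it assumes each version string begins with MAJOR.MINOR.PATCH (any further dot-separated fields are ignored).

```shell
#!/bin/bash
# Sketch only: report whether two version strings differ at the major, minor,
# or patch level. The function name and sample versions are hypothetical.

version_change_level ()
{
    local src=$1 dst=$2
    local src_major src_minor src_patch dst_major dst_minor dst_patch

    # Split on dots; fields after the third land in "_" and are ignored.
    IFS=. read -r src_major src_minor src_patch _ <<< "$src"
    IFS=. read -r dst_major dst_minor dst_patch _ <<< "$dst"

    if [[ "$src_major" != "$dst_major" ]]; then
        echo "major"
    elif [[ "$src_minor" != "$dst_minor" ]]; then
        echo "minor"
    elif [[ "$src_patch" != "$dst_patch" ]]; then
        echo "patch"
    else
        echo "same"
    fi
}

# Placeholder versions, not a statement about any particular HDInsight release.
version_change_level "1.1.2" "2.1.6"   # prints "major"
```

A result of `major` means every row in the matrix shows a potential incompatibility, so the release notes deserve a close read before you migrate.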

Upgrade with same Apache HBase major version

To upgrade your Apache HBase cluster on Azure HDInsight, complete the following steps:

  1. Make sure that your application is compatible with the new version, as shown in the HBase compatibility matrix and release notes. Test your application in a cluster running the target version of HDInsight and HBase.

  2. Set up a new destination HDInsight cluster that uses the same storage account as the source cluster, but a different container name.

  3. Flush your source HBase cluster, which is the cluster you're upgrading. HBase writes incoming data to an in-memory store, called a memstore. After the memstore reaches a certain size, HBase flushes it to disk for long-term storage in the cluster's storage account. When you delete the old cluster, the memstores are recycled, and any data not yet flushed can be lost. To manually flush the memstore for each table to disk, run the following script. The latest version of this script is on Azure's GitHub.

    #!/bin/bash
    
    #-------------------------------------------------------------------------------#
    # SCRIPT TO FLUSH ALL HBASE TABLES.
    # NOTE: '...' IS USED BELOW AS A LITERAL HERE-DOCUMENT DELIMITER, NOT AN ELISION.
    #-------------------------------------------------------------------------------#
    
    LIST_OF_TABLES=/tmp/tables.txt
    HBASE_SCRIPT=/tmp/hbase_script.txt
    TARGET_HOST=$1
    
    usage ()
    {
        if [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]
        then
            cat << ...
    
    Usage: 
    
    $0 [hostname]
    
    Providing hostname is optional and not required when the script is executed within HDInsight cluster with access to 'hbase shell'.
    
    However hostname should be provided when executing the script as a script-action from HDInsight portal.
    
    For Example:
    
        1.  Executing script inside HDInsight cluster (where 'hbase shell' is 
            accessible):
    
            $0 
    
            [No need to provide hostname]
    
        2.  Executing script from HDinsight Azure portal:
    
            Provide Script URL.
    
            Provide hostname as a parameter (i.e. hn0, hn1, hn2.. or wn2 etc.).
    ...
            exit
        fi
    }
    
    validate_machine ()
    {
        THIS_HOST=$(hostname)
    
        if [[ -n "$TARGET_HOST" ]] && [[ $THIS_HOST != $TARGET_HOST* ]]
        then
            echo "[INFO] This machine '$THIS_HOST' is not the right machine ($TARGET_HOST) to execute the script."
            exit 0
        fi
    }
    
    get_tables_list ()
    {
    hbase shell << ... > $LIST_OF_TABLES 2> /dev/null
        list
        exit
    ...
    }
    
    add_table_for_flush ()
    {
        TABLE_NAME=$1
        echo "[INFO] Adding table '$TABLE_NAME' to flush list..."
        cat << ... >> $HBASE_SCRIPT
            flush '$TABLE_NAME'
    ...
    }
    
    clean_up ()
    {
        rm -f $LIST_OF_TABLES
        rm -f $HBASE_SCRIPT
    }
    
    ########
    # MAIN #
    ########
    
    usage $1
    
    validate_machine
    
    clean_up
    
    get_tables_list
    
    START=false
    
    while read LINE 
    do 
        if [[ $LINE == TABLE ]] 
        then
            START=true
            continue
        elif [[ $LINE == *row*in*seconds ]]
        then
            break
        elif [[ $START == true ]]
        then
            add_table_for_flush $LINE
        fi
    
    done < $LIST_OF_TABLES
    
    cat $HBASE_SCRIPT
    
    hbase shell $HBASE_SCRIPT << ... 2> /dev/null
    exit
    ...
    
    
  4. Stop ingestion to the old HBase cluster.

  5. To ensure that any recent data in the memstore is flushed, run the previous script again.

  6. Sign in to Apache Ambari on the old cluster (https://OLDCLUSTERNAME.azurehdinsight.net) and stop the HBase services: select Services > HBase, then Stop under Service Actions. When you're prompted to confirm that you'd like to stop the services, check the box to turn on maintenance mode for HBase, then confirm. For more information on connecting to and using Ambari, see Manage HDInsight clusters by using the Ambari Web UI.

  7. Sign in to Ambari on the new HDInsight cluster. Change the fs.defaultFS HDFS setting to point to the container name used by the original cluster. This setting is under HDFS > Configs > Advanced > Advanced core-site.

  8. If you aren't using HBase clusters with the Enhanced Writes feature, skip this step. It's needed only for HBase clusters with the Enhanced Writes feature.

    Change the hbase.rootdir path to point to the container of the original cluster.

  9. If you're upgrading HDInsight 3.6 to 4.0, follow the steps below; otherwise, skip to step 10:

    1. Restart all required services in Ambari by selecting Services > Restart All Required.
    2. Stop the HBase service.
    3. SSH to the ZooKeeper node, and execute the zkCli command rmr /hbase-unsecure to remove the HBase root znode from ZooKeeper.
    4. Restart HBase.
  10. If you're upgrading to any HDInsight version other than 4.0, follow these steps:

    1. Save your changes.
    2. Restart all required services as indicated by Ambari.
  11. Point your application to the new cluster.

    Note

    The static DNS name for your application changes when upgrading. Rather than hard-coding this DNS name, you can configure a CNAME in your domain name's DNS settings that points to the cluster's name. Another option is to use a configuration file for your application that you can update without redeploying.

  12. Start the ingestion to see if everything is functioning as expected.

  13. If the new cluster is satisfactory, delete the original cluster.
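
Steps 7 and 8 both come down to swapping the container name inside a storage URI while leaving the rest of the value untouched. The following is a minimal sketch of that substitution, assuming the setting uses the Azure Blob storage form `SCHEME://CONTAINER@ACCOUNT.blob.core.windows.net[/PATH]`. The function name `swap_container`, the account name `examplestore`, and the container names are hypothetical, and the value on your cluster may use a different scheme or host, so compare the result against what Ambari shows before saving.

```shell
#!/bin/bash
# Sketch only: replace the container part of a storage URI.
# Assumed layout: SCHEME://CONTAINER@HOST[/PATH]. All names are hypothetical.

swap_container ()
{
    local uri=$1 old_container=$2

    # Everything before "://" is the scheme (for example, wasbs).
    local scheme=${uri%%://*}
    # Everything after the first "@" is the storage host plus any path.
    local host_and_path=${uri#*@}

    printf '%s://%s@%s\n' "$scheme" "$old_container" "$host_and_path"
}

# fs.defaultFS on the new cluster, repointed at the original cluster's container.
swap_container "wasbs://newcontainer@examplestore.blob.core.windows.net" "oldcontainer"
# prints "wasbs://oldcontainer@examplestore.blob.core.windows.net"
```

Because only the text between `://` and `@` changes, the same helper works for a value with a trailing path, such as an hbase.rootdir setting.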

Next steps

To learn more about Apache HBase and upgrading HDInsight clusters, see the following articles: