Quickstart: Execute Apache Hive queries in Azure HDInsight with Apache Zeppelin

In this quickstart, you learn how to use Apache Zeppelin to run Apache Hive queries in Azure HDInsight. HDInsight Interactive Query clusters include Apache Zeppelin notebooks that you can use to run interactive Hive queries.

If you don't have an Azure subscription, create a free account before you begin.

Prerequisites

An HDInsight Interactive Query cluster. To create an HDInsight cluster, see Create cluster. Make sure to choose the Interactive Query cluster type.

Create an Apache Zeppelin Note

  1. Replace CLUSTERNAME with the name of your cluster in the following URL: https://CLUSTERNAME.azurehdinsight.net/zeppelin. Then enter the URL in a web browser.

  2. Enter your cluster login username and password. From the Zeppelin page, you can either create a new note or open an existing note. HiveSample contains some sample Hive queries.

  3. Select Create new note.

  4. From the Create new note dialog, type or select the following values:

    • Note Name: Enter a name for the note.
    • Default interpreter: Select jdbc from the drop-down list.
  5. Select Create Note.

  6. Enter the following Hive query in the code section, and then press Shift + Enter:

    %jdbc(hive)
    show tables
    

    The %jdbc(hive) statement in the first line tells the notebook to use the Hive JDBC interpreter.

    The query returns one Hive table called hivesampletable.

    The following are two additional Hive queries that you can run against hivesampletable:

    %jdbc(hive)
    select * from hivesampletable limit 10
    
    %jdbc(hive)
    select ${group_name}, count(*) as total_count
    from hivesampletable
    group by ${group_name=market,market|deviceplatform|devicemake}
    limit ${total_count=10}
    

    Compared to traditional Hive, the query results come back much faster.
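
The second query above relies on Zeppelin dynamic forms: a `${name=default,option1|option2}` placeholder is rendered as a drop-down list, and `${name=default}` as a text box, and the chosen values are substituted into the query text before it runs. As a rough sketch, assuming the default values (market and 10) are left unchanged in the form fields, the statement sent to Hive should be equivalent to:

```sql
%jdbc(hive)
-- Assumed expansion of the dynamic-form query with its default values:
-- group_name = market, total_count = 10
select market, count(*) as total_count
from hivesampletable
group by market
limit 10
```

Choosing deviceplatform or devicemake in the drop-down list and rerunning the paragraph groups the same table by that column instead.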

Additional examples

  1. Create a table. Execute the code below in the Zeppelin Notebook:

    %jdbc(hive)
    CREATE EXTERNAL TABLE log4jLogs (
        t1 string,
        t2 string,
        t3 string,
        t4 string,
        t5 string,
        t6 string,
        t7 string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ' '
    STORED AS TEXTFILE;
    
  2. Load data into the new table. Execute the code below in the Zeppelin Notebook:

    %jdbc(hive)
    LOAD DATA
    INPATH 'wasbs:///example/data/sample.log'
    INTO TABLE log4jLogs;
    
  3. Insert a single record. Execute the code below in the Zeppelin Notebook:

    %jdbc(hive)
    INSERT INTO TABLE log4jLogs
    VALUES ('A', 'B', 'C', 'D', 'E', 'F', 'G');
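
To check the result of the steps above, a follow-up query can look for the inserted record. This is a minimal sketch, not part of the original walkthrough. It assumes the three steps completed successfully and that the record was inserted into the log4jLogs table created in step 1:

```sql
%jdbc(hive)
-- Look up the single record inserted in step 3 by its first and last column values.
-- Assumes the insert targeted log4jLogs, the table created in step 1.
SELECT *
FROM log4jLogs
WHERE t1 = 'A' AND t7 = 'G';
```

If the query returns no rows, one of the earlier paragraphs probably failed, and its output in the notebook is the first place to look.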
    

Review the Hive language manual for additional syntax.

Clean up resources

After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use.

To delete a cluster, see Delete an HDInsight cluster using your browser, PowerShell, or the Azure CLI.

Next steps

In this quickstart, you learned how to use Apache Zeppelin to run Apache Hive queries in Azure HDInsight. To learn more about Hive queries, see the next article, which shows you how to execute queries with Visual Studio.