
Use Apache Sqoop to import and export data between Apache Hadoop on HDInsight and SQL Database

Learn how to use Apache Sqoop to import and export data between an Apache Hadoop cluster in Azure HDInsight and an Azure SQL Database or Microsoft SQL Server database. The steps in this document use the sqoop command directly from the head node of the Hadoop cluster. You use SSH to connect to the head node and run the commands in this document. This article is a continuation of Use Apache Sqoop with Hadoop in HDInsight.

Prerequisites

Set up

  1. Use the ssh command to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:

    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
    
  2. For ease of use, set variables. Replace PASSWORD, MYSQLSERVER, and MYDATABASE with the relevant values, and then enter the commands below (a quick sanity check is sketched after this step):

    export password='PASSWORD'
    export sqlserver="MYSQLSERVER"
    export database="MYDATABASE"
    
    
    export serverConnect="jdbc:sqlserver://$sqlserver.database.windows.net:1433;user=sqluser;password=$password"
    export serverDbConnect="jdbc:sqlserver://$sqlserver.database.windows.net:1433;user=sqluser;password=$password;database=$database"
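Before running any Sqoop commands, you can confirm that the connection strings expanded as expected. This is a convenience check, not part of the original steps, and note that it prints the password in plain text:

    # Convenience check (not in the original steps); prints the password in plain text
    echo "$serverConnect"
    echo "$serverDbConnect"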
    

Sqoop export

From Hive to SQL Server.

  1. To verify that Sqoop can see your SQL Database, enter the command below in your open SSH connection. This command returns a list of databases.

    sqoop list-databases --connect $serverConnect
    
  2. Enter the following command to see a list of tables for the specified database:

    sqoop list-tables --connect $serverDbConnect
    
  3. To export data from the Hive hivesampletable table to the mobiledata table in SQL Database, enter the command below in your open SSH connection (an alternative that exports delimited files directly from HDFS is sketched after this list):

    sqoop export --connect $serverDbConnect \
    --table mobiledata \
    --hcatalog-table hivesampletable
    
  4. To verify that the data was exported, use the following queries from your SSH connection to view the exported data:

    sqoop eval --connect $serverDbConnect \
    --query "SELECT COUNT(*) from dbo.mobiledata WITH (NOLOCK)"
    
    
    sqoop eval --connect $serverDbConnect \
    --query "SELECT TOP(10) * from dbo.mobiledata WITH (NOLOCK)"
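The export in step 3 reads the Hive table through HCatalog. If your data instead lives as delimited files in HDFS (for example, files like those produced by the import in the next section), Sqoop can export a directory directly with --export-dir. The following is a minimal sketch, not one of the original steps; the path and delimiters are assumptions chosen to match the import example later in this article:

    # Sketch: export tab-delimited files from an HDFS directory instead of a Hive table
    # (path and delimiters assumed; re-running it against mobiledata appends rows)
    sqoop export --connect $serverDbConnect \
    --table mobiledata \
    --export-dir '/tutorials/usesqoop/importeddata' \
    --input-fields-terminated-by '\t' \
    --input-lines-terminated-by '\n' -m 1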
    

Sqoop import

From SQL Server to Azure storage.

  1. Enter the command below in your open SSH connection to import data from the mobiledata table in SQL Database to the wasb:///tutorials/usesqoop/importeddata directory on HDInsight. The fields in the data are separated by a tab character, and the lines are terminated by a new-line character.

    sqoop import --connect $serverDbConnect \
    --table mobiledata \
    --target-dir 'wasb:///tutorials/usesqoop/importeddata' \
    --fields-terminated-by '\t' \
    --lines-terminated-by '\n' -m 1
    
  2. Alternatively, you can specify a Hive table:

    sqoop import --connect $serverDbConnect \
    --table mobiledata \
    --target-dir 'wasb:///tutorials/usesqoop/importeddata2' \
    --fields-terminated-by '\t' \
    --lines-terminated-by '\n' \
    --create-hive-table \
    --hive-table mobiledata_imported2 \
    --hive-import -m 1
    
  3. Once the import has completed, enter the following command in your open SSH connection to list the data in the new directory:

    hadoop fs -tail /tutorials/usesqoop/importeddata/part-m-00000
    
  4. Use beeline to verify that the table has been created in Hive (a non-interactive variant is sketched after these steps).

    1. Connect

      beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http'
      
    2. Execute each query below one at a time and review the output:

      show tables;
      describe mobiledata_imported2;
      SELECT COUNT(*) FROM mobiledata_imported2;
      SELECT * FROM mobiledata_imported2 LIMIT 10;
      
    3. Exit beeline with !exit.
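If you prefer to run these checks without an interactive beeline session, the same queries can be passed on the command line with -e. This is a minimal sketch, assuming the same HTTP transport endpoint used in the Connect step above:

    # Sketch: run one verification query non-interactively
    beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http' \
    -e 'SELECT COUNT(*) FROM mobiledata_imported2;'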

Limitations

  • Bulk export - With Linux-based HDInsight, the Sqoop connector used to export data to Microsoft SQL Server or Azure SQL Database doesn't support bulk inserts.

  • Batching - With Linux-based HDInsight, when you use the --batch switch to perform inserts, Sqoop makes multiple inserts instead of batching the insert operations; see the sketch below.
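For illustration, this is how the --batch switch and Sqoop's generic export tuning properties are passed on the command line, reusing the Hive-to-SQL export from earlier in this article. Treat it as a sketch rather than a recommendation; as noted above, the connector may still issue row-by-row inserts against SQL Database:

    # Sketch only: standard Sqoop batching controls; actual behavior depends on the connector
    sqoop export -Dsqoop.export.records.per.statement=10 \
    --connect $serverDbConnect \
    --table mobiledata \
    --hcatalog-table hivesampletable \
    --batch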

Important considerations

Next steps

Now you've learned how to use Sqoop. To learn more, see: