Use Apache Sqoop to import and export data between Apache Hadoop on HDInsight and SQL Database

Learn how to use Apache Sqoop to import and export between an Apache Hadoop cluster in Azure HDInsight and Azure SQL Database or Microsoft SQL Server database. The steps in this document use the sqoop command directly from the headnode of the Hadoop cluster. You use SSH to connect to the head node and run the commands in this document. This article is a continuation of Use Apache Sqoop with Hadoop in HDInsight.

Prerequisites

Sqoop export

From Hive to SQL Server.

  1. Use SSH to connect to the HDInsight cluster. Replace CLUSTERNAME with the name of your cluster, then enter the command:

    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
    
  2. Replace MYSQLSERVER with the name of your SQL Server. To verify that Sqoop can see your SQL Database, enter the command below in your open SSH connection. Enter the password for the SQL Server login when prompted. This command returns a list of databases.

    sqoop list-databases --connect jdbc:sqlserver://MYSQLSERVER.database.windows.net:1433 --username sqluser -P
    
  3. Replace MYSQLSERVER with the name of your SQL Server, and MYDATABASE with the name of your SQL database. To export data from the Hive hivesampletable table to the mobiledata table in SQL Database, enter the command below in your open SSH connection. Enter the password for the SQL Server login when prompted

    sqoop export --connect 'jdbc:sqlserver://MYSQLSERVER.database.windows.net:1433;database=MYDATABASE' --username sqluser -P -table 'mobiledata' --hcatalog-table hivesampletable
    
  4. To verify that data was exported, use the following queries from your SQL client to view the exported data:

    SELECT COUNT(*) FROM [dbo].[mobiledata] WITH (NOLOCK);
    SELECT TOP(25) * FROM [dbo].[mobiledata] WITH (NOLOCK);
    

Sqoop import

From SQL Server to Azure storage.

  1. Replace MYSQLSERVER with the name of your SQL Server, and MYDATABASE with the name of your SQL database. Enter the command below in your open SSH connection to import data from the mobiledata table in SQL Database, to the wasb:///tutorials/usesqoop/importeddata directory on HDInsight. Enter the password for the SQL Server login when prompted. The fields in the data are separated by a tab character, and the lines are terminated by a new-line character.

    sqoop import --connect 'jdbc:sqlserver://MYSQLSERVER.database.windows.net:1433;database=MYDATABASE' --username sqluser -P --table 'mobiledata' --target-dir 'wasb:///tutorials/usesqoop/importeddata' --fields-terminated-by '\t' --lines-terminated-by '\n' -m 1
    
  2. Once the import has completed, enter the following command in your open SSH connection to list out the data in the new directory:

    hdfs dfs -text /tutorials/usesqoop/importeddata/part-m-00000
    

Limitations

  • Bulk export - With Linux-based HDInsight, the Sqoop connector used to export data to Microsoft SQL Server or Azure SQL Database does not support bulk inserts.

  • Batching - With Linux-based HDInsight, When using the -batch switch when performing inserts, Sqoop makes multiple inserts instead of batching the insert operations.

Important considerations

Next steps

Now you have learned how to use Sqoop. To learn more, see: