Move data to SQL Server on an Azure virtual machine

This topic outlines the options for moving data either from flat files (CSV or TSV formats) or from an on-premises SQL Server to SQL Server on an Azure virtual machine. These tasks for moving data to the cloud are part of the Team Data Science Process.

For a topic that outlines the options for moving data to an Azure SQL Database for Machine Learning, see Move data to an Azure SQL Database for Azure Machine Learning.

The menu below links to topics that describe how to ingest data into other target environments where the data can be stored and processed during the Team Data Science Process (TDSP).

The following table summarizes the options for moving data to SQL Server on an Azure virtual machine.

SOURCE DESTINATION: SQL Server on Azure VM
Flat File 1. Command-line bulk copy utility (BCP)
2. Bulk Insert SQL Query
3. Graphical Built-in Utilities in SQL Server
On-Premises SQL Server 1. Deploy a SQL Server Database to a Microsoft Azure VM wizard
2. Export to a flat File
3. SQL Database Migration Wizard
4. Database back up and restore

Note that this document assumes that SQL commands are executed from SQL Server Management Studio or Visual Studio Database Explorer.

Tip

As an alternative, you can use Azure Data Factory to create and schedule a pipeline that will move data to a SQL Server VM on Azure. For more information, see Copy data with Azure Data Factory (Copy Activity).

Prerequisites

This tutorial assumes you have:

Moving data from a flat file source to SQL Server on an Azure VM

If your data is in a flat file (arranged in a row/column format), it can be moved to SQL Server VM on Azure via the following methods:

  1. Command-line bulk copy utility (BCP)
  2. Bulk Insert SQL Query
  3. Graphical Built-in Utilities in SQL Server (Import/Export, SSIS)

Command-line bulk copy utility (BCP)

BCP is a command-line utility installed with SQL Server and is one of the quickest ways to move data. It works across all three SQL Server variants (On-premises SQL Server, SQL Azure and SQL Server VM on Azure).

Note

Where should my data be for BCP?
While it is not required, having files containing source data located on the same machine as the target SQL Server allows for faster transfers (network speed vs local disk IO speed). You can move the flat files containing data to the machine where SQL Server is installed using various file copying tools such as AZCopy, Azure Storage Explorer or windows copy/paste via Remote Desktop Protocol (RDP).

  1. Ensure that the database and the tables are created on the target SQL Server database. Here is an example of how to do that using the Create Database and Create Table commands:

     CREATE DATABASE <database_name>
    
     CREATE TABLE <tablename>
     (
         <columnname1> <datatype> <constraint>,
         <columnname2> <datatype> <constraint>,
         <columnname3> <datatype> <constraint>
     )
    
  2. Generate the format file that describes the schema for the table by issuing the following command from the command-line of the machine where bcp is installed.

    bcp dbname..tablename format nul -c -x -f exportformatfilename.xml -S servername\sqlinstance -T -t \t -r \n

  3. Insert the data into the database using the bcp command as follows. This should work from the command-line assuming that the SQL Server is installed on same machine:

    bcp dbname..tablename in datafilename.tsv -f exportformatfilename.xml -S servername\sqlinstancename -U username -P password -b block_size_to_move_in_single_attemp -t \t -r \n

Optimizing BCP Inserts Please refer the following article 'Guidelines for Optimizing Bulk Import' to optimize such inserts.

Parallelizing Inserts for Faster Data Movement

If the data you are moving is large, you can speed things up by simultaneously executing multiple BCP commands in parallel in a PowerShell Script.

Note

Big data Ingestion To optimize data loading for large and very large datasets, partition your logical and physical database tables using multiple filegroups and partition tables. For more information about creating and loading data to partition tables, see Parallel Load SQL Partition Tables.

The sample PowerShell script below demonstrate parallel inserts using bcp:

$NO_OF_PARALLEL_JOBS=2

 Set-ExecutionPolicy RemoteSigned #set execution policy for the script to execute
 # Define what each job does
   $ScriptBlock = {
       param($partitionnumber)

       #Explictly using SQL username password
       bcp database..tablename in datafile_path.csv -F 2 -f format_file_path.xml -U username@servername -S tcp:servername -P password -b block_size_to_move_in_single_attempt -t "," -r \n -o path_to_outputfile.$partitionnumber.txt

        #Trusted connection w.o username password (if you are using windows auth and are signed in with that credentials)
        #bcp database..tablename in datafile_path.csv -o path_to_outputfile.$partitionnumber.txt -h "TABLOCK" -F 2 -f format_file_path.xml  -T -b block_size_to_move_in_single_attempt -t "," -r \n
  }


# Background processing of all partitions
for ($i=1; $i -le $NO_OF_PARALLEL_JOBS; $i++)
{
  Write-Debug "Submit loading partition # $i"
  Start-Job $ScriptBlock -Arg $i      
}


# Wait for it all to complete
While (Get-Job -State "Running")
{
  Start-Sleep 10
  Get-Job
}

# Getting the information back from the jobs
Get-Job | Receive-Job
Set-ExecutionPolicy Restricted #reset the execution policy

Bulk Insert SQL Query

Bulk Insert SQL Query can be used to import data into the database from row/column based files (the supported types are covered in thePrepare Data for Bulk Export or Import (SQL Server)) topic.

Here are some sample commands for Bulk Insert are as below:

  1. Analyze your data and set any custom options before importing to make sure that the SQL Server database assumes the same format for any special fields such as dates. Here is an example of how to set the date format as year-month-day (if your data contains the date in year-month-day format):

     SET DATEFORMAT ymd;    
    
  2. Import data using bulk import statements:

     BULK INSERT <tablename>
     FROM    
     '<datafilename>'
     WITH
     (
     FirstRow=2,
     FIELDTERMINATOR =',', --this should be column separator in your data
     ROWTERMINATOR ='\n'   --this should be the row separator in your data
     )
    

Built-in Utilities in SQL Server

You can use SQL Server Integrations Services (SSIS) to import data into SQL Server VM on Azure from a flat file. SSIS is available in two studio environments. For details, see Integration Services (SSIS) and Studio Environments:

Moving Data from on-premises SQL Server to SQL Server on an Azure VM

You can also use the following migration strategies:

  1. Deploy a SQL Server Database to a Microsoft Azure VM wizard
  2. Export to Flat File
  3. SQL Database Migration Wizard
  4. Database back up and restore

We describe each of these below:

Deploy a SQL Server Database to a Microsoft Azure VM wizard

The Deploy a SQL Server Database to a Microsoft Azure VM wizard is a simple and recommended way to move data from an on-premises SQL Server instance to SQL Server on an Azure VM. For detailed steps as well as a discussion of other alternatives, see Migrate a database to SQL Server on an Azure VM.

Export to Flat File

Various methods can be used to bulk export data from an On-Premises SQL Server as documented in the Bulk Import and Export of Data (SQL Server) topic. This document will cover the Bulk Copy Program (BCP) as an example. Once data is exported into a flat file, it can be imported to another SQL server using bulk import.

  1. Export the data from on-premises SQL Server to a File using the bcp utility as follows

    bcp dbname..tablename out datafile.tsv -S servername\sqlinstancename -T -t \t -t \n -c

  2. Create the database and the table on SQL Server VM on Azure using the create database and create table for the table schema exported in step 1.
  3. Create a format file for describing the table schema of the data being exported/imported. Details of the format file are described in Create a Format File (SQL Server).

    Format file generation when running BCP from the SQL Server machine

     bcp dbname..tablename format nul -c -x -f exportformatfilename.xml -S servername\sqlinstance -T -t \t -r \n
    

    Format file generation when running BCP remotely against a SQL Server

     bcp dbname..tablename format nul -c -x -f  exportformatfilename.xml  -U username@servername.database.windows.net -S tcp:servername -P password  --t \t -r \n
    
  4. Use any of the methods described in section Moving Data from File Source to move the data in flat files to a SQL Server.

SQL Database Migration Wizard

SQL Server Database Migration Wizard provides a user-friendly way to move data between two SQL server instances. It allows the user to map the data schema between sources and destination tables, choose column types and various other functionalities. It uses bulk copy (BCP) under the covers. A screenshot of the welcome screen for the SQL Database Migration wizard is shown below.

SQL Server Migration Wizard

Database back up and restore

SQL Server supports:

  1. Database back up and restore functionality (both to a local file or bacpac export to blob) and Data Tier Applications (using bacpac).
  2. Ability to directly create SQL Server VMs on Azure with a copied database or copy to an existing SQL Azure database. For more details, see Use the Copy Database Wizard.

A screenshot of the Database back up/restore options from SQL Server Management Studio is shown below.

SQL Server Import Tool

Resources

Migrate a Database to SQL Server on an Azure VM

SQL Server on Azure Virtual Machines overview