Move data to an Azure SQL Database for Azure Machine Learning
This topic outlines the options for moving data either from flat files (CSV or TSV formats) or from data stored in an on-premises SQL Server to an Azure SQL database. These tasks for moving data to the cloud are part of the Team Data Science Process.
For a topic that outlines the options for moving data to an on-premises SQL Server for Machine Learning, see Move data to SQL Server on an Azure virtual machine.
The following menu links to topics that describe how to ingest data into target environments where the data can be stored and processed during the Team Data Science Process (TDSP).
The following table summarizes the options for moving data to an Azure SQL Database.
|SOURCE||DESTINATION: Azure SQL Database|
|Flat file (CSV or TSV formatted)||Bulk Insert SQL Query|
|On-premises SQL Server||1. Export to Flat File
2. SQL Database Migration Wizard
3. Database back up and restore
4. Azure Data Factory
The procedures outlined here require that you have:
- An Azure subscription. If you do not have a subscription, you can sign up for a free trial.
- An Azure storage account. You use an Azure storage account for storing the data in this tutorial. If you don't have an Azure storage account, see the Create a storage account article. After you have created the storage account, you need to obtain the account key used to access the storage. See Manage your storage access keys.
- Access to an Azure SQL Database. If you must set up an Azure SQL Database, Getting Started with Microsoft Azure SQL Database provides information on how to provision a new instance of an Azure SQL Database.
- Installed and configured Azure PowerShell locally. For instructions, see How to install and configure Azure PowerShell.
Data: The migration processes are demonstrated using the NYC Taxi dataset. The NYC Taxi dataset contains information on trip data and fairs and is available on Azure blob storage: NYC Taxi Data. A sample and description of these files are provided in NYC Taxi Trips Dataset Description.
You can either adapt the procedures described here to a set of your own data or follow the steps as described by using the NYC Taxi dataset. To upload the NYC Taxi dataset into your on-premises SQL Server database, follow the procedure outlined in Bulk Import Data into SQL Server Database. These instructions are for a SQL Server on an Azure Virtual Machine, but the procedure for uploading to the on-premises SQL Server is the same.
Moving data from a flat file source to an Azure SQL database
Data in flat files (CSV or TSV formatted) can be moved to an Azure SQL database using a Bulk Insert SQL Query.
Bulk Insert SQL Query
The steps for the procedure using the Bulk Insert SQL Query are similar to those covered in the sections for moving data from a flat file source to SQL Server on an Azure VM. For details, see Bulk Insert SQL Query.
Moving Data from on-premises SQL Server to an Azure SQL database
If the source data is stored in an on-premises SQL Server, there are various possibilities for moving the data to an Azure SQL database:
The steps for the first three are very similar to those sections in Move data to SQL Server on an Azure virtual machine that cover these same procedures. Links to the appropriate sections in that topic are provided in the following instructions.
Export to Flat File
The steps for this exporting to a flat file are similar to those covered in Export to Flat File.
SQL Database Migration Wizard
The steps for using the SQL Database Migration Wizard are similar to those covered in SQL Database Migration Wizard.
Database back up and restore
The steps for using database back up and restore are similar to those covered in Database back up and restore.
Azure Data Factory
The procedure for moving data to an Azure SQL database with Azure Data Factory (ADF) is provided in the topic Move data from an on-premises SQL server to SQL Azure with Azure Data Factory. This topic shows how to move data from an on-premises SQL Server database to an Azure SQL database via Azure Blob Storage using ADF.
Consider using ADF when data needs to be continually migrated in a hybrid scenario that accesses both on-premises and cloud resources, and when the data is transacted or needs to be modified or have business logic added to it when being migrated. ADF allows for the scheduling and monitoring of jobs using simple JSON scripts that manage the movement of data on a periodic basis. ADF also has other capabilities such as support for complex operations.