Move data to or from Azure Blob Storage using SSIS connectors

The SQL Server Integration Services Feature Pack for Azure provides components to connect to Azure, transfer data between Azure and on-premises data sources, and process data stored in Azure.


After on-premises data has been moved into the cloud, it can be accessed from any Azure service and used with the rest of the Azure technology suite, for example in Azure Machine Learning or on an HDInsight cluster.

Moving the data is typically the first step in the SQL and HDInsight walkthroughs.

For a discussion of canonical scenarios in which SSIS meets business needs common to hybrid data integration, see the Doing more with SQL Server Integration Services Feature Pack for Azure blog post.

Note

For a complete introduction to Azure Blob storage, see Azure Blob Basics and Azure Blob Service.

Prerequisites

To perform the tasks described in this article, you must have an Azure subscription and an Azure storage account set up. You must know your Azure storage account name and account key to upload or download data.
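Outside SSIS, the account name and key are often combined into a single connection string that storage tools and SDKs accept. As a minimal Python sketch (not part of SSIS; the function name `build_connection_string` is hypothetical), the standard Azure Storage connection string format can be assembled like this:

```python
def build_connection_string(account_name: str, account_key: str,
                            endpoint_suffix: str = "core.windows.net") -> str:
    """Assemble an Azure Storage connection string from the account name and key."""
    if not account_name or not account_key:
        raise ValueError("account name and account key are both required")
    parts = [
        "DefaultEndpointsProtocol=https",
        f"AccountName={account_name}",
        f"AccountKey={account_key}",
        f"EndpointSuffix={endpoint_suffix}",
    ]
    return ";".join(parts)


if __name__ == "__main__":
    # Placeholder values; substitute your own account name and key.
    print(build_connection_string("mystorageaccount", "<account-key>"))
```

Treat the account key as a secret; the values in the example are placeholders.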

To use the SSIS connectors, you must download:

Note

SSIS is installed with SQL Server but is not included in the Express edition. For information on which applications are included in the various editions of SQL Server, see SQL Server Editions.

For training materials on SSIS, see Hands On Training for SSIS.

For information on how to get up and running with SSIS to build simple extraction, transformation, and load (ETL) packages, see SSIS Tutorial: Creating a Simple ETL Package.

Download NYC Taxi dataset

The example described here uses a publicly available dataset, the NYC Taxi Trips dataset. The dataset consists of about 173 million taxi rides in NYC in the year 2013. There are two types of data: trip details data and fare data. Because there is one file per month for each type, there are 24 files in all, each approximately 2 GB uncompressed.
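The file count follows from two data types times twelve months, and at roughly 2 GB per file the transfer is on the order of 48 GB uncompressed. A small Python sketch of that arithmetic; the labels `trip_data` and `trip_fare` are hypothetical placeholders, not the dataset's actual file names:

```python
# Two data types (trip details and fares), one file per month for each.
DATA_TYPES = ("trip_data", "trip_fare")  # hypothetical labels for the two file types
MONTHS = range(1, 13)                    # January through December 2013
APPROX_FILE_SIZE_GB = 2                  # approximate uncompressed size of each file


def expected_files():
    """Return one (data type, month) pair for every monthly file."""
    return [(kind, month) for kind in DATA_TYPES for month in MONTHS]


def estimated_total_gb():
    """Rough uncompressed size of the whole dataset."""
    return len(expected_files()) * APPROX_FILE_SIZE_GB


if __name__ == "__main__":
    print(len(expected_files()), "files, about", estimated_total_gb(), "GB uncompressed")
```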

Upload data to Azure blob storage

To move data from on-premises storage to Azure blob storage using the SSIS feature pack, use an instance of the Azure Blob Upload Task.

The parameters that the task uses are described here:

Field Description
AzureStorageConnection Specifies an existing Azure Storage Connection Manager, or creates a new one, that refers to the Azure storage account where the blob files are hosted.
BlobContainer Specifies the name of the blob container that holds the uploaded files as blobs.
BlobDirectory Specifies the blob directory where the uploaded file is stored as a block blob. The blob directory is a virtual hierarchical structure. If the blob already exists, it is replaced.
LocalDirectory Specifies the local directory that contains the files to be uploaded.
FileName Specifies a name filter to select files with the specified name pattern. For example, MySheet*.xls* includes files such as MySheet001.xls and MySheetABC.xlsx.
TimeRangeFrom/TimeRangeTo Specifies a time range filter. Files modified after TimeRangeFrom and before TimeRangeTo are included.
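To make the parameter semantics concrete, here is a minimal Python sketch that approximates what the Azure Blob Upload Task does with them, using the `azure-storage-blob` SDK (v12) in place of SSIS. It is not the task's implementation: the helper names are hypothetical, case-insensitive name matching and strict time-range boundaries are assumptions, and `overwrite=True` mirrors the rule that an existing blob is replaced.

```python
import fnmatch
import os
import posixpath
from datetime import datetime
from typing import Iterator, Optional


def matches_filters(name: str, modified: datetime, pattern: str,
                    time_from: Optional[datetime] = None,
                    time_to: Optional[datetime] = None) -> bool:
    """Apply the FileName pattern and TimeRangeFrom/TimeRangeTo filters."""
    # Case-insensitive matching is an assumption (Windows file names).
    if not fnmatch.fnmatchcase(name.lower(), pattern.lower()):
        return False
    # "Modified after TimeRangeFrom and before TimeRangeTo", treated as strict bounds.
    if time_from is not None and modified <= time_from:
        return False
    if time_to is not None and modified >= time_to:
        return False
    return True


def select_files(local_directory: str, pattern: str,
                 time_from: Optional[datetime] = None,
                 time_to: Optional[datetime] = None) -> Iterator[str]:
    """Yield paths of regular files in local_directory that pass the filters."""
    for name in sorted(os.listdir(local_directory)):
        path = os.path.join(local_directory, name)
        if not os.path.isfile(path):
            continue
        modified = datetime.fromtimestamp(os.path.getmtime(path))
        if matches_filters(name, modified, pattern, time_from, time_to):
            yield path


def blob_name_for(blob_directory: str, file_path: str) -> str:
    """Build the blob name: virtual directory plus file name, joined with '/'."""
    directory = blob_directory.strip("/")
    file_name = os.path.basename(file_path)
    return posixpath.join(directory, file_name) if directory else file_name


def upload_files(connection_string: str, container: str, blob_directory: str,
                 local_directory: str, pattern: str,
                 time_from: Optional[datetime] = None,
                 time_to: Optional[datetime] = None) -> None:
    """Upload the selected files as block blobs, replacing existing blobs."""
    # Imported here so the helpers above work without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container_client = service.get_container_client(container)
    for path in select_files(local_directory, pattern, time_from, time_to):
        with open(path, "rb") as data:
            # upload_blob creates a block blob by default; overwrite=True replaces it.
            container_client.upload_blob(
                name=blob_name_for(blob_directory, path), data=data, overwrite=True)
```

Keeping the filter logic in `matches_filters` separate from the SDK call makes it easy to check which files would be uploaded before any transfer happens.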
Note

The AzureStorageConnection credentials need to be correct and the BlobContainer must exist before the transfer is attempted.
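A pre-flight check along the lines of this note can be sketched in Python with the `azure-storage-blob` SDK (v12). The helper names are hypothetical. The name check encodes Azure's container naming rules; creating the container (or finding it already there) also exercises the credentials, so a wrong account key surfaces as an SDK exception before any transfer starts.

```python
import re

# Azure container naming rules: 3-63 characters, lowercase letters, digits and
# hyphens only, starting with a letter or digit, and every hyphen followed by a
# letter or digit (so no trailing or consecutive hyphens).
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_container_name(name: str) -> bool:
    """Return True if name satisfies the container naming rules above."""
    return _CONTAINER_NAME.fullmatch(name) is not None


def ensure_container(connection_string: str, container: str) -> None:
    """Create the container if needed; bad credentials raise an SDK error here."""
    if not is_valid_container_name(container):
        raise ValueError(f"invalid container name: {container!r}")
    # Imported here so the name check works without the Azure SDK installed.
    from azure.core.exceptions import ResourceExistsError
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    try:
        service.create_container(container)
    except ResourceExistsError:
        pass  # The container already exists, which is what the transfer needs.
```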

Download data from Azure blob storage

To download data from Azure blob storage to on-premises storage with SSIS, use an instance of the Azure Blob Download Task.
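The download direction can be sketched the same way in Python with the `azure-storage-blob` SDK (v12). This is an illustration rather than the task's implementation; the helper names are hypothetical, and it assumes the blobs of interest sit under a single virtual directory whose structure is recreated under the local directory.

```python
import os
from typing import Optional


def local_path_for(blob_name: str, blob_directory: str, local_directory: str) -> str:
    """Map a blob under blob_directory to a path under local_directory."""
    prefix = blob_directory.strip("/")
    if prefix:
        if not blob_name.startswith(prefix + "/"):
            raise ValueError(f"{blob_name!r} is not under {prefix!r}")
        blob_name = blob_name[len(prefix) + 1:]
    # Blob names always use "/" as the virtual directory separator.
    return os.path.join(local_directory, *blob_name.split("/"))


def download_directory(connection_string: str, container: str,
                       blob_directory: str, local_directory: str) -> None:
    """Download every blob under blob_directory into local_directory."""
    # Imported here so local_path_for works without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container_client = service.get_container_client(container)
    prefix = blob_directory.strip("/")
    starts_with: Optional[str] = prefix + "/" if prefix else None
    for blob in container_client.list_blobs(name_starts_with=starts_with):
        target = local_path_for(blob.name, blob_directory, local_directory)
        parent = os.path.dirname(target)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(target, "wb") as handle:
            # download_blob returns a StorageStreamDownloader; readinto streams it to disk.
            container_client.download_blob(blob.name).readinto(handle)
```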

More advanced SSIS-Azure scenarios

The SSIS feature pack can handle more complex flows by chaining tasks together in a package. For example, blob data could feed directly into an HDInsight cluster, whose output could be written back to a blob and then downloaded to on-premises storage. SSIS can run Hive and Pig jobs on an HDInsight cluster using additional SSIS connectors: