Move data to or from Azure Blob Storage using SSIS connectors

The Azure Feature Pack for Integration Services (SSIS) provides components to connect to Azure, transfer data between Azure and on-premises data sources, and process data stored in Azure.

This menu links to technologies you can use to move data to and from Azure Blob storage:

Once customers have moved on-premises data into the cloud, they can access their data from any Azure service to take advantage of the full power of the suite of Azure technologies. The data might be subsequently used, for example, in Azure Machine Learning or on an HDInsight cluster.

Examples for using these Azure resources are in the SQL and HDInsight walkthroughs.

For a discussion of canonical scenarios that use SSIS to accomplish business needs common in hybrid data integration scenarios, see Doing more with SQL Server Integration Services Feature Pack for Azure blog.

Note

For a complete introduction to Azure Blob Storage, refer to Azure Blob basics and to Azure Blob Service REST API.

Prerequisites

To perform the tasks described in this article, you must have an Azure subscription and an Azure Storage account set up. You need the Azure Storage account name and account key to upload or download data.

To use the SSIS connectors, you must download:

Note

SSIS is installed with SQL Server, but is not included in the Express version. For information on what applications are included in various editions of SQL Server, see SQL Server technical documentation

For installing SSIS, see Install Integration Services (SSIS)

For information on how to get up and running using SISS to build simple extraction, transformation, and load (ETL) packages, see SSIS tutorial: Creating a simple ETL package.

Download NYC Taxi dataset

The example described here uses a publicly available dataset, available in Azure open datasets and TLC trip record data. The dataset consists of about 173 million taxi rides in NYC in the year 2013. There are two types of data: trip details data and fare data.

Upload data to Blob Storage

To move data using the SSIS feature pack from on-premises to Blob Storage, we use an instance of the Azure Blob Upload Task, shown here:

Screenshot that shows Azure Blob Upload Task button.

The following table describes the parameters that the task uses.

Field Description
AzureStorageConnection Specifies an existing Azure Storage Connection Manager or creates a new one that refers to an Azure Storage account that points to where the blob files are hosted.
BlobContainer Specifies the name of the blob container that holds the uploaded files as blobs.
BlobDirectory Specifies the blob directory where the uploaded file is stored as a block blob. The blob directory is a virtual hierarchical structure. If the blob already exists, it is replaced.
LocalDirectory Specifies the local directory that contains the files to be uploaded.
FileName Specifies a name filter to select files with the specified name pattern. For example, MySheet*.xls* includes files such as MySheet001.xls and MySheetABC.xlsx
TimeRangeFrom/TimeRangeTo Specifies a time range filter. Files modified after TimeRangeFrom and before TimeRangeTo are included.

Note

The AzureStorageConnection credentials need to be correct and the BlobContainer must exist before the transfer is attempted.

Download data from Blob Storage

To download data from Blob Storage to on-premises storage with SSIS, use an instance of the Azure Blob Download Task.

More advanced SSIS Azure scenarios

The SSIS feature pack allows for more complex flows to be handled by packaging tasks together. For example, the blob data could feed directly into an HDInsight cluster, whose output could be downloaded back to a blob and then to on-premises storage. SSIS can run Hive and Pig jobs on an HDInsight cluster using additional SSIS connectors:

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:

To see non-public LinkedIn profiles, sign in to LinkedIn.

Next steps