Writes a dataset to various forms of cloud-based storage in Azure, such as tables, blobs, and Azure SQL databases
Category: Data Input and Output
Applies to: Machine Learning Studio
This content pertains only to Studio. Similar drag and drop modules have been added to the visual interface in Machine Learning service. Learn more in this article comparing the two versions.
This article describes how to use the Export Data module in Azure Machine Learning Studio to save results, intermediate data, and working data from your experiments to cloud storage destinations outside Azure Machine Learning Studio.
This module supports exporting or saving your data to the following cloud data services:
Export to Hive Query: Write data to a Hive table in an HDInsight Hadoop cluster.
Export to Azure SQL Database: Save data to Azure SQL Database or to Azure SQL Data Warehouse.
Export to Azure Table: Save data to the table storage service in Azure. Table storage is good for storing large amounts of data. It provides a tabular format that is scalable, inexpensive, and highly available.
Export to Azure Blob Storage: Save data to the Blob service in Azure. This option is useful for images, unstructured text, or binary data. Data in the Blob service can be shared publicly or saved in secured application data stores.
Download data: To download your data so that you can open it in Excel or another application, use a module such as Convert to CSV or Convert to TSV to prepare the data in a particular format, and then download the data.
You can download the results of any module that outputs a dataset by right-clicking the output and selecting Download dataset. By default, the data is exported in CSV format.
Download a module definition or experiment graph: A new PowerShell library lets you download the complete metadata for your experiment, or the details for a particular module. The PowerShell for Azure Machine Learning library is an experimental release, but has many useful cmdlets:
Get-AmlExperiment lists all the experiments in a workspace.
Export-AmlExperimentGraph exports a definition of the complete experiment to a JSON file.
Download-AmlExperimentNodeOutput lets you extract the information provided on the output ports of any module.
For more information, see PowerShell Module for Azure Machine Learning Studio.
How to configure Export Data
Add the Export Data module to your experiment in Studio. You can find this module in the Input and Output category.
Connect Export Data to the module that contains the data you want to export.
Double-click Export Data to open the Properties pane.
For Data destination, select the type of cloud storage where you'll save your data. Changing this option resets all other properties, so be sure to set it first.
Provide an account name and authentication method required to access the specified storage account.
Depending on the storage type and whether the account is secured, you might need to provide the account name, file type, access key, or container name. For sources that do not require authentication, generally it is sufficient to know the URL.
For examples of each type, see the following topics:
The option, Use cached results, lets you repeat the experiment without rewriting the same results each time.
If you deselect this option, results are written to storage each time the experiment is run, regardless of whether the output data has changed.
If you select this option, Export Data uses cached data, if available. New results are generated only when there is an upstream change that would affect the results.
Run the experiment.
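The behavior of the Use cached results option described in the steps above can be sketched as a small decision rule. The following Python is only an illustration under stated assumptions: `ExportStep`, `upstream_fingerprint`, and the idea of hashing upstream state are hypothetical names and mechanics chosen for this example, not how Studio detects upstream changes internally.

```python
import hashlib
import json


def upstream_fingerprint(upstream_state: dict) -> str:
    """Hash a description of everything upstream of the export step (assumed mechanism)."""
    canonical = json.dumps(upstream_state, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExportStep:
    """Hypothetical stand-in for an export step; not the Studio implementation."""

    def __init__(self, use_cached_results: bool):
        self.use_cached_results = use_cached_results
        self._last_fingerprint = None
        self.write_count = 0

    def run(self, upstream_state: dict) -> bool:
        """Return True if results were written on this run."""
        fingerprint = upstream_fingerprint(upstream_state)
        unchanged = fingerprint == self._last_fingerprint
        if self.use_cached_results and unchanged:
            # Option selected and nothing changed upstream: skip the write.
            return False
        self._last_fingerprint = fingerprint
        self.write_count += 1  # placeholder for the actual write to storage
        return True
```

With the option selected, a second run over identical upstream state returns `False` and nothing is rewritten. With the option deselected, every run writes.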
Retail Forecasting Step 1 of 6 - data preprocessing: The retail forecasting template illustrates a machine learning task based on data stored in Azure SQL Database. It demonstrates several useful techniques, such as how to create an Azure SQL database for machine learning, how to use the Azure SQL database to pass datasets between experiments in different accounts, and how to save and combine forecasts.
Build and deploy a machine learning model using SQL Server on an Azure VM: This article demonstrates how you can use a SQL Server database hosted in an Azure VM to store training data and the predictions generated by the experiment. It also illustrates how a relational database can be used for feature engineering and feature selection.
How to use Azure ML with Azure SQL Data Warehouse: This article shows how you can create a machine learning model using data in Azure SQL Data Warehouse.
This section contains implementation details, tips, and answers to frequently asked questions.
Not sure how or where you should store your data? See this guide to common data scenarios in the data science process: Scenarios for advanced analytics in Azure Machine Learning
This module was previously named Writer. If you have an existing experiment that uses the Writer module, the module is renamed to Export Data when you refresh the experiment.
Not all modules produce output that is compatible with Export Data destinations. For example, Export Data cannot save a dataset that has been converted to the SVMLight format. Export Data supports these formats:
- Dataset (Azure ML internal format)
- .NET DataTable
- CSV with or without headers
- TSV with or without headers
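To make the difference between the delimited text variants in the list above concrete, here is a minimal sketch that serializes the same rows as CSV with a header and as TSV without one. It uses only the Python standard library. The function name `to_delimited` and the sample columns are assumptions made for illustration, and the sketch does not reproduce the exact quoting rules that Studio applies.

```python
import csv
import io


def to_delimited(columns, rows, delimiter=",", include_header=True):
    """Serialize rows as delimited text, optionally preceded by a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    if include_header:
        writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Sample data, made up for this example.
columns = ["id", "label"]
rows = [[1, "a"], [2, "b"]]

csv_with_header = to_delimited(columns, rows)
tsv_without_header = to_delimited(columns, rows, delimiter="\t", include_header=False)
```

When a file is written without a header, whatever reads it later must supply the column names itself, so it helps to record which variant was exported.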
When you select Azure Table as the location to output your data, occasionally there might be an error when writing to the specified table. When this happens, the data might be written to a blob instead.
If this error happens and later you are unable to read from the expected table, try using an Azure storage utility to check the blobs in the specified container in your storage account.
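One way to check the container is with a short script instead of a graphical storage utility. The sketch below assumes the `azure-storage-blob` Python package (version 12 or later) is installed. The connection string and container name are placeholders you would supply yourself, and `list_blob_names` and `open_container` are helper names invented for this example.

```python
def list_blob_names(container_client, name_prefix=None):
    """Return the names of blobs in a container, optionally filtered by a name prefix."""
    blobs = container_client.list_blobs(name_starts_with=name_prefix)
    return [blob.name for blob in blobs]


def open_container(connection_string, container_name):
    """Create a container client; requires the azure-storage-blob package."""
    # Imported here so list_blob_names can be used without the SDK loaded.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    return service.get_container_client(container_name)


# Example usage (placeholders, not real values):
# container = open_container("<your-connection-string>", "<your-container>")
# for name in list_blob_names(container):
#     print(name)
```

If the expected table is empty but the listing shows recently written blobs, the export may have fallen back to blob storage as described above.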
Currently, you cannot save a blob into a specified Hive table. If you need to write intermediate results, avoid using a Hive table in HDInsight, and use blob storage or table storage instead.
Currently, if you select HDFS as the location to save output data, this error message is returned: “Microsoft.Analytics.Exceptions.ErrorMapping+ModuleException.”
|Name||Type||Description|
|Dataset||Data Table||The dataset to be written.|
This table lists parameters that apply to all Export Data options. Other parameters are dynamic and change depending on the data destination you select.
|Name||Range||Type||Default||Description|
|Please specify data destination||List||DataSourceOrSink||Blob service in Azure Storage||Indicate whether the data destination is a file in the Blob service, a file in the Table service, a SQL database in Azure, or a Hive table.|
|Use cached results||TRUE/FALSE||Boolean||FALSE||Select this option to avoid rewriting results unnecessarily. If anything changes upstream in the experiment, Export Data always executes and writes new results. However, if nothing has changed and you have selected this option, Export Data does not execute, which avoids rewriting the same results.|
|Exception||Description|
|Error 0057||An exception occurs when attempting to create a file or blob that already exists.|
|Error 0001||An exception occurs if one or more specified columns of the dataset couldn't be found.|
|Error 0027||An exception occurs when two objects have to be of the same size, but they are not.|
|Error 0079||An exception occurs if the container name in Azure Storage is specified incorrectly.|
|Error 0052||An exception occurs if the storage access key for the Azure account is specified incorrectly.|
|Error 0064||An exception occurs if account name or storage access key for the Azure account is specified incorrectly.|
|Error 0071||An exception occurs if the provided credentials are incorrect.|
|Error 0018||An exception occurs if the input dataset is not valid.|
|Error 0029||An exception occurs when an invalid URI is passed.|
|Error 0003||An exception occurs if one or more inputs are null or empty.|
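When you review logs from many runs, it can help to map an error code back to its description from the table above. The following is a minimal sketch: the descriptions are copied from the table, but the helper name `describe_error` and the assumption that a message contains the text "Error" followed by a four-digit code are illustrative only.

```python
import re

# Descriptions copied from the exceptions table for Export Data.
EXPORT_DATA_ERRORS = {
    "0057": "An exception occurs when attempting to create a file or blob that already exists.",
    "0001": "An exception occurs if one or more specified columns of the dataset couldn't be found.",
    "0027": "An exception occurs when two objects have to be of the same size, but they are not.",
    "0079": "An exception occurs if the container name in Azure Storage is specified incorrectly.",
    "0052": "An exception occurs if the storage access key for the Azure account is specified incorrectly.",
    "0064": "An exception occurs if account name or storage access key for the Azure account is specified incorrectly.",
    "0071": "An exception occurs if the provided credentials are incorrect.",
    "0018": "An exception occurs if the input dataset is not valid.",
    "0029": "An exception occurs when an invalid URI is passed.",
    "0003": "An exception occurs if one or more inputs are null or empty.",
}


def describe_error(message):
    """Return the table description for the first 'Error NNNN' code found, or None."""
    match = re.search(r"Error\s+(\d{4})", message)
    if match is None:
        return None
    return EXPORT_DATA_ERRORS.get(match.group(1))
```

Codes that are not in the table return `None`, so the helper never guesses at an explanation.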
For a list of errors specific to Studio modules, see Machine Learning Error codes.
For a list of API exceptions, see Machine Learning REST API Error Codes.