Migrate a Studio (classic) dataset to Azure Machine Learning

Important

Support for Azure Machine Learning Studio (classic) will end on August 31, 2024. We recommend that you transition to Azure Machine Learning by that date.

As of December 1, 2021, you can't create new Machine Learning Studio (classic) resources (workspace and web service plan). Through August 31, 2024, you can continue to use the existing Machine Learning Studio (classic) experiments and web services.

Machine Learning Studio (classic) documentation is being retired and might not be updated in the future.

In this article, you learn how to migrate a Studio (classic) dataset to Azure Machine Learning. For more information on migrating from Studio (classic), see the migration overview article.

You have three options to migrate a dataset to Azure Machine Learning. Read each section to determine which option is best for your scenario.

Where is the data? | Migration option
In Studio (classic) | Option 1: Download the dataset from Studio (classic) and upload it to Azure Machine Learning.
Cloud storage | Option 2: Register a dataset from a cloud source.
Cloud storage | Option 3: Use the Import Data module to get data from a cloud source.

Note

Azure Machine Learning also supports code-first workflows for creating and managing datasets.

Prerequisites

Download the dataset from Studio (classic)

The simplest way to migrate a Studio (classic) dataset to Azure Machine Learning is to download your dataset and register it in Azure Machine Learning. This creates a new copy of your dataset and uploads it to an Azure Machine Learning datastore.

You can download the following Studio (classic) dataset types directly.

  • Plain text (.txt)
  • Comma-separated values (CSV) with a header (.csv) or without (.nh.csv)
  • Tab-separated values (TSV) with a header (.tsv) or without (.nh.tsv)
  • Excel file
  • Zip file (.zip)

To download datasets directly:

  1. Go to your Studio (classic) workspace (https://studio.azureml.net).

  2. In the left navigation bar, select the Datasets tab.

  3. Select the dataset(s) you want to download.

  4. In the bottom action bar, select Download.

    Screenshot showing how to download a dataset in Studio (classic).

For the following data types, you must use the Convert to CSV module to download datasets.

  • SVMLight data (.svmlight)
  • Attribute Relation File Format (ARFF) data (.arff)
  • R object or workspace file (.RData)
  • Dataset type (.data). This is the Studio (classic) internal data type for module output.
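The two lists above amount to a lookup by file extension. The following is a minimal sketch of that lookup, useful if you want to sort an inventory of dataset names before migrating. The function name `download_route` and the return labels are hypothetical, and the `.xlsx` extension is an assumption, because the list above says only "Excel file".

```python
# Extensions the article lists as directly downloadable from Studio (classic).
# ".xlsx" is an assumption: the article names "Excel file" without an extension.
DIRECT_DOWNLOAD = (".nh.csv", ".nh.tsv", ".txt", ".csv", ".tsv", ".xlsx", ".zip")

# Extensions the article says must go through the Convert to CSV module.
NEEDS_CONVERT_TO_CSV = (".svmlight", ".arff", ".rdata", ".data")


def download_route(filename: str) -> str:
    """Return 'direct', 'convert_to_csv', or 'unknown' for a dataset file name."""
    name = filename.lower()  # compare case-insensitively, e.g. ".RData"
    if name.endswith(DIRECT_DOWNLOAD):
        return "direct"
    if name.endswith(NEEDS_CONVERT_TO_CSV):
        return "convert_to_csv"
    return "unknown"


if __name__ == "__main__":
    for example in ("sales.nh.csv", "model.RData", "features.arff"):
        print(example, "->", download_route(example))
```

Anything reported as "unknown" is outside the types this article covers and needs a manual check.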

To convert your dataset to a CSV and download the results:

  1. Go to your Studio (classic) workspace (https://studio.azureml.net).

  2. Create a new experiment.

  3. Drag and drop the dataset you want to download onto the canvas.

  4. Add a Convert to CSV module.

  5. Connect the Convert to CSV input port to the output port of your dataset.

  6. Run the experiment.

  7. Right-click the Convert to CSV module.

  8. Select Results dataset > Download.

    Screenshot showing how to set up a Convert to CSV pipeline.

Upload your dataset to Azure Machine Learning

After you download the data file, you can register it as a data asset in Azure Machine Learning:

  1. Go to Azure Machine Learning studio.

  2. Under Assets in the left navigation, select Data. On the Data assets tab, select Create.

    Screenshot highlights Create in the Data assets tab.

  3. Give your data asset a name and optional description. Then, select the Tabular option under Type, in the Dataset types section of the dropdown.

    Note

    You can also upload ZIP files as data assets. To upload a ZIP file, select File for Type, in the Dataset types section of the dropdown.

    Screenshot shows data asset source choices.

  4. For data source, select the "From local files" option to upload your dataset.

  5. For file selection, first choose the Azure Machine Learning datastore where you want your data to be stored in Azure. For more information on datastores, see Connect to storage services. Next, upload the dataset you downloaded earlier.

  6. Follow the steps to set the data parsing settings and schema for your data asset.

  7. When you reach the Review step, select Create.
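Because Azure Machine Learning also supports code-first workflows, the upload steps above can be approximated in Python. The following is a minimal sketch, not the documented migration path. It assumes the Python SDK v2 packages `azure-ai-ml` and `azure-identity`, which this article doesn't cover. The helper names `build_asset_spec` and `register_local_file` are hypothetical, and the sketch registers the file as a `uri_file` asset rather than the Tabular type selected in the studio UI.

```python
def build_asset_spec(local_path: str, name: str, description: str = "") -> dict:
    """Collect the fields for a data asset that points at a downloaded file."""
    return {
        "name": name,
        "description": description,
        "path": local_path,   # local file; the SDK uploads it on registration
        "type": "uri_file",   # assumption: file asset, not the UI's Tabular type
    }


def register_local_file(spec: dict, subscription_id: str,
                        resource_group: str, workspace_name: str):
    """Register the asset in a workspace. Requires Azure credentials."""
    # Imported lazily so build_asset_spec works without the Azure packages.
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import Data
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(DefaultAzureCredential(), subscription_id,
                         resource_group, workspace_name)
    return ml_client.data.create_or_update(Data(**spec))


if __name__ == "__main__":
    # Placeholder values for illustration only.
    spec = build_asset_spec("./sales.csv", "sales-migrated",
                            "Migrated from Studio (classic)")
    print(spec)
```

Calling `register_local_file` needs a real subscription ID, resource group, and workspace name. The sketch deliberately leaves them as parameters rather than guessing values.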

Import data from cloud sources

If your data is already in a cloud storage service and you want to keep it in its native location, you can use either of the following options:

Ingestion method | Description
Register an Azure Machine Learning dataset | Ingest data from local and online data sources (Blob, ADLS Gen1, ADLS Gen2, File share, SQL DB). Creates a reference to the data source, which is lazily evaluated at runtime. Use this option if you repeatedly access this dataset and want to enable advanced data features like data versioning and monitoring.
Import Data module | Ingest data from online data sources (Blob, ADLS Gen1, ADLS Gen2, File share, SQL DB). The dataset is only imported to the current designer pipeline run.

Note

Studio (classic) users should note that the following cloud sources are not natively supported in Azure Machine Learning:

  • Hive Query
  • Azure Table
  • Azure Cosmos DB
  • On-premises SQL Database

We recommend that you migrate your data to a supported storage service by using Azure Data Factory.

Register an Azure Machine Learning dataset

Use the following steps to register a dataset to Azure Machine Learning from a cloud service:

  1. Create a datastore, which links the cloud storage service to your Azure Machine Learning workspace.

  2. Register a dataset. If you are migrating a Studio (classic) dataset, select the Tabular dataset setting.
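If you take the code-first route instead of the UI, data that stays in cloud storage is typically referenced through its datastore rather than uploaded. The following is a minimal sketch of building such a reference. It assumes the `azureml://datastores/<name>/paths/<path>` URI form used by the Python SDK v2, which this article doesn't describe. The function name `datastore_uri` and the example path are hypothetical.

```python
def datastore_uri(datastore_name: str, relative_path: str) -> str:
    """Build an azureml:// URI for a file or folder on a registered datastore."""
    if not datastore_name:
        raise ValueError("datastore_name must not be empty")
    # Strip leading slashes so the path joins cleanly after "/paths/".
    return f"azureml://datastores/{datastore_name}/paths/{relative_path.lstrip('/')}"


if __name__ == "__main__":
    # "workspaceblobstore" is the usual default datastore name; the path is made up.
    print(datastore_uri("workspaceblobstore", "migrated/sales.csv"))
```

A URI like this can be passed as the `path` of a data asset definition, so the registration points at the data in place instead of creating a copy.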

After you register a dataset in Azure Machine Learning, you can use it in designer:

  1. Create a new designer pipeline draft.
  2. In the module palette to the left, expand the Datasets section.
  3. Drag your registered dataset onto the canvas.

Use the Import Data module

Use the following steps to import data directly to your designer pipeline:

  1. Create a datastore, which links the cloud storage service to your Azure Machine Learning workspace.

After you create the datastore, you can use the Import Data module in the designer to ingest data from it:

  1. Create a new designer pipeline draft.
  2. In the module palette to the left, find the Import Data module and drag it to the canvas.
  3. Select the Import Data module, and use the settings in the right panel to configure your data source.

Next steps

In this article, you learned how to migrate a Studio (classic) dataset to Azure Machine Learning. The next step is to rebuild a Studio (classic) training pipeline.

See the other articles in the Studio (classic) migration series:

  1. Migration overview.
  2. Migrate datasets.
  3. Rebuild a Studio (classic) training pipeline.
  4. Rebuild a Studio (classic) web service.
  5. Integrate an Azure Machine Learning web service with client apps.
  6. Migrate Execute R Script.