Migrate a Studio (classic) dataset to Azure Machine Learning

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

In this article, you learn how to migrate a Studio (classic) dataset to Azure Machine Learning. For more information on migrating from Studio (classic), see the migration overview article.

You have three options to migrate a dataset to Azure Machine Learning. Read each section to determine which option is best for your scenario.

Where is the data?     Migration option
In Studio (classic)    Option 1: Download the dataset from Studio (classic) and upload it to Azure Machine Learning.
Cloud storage          Option 2: Register a dataset from a cloud source.
                       Option 3: Use the Import Data module to get data from a cloud source.

Note

Azure Machine Learning also supports code-first workflows, such as the Azure Machine Learning Python SDK, for creating and managing datasets.

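Prerequisites

  • An Azure account with an active subscription.
  • An Azure Machine Learning workspace.
  • A Studio (classic) dataset to migrate.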

Download the dataset from Studio (classic)

The simplest way to migrate a Studio (classic) dataset to Azure Machine Learning is to download your dataset and register it in Azure Machine Learning. This creates a new copy of your dataset and uploads it to an Azure Machine Learning datastore.

You can download the following Studio (classic) dataset types directly.

  • Plain text (.txt)
  • Comma-separated values (CSV) with a header (.csv) or without (.nh.csv)
  • Tab-separated values (TSV) with a header (.tsv) or without (.nh.tsv)
  • Excel file
  • Zip file (.zip)

To download datasets directly:

  1. Go to your Studio (classic) workspace (https://studio.azureml.net).

  2. In the left navigation bar, select the Datasets tab.

  3. Select the dataset(s) you want to download.

  4. In the bottom action bar, select Download.

For the following data types, you must use the Convert to CSV module to download datasets.

  • SVMLight data (.svmlight)
  • Attribute Relation File Format (ARFF) data (.arff)
  • R object or workspace file (.RData)
  • Dataset type (.data), the internal Studio (classic) data type for module output.
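If you have many datasets to migrate, you can sort their file names by which download route they need. The following Python sketch is a hypothetical helper, not part of any Azure SDK. It encodes the two extension lists above and assumes `.xlsx` and `.xls` for Excel files, because the list names only "Excel file".

```python
from pathlib import PurePath

# Extensions this article lists as directly downloadable from Studio (classic).
# ".xlsx" and ".xls" are assumptions: the article only says "Excel file".
DIRECT_DOWNLOAD = {".txt", ".csv", ".nh.csv", ".tsv", ".nh.tsv", ".xlsx", ".xls", ".zip"}

# Extensions that must go through the Convert to CSV module before download.
CONVERT_TO_CSV = {".svmlight", ".arff", ".rdata", ".data"}


def dataset_suffix(filename: str) -> str:
    """Return the lowercased extension, keeping the two-part '.nh.csv'/'.nh.tsv' forms."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if not suffixes:
        return ""
    if len(suffixes) >= 2 and suffixes[-2] == ".nh":
        return suffixes[-2] + suffixes[-1]
    return suffixes[-1]


def download_route(filename: str) -> str:
    """Classify a Studio (classic) dataset file by how it can be downloaded."""
    suffix = dataset_suffix(filename)
    if suffix in DIRECT_DOWNLOAD:
        return "direct download"
    if suffix in CONVERT_TO_CSV:
        return "convert to csv"
    return "unknown"


if __name__ == "__main__":
    for name in ["iris.nh.csv", "weather.arff", "model.RData", "notes"]:
        print(f"{name}: {download_route(name)}")
```

Unrecognized names are reported as "unknown" instead of being guessed, so you can review them by hand.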

To convert your dataset to a CSV and download the results:

  1. Go to your Studio (classic) workspace (https://studio.azureml.net).

  2. Create a new experiment.

  3. Drag and drop the dataset you want to download onto the canvas.

  4. Add a Convert to CSV module.

  5. Connect the Convert to CSV input port to the output port of your dataset.

  6. Run the experiment.

  7. Right-click the Convert to CSV module.

  8. Select Results dataset > Download.

Upload your dataset to Azure Machine Learning

After you download the data file, you can register the dataset in Azure Machine Learning:

  1. Go to Azure Machine Learning studio (ml.azure.com).

  2. In the left navigation pane, select the Datasets tab.

  3. Select Create dataset > From local files.

  4. Enter a name and description.

  5. For Dataset type, select Tabular.

    Note

    You can also upload ZIP files as datasets. To upload a ZIP file, select File for Dataset type.

  6. For Datastore and file selection, select the datastore you want to upload your dataset file to.

    By default, Azure Machine Learning stores the dataset in the workspace's default blob datastore (workspaceblobstore). For more information on datastores, see Connect to storage services.

  7. Set the data parsing settings and schema for your dataset. Then, confirm your settings.
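You can also script the upload and registration instead of using the studio UI. The following is a minimal sketch that assumes the Azure Machine Learning Python SDK v1 (`azureml-core`) and a workspace `config.json` that `Workspace.from_config()` can find. It uploads a downloaded file to the default datastore with `upload_files`, builds a Tabular dataset with `Dataset.Tabular.from_delimited_files`, and registers it. The helper names and the `studio-classic-migration` folder are hypothetical, and the header setting is inferred from the `.nh` naming convention described earlier.

```python
import os
from pathlib import PurePath
from typing import Tuple


def delimited_options(filename: str) -> Tuple[str, bool]:
    """Return (separator, has_header) for a CSV or TSV downloaded from Studio (classic).

    Header-less downloads use a '.nh' infix, for example 'iris.nh.csv'.
    """
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if not suffixes or suffixes[-1] not in (".csv", ".tsv"):
        raise ValueError(f"not a CSV or TSV file: {filename}")
    separator = "\t" if suffixes[-1] == ".tsv" else ","
    has_header = not (len(suffixes) >= 2 and suffixes[-2] == ".nh")
    return separator, has_header


def register_downloaded_file(local_path: str, dataset_name: str, description: str = ""):
    """Upload a local CSV/TSV to the default datastore and register it as a Tabular dataset.

    Assumes azureml-core (SDK v1) is installed and a workspace config.json exists.
    """
    # Imported here so delimited_options() works without the SDK installed.
    from azureml.core import Dataset, Workspace

    ws = Workspace.from_config()
    datastore = ws.get_default_datastore()  # usually 'workspaceblobstore'

    local_path = os.path.abspath(local_path)
    file_name = os.path.basename(local_path)
    target_dir = "studio-classic-migration"  # hypothetical folder in the datastore

    # relative_root is set explicitly so the blob lands at target_dir/file_name.
    datastore.upload_files(
        files=[local_path],
        relative_root=os.path.dirname(local_path),
        target_path=target_dir,
        overwrite=True,
    )

    separator, has_header = delimited_options(file_name)
    dataset = Dataset.Tabular.from_delimited_files(
        path=[(datastore, f"{target_dir}/{file_name}")],
        separator=separator,
        header=has_header,
    )
    return dataset.register(workspace=ws, name=dataset_name, description=description)
```

For example, `register_downloaded_file("./iris.nh.csv", "iris-migrated")` would upload and register a header-less CSV (both names are placeholders). The SDK import sits inside the function so the parsing helper can be tried on its own. Some SDK v1 releases mark `upload_files` as deprecated, so check the reference for the version you install.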

Import data from cloud sources

If your data is already in a cloud storage service and you want to keep it in its native location, you can use either of the following options:

Ingestion method                              Description
Register an Azure Machine Learning dataset    Ingest data from local and online data sources (Blob, ADLS Gen1, ADLS Gen2, File share, SQL DB). Creates a reference to the data source, which is lazily evaluated at runtime. Use this option if you access the dataset repeatedly and want advanced data features such as versioning and monitoring.
Import Data module                            Ingest data from online data sources (Blob, ADLS Gen1, ADLS Gen2, File share, SQL DB). The data is imported only for the current designer pipeline run.

Note

Studio (classic) users should note that the following data sources are not natively supported in Azure Machine Learning:

  • Hive Query
  • Azure Table
  • Azure Cosmos DB
  • On-premises SQL Database

We recommend that you migrate your data to a supported storage service by using Azure Data Factory.
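When you inventory the Import Data modules in your Studio (classic) experiments, a small lookup can flag which sources need to move before you migrate. The following is a hypothetical planning helper, not an Azure API. Its source names come only from the lists in this section, and any source not listed is reported as unknown rather than guessed.

```python
from collections import defaultdict
from typing import Dict, List

# Sources this article lists for both ingestion options.
SUPPORTED = {"blob", "adls gen1", "adls gen2", "file share", "sql db"}

# Studio (classic) sources this article lists as not natively supported.
NOT_SUPPORTED = {"hive query", "azure table", "azure cosmos db", "on-premises sql database"}


def plan_sources(sources: List[str]) -> Dict[str, List[str]]:
    """Group Import Data sources by the migration action this article suggests."""
    plan = defaultdict(list)
    for source in sources:
        key = source.strip().lower()
        if key in SUPPORTED:
            plan["register dataset or Import Data"].append(source)
        elif key in NOT_SUPPORTED:
            plan["move with Azure Data Factory first"].append(source)
        else:
            plan["unknown, check manually"].append(source)
    return dict(plan)


if __name__ == "__main__":
    print(plan_sources(["Blob", "Hive Query", "Azure Cosmos DB", "Web URL via HTTP"]))
```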

Register an Azure Machine Learning dataset

Use the following steps to register a dataset to Azure Machine Learning from a cloud service:

  1. Create a datastore, which links the cloud storage service to your Azure Machine Learning workspace.

  2. Register a dataset. If you are migrating a Studio (classic) dataset, select the Tabular dataset setting.
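The two steps above can also be done in code. This sketch assumes the Azure Machine Learning Python SDK v1 (`azureml-core`), an Azure Blob Storage container accessed with an account key, and a workspace `config.json`. Every argument value is a placeholder, and the helper names are hypothetical. `Datastore.register_azure_blob_container` covers step 1, and `Dataset.Tabular.from_delimited_files` followed by `register` covers step 2. As described in the table above, the registered dataset references the files in place rather than copying them.

```python
from typing import List


def blob_paths(datastore, relative_paths: List[str]) -> List[tuple]:
    """Build the (datastore, path) tuples accepted by Dataset.Tabular.from_delimited_files."""
    return [(datastore, path.lstrip("/")) for path in relative_paths]


def register_blob_tabular_dataset(
    datastore_name: str,
    container_name: str,
    account_name: str,
    account_key: str,
    relative_paths: List[str],
    dataset_name: str,
):
    """Link a blob container to the workspace, then register a Tabular dataset from it.

    Assumes azureml-core (SDK v1) and a workspace config.json; all arguments are placeholders.
    """
    # Imported here so blob_paths() works without the SDK installed.
    from azureml.core import Dataset, Datastore, Workspace

    ws = Workspace.from_config()

    # Step 1: create a datastore that points at the existing container (no data is copied).
    datastore = Datastore.register_azure_blob_container(
        workspace=ws,
        datastore_name=datastore_name,
        container_name=container_name,
        account_name=account_name,
        account_key=account_key,
    )

    # Step 2: create a Tabular dataset that references the files, then register it.
    dataset = Dataset.Tabular.from_delimited_files(path=blob_paths(datastore, relative_paths))
    return dataset.register(workspace=ws, name=dataset_name, create_new_version=True)
```

The method also accepts a `sas_token` instead of `account_key`. Whichever you use, keep the credential out of source code.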

After you register a dataset in Azure Machine Learning, you can use it in the designer:

  1. Create a new designer pipeline draft.
  2. In the module palette to the left, expand the Datasets section.
  3. Drag your registered dataset onto the canvas.

Use the Import Data module

Use the following steps to import data directly into your designer pipeline:

  1. Create a datastore, which links the cloud storage service to your Azure Machine Learning workspace.

After you create the datastore, you can use the Import Data module in the designer to ingest data from it:

  1. Create a new designer pipeline draft.
  2. In the module palette to the left, find the Import Data module and drag it to the canvas.
  3. Select the Import Data module, and use the settings in the right panel to configure your data source.

Next steps

In this article, you learned how to migrate a Studio (classic) dataset to Azure Machine Learning. The next step is to rebuild a Studio (classic) training pipeline.

See the other articles in the Studio (classic) migration series:

  1. Migration overview.
  2. Migrate datasets.
  3. Rebuild a Studio (classic) training pipeline.
  4. Rebuild a Studio (classic) web service.
  5. Integrate an Azure Machine Learning web service with client apps.
  6. Migrate Execute R Script.