
Migrate a Studio (classic) dataset to Azure Machine Learning

In this article, you learn how to migrate a Studio (classic) dataset to Azure Machine Learning. For more information on migrating from Studio (classic), see the migration overview article.

You have three options for migrating a dataset to Azure Machine Learning. Read each section to determine which option best fits your scenario.

Where is the data? | Migration option
In Studio (classic) | Option 1: Download the dataset from Studio (classic) and upload it to Azure Machine Learning.
Cloud storage | Option 2: Register a dataset from a cloud source.
Cloud storage | Option 3: Use the Import Data module to get data from a cloud source.

Note

Azure Machine Learning also supports code-first workflows for creating and managing datasets.

Prerequisites

Download the dataset from Studio (classic)

The simplest way to migrate a Studio (classic) dataset to Azure Machine Learning is to download your dataset and register it in Azure Machine Learning. This creates a new copy of your dataset and uploads it to an Azure Machine Learning datastore.

You can download the following Studio (classic) dataset types directly:

  • Plain text (.txt)
  • Comma-separated values (CSV) with a header (.csv) or without (.nh.csv)
  • Tab-separated values (TSV) with a header (.tsv) or without (.nh.tsv)
  • Excel file
  • Zip file (.zip)

To download datasets directly:

  1. Go to your Studio (classic) workspace (https://studio.azureml.net).

  2. In the left navigation bar, select the Datasets tab.

  3. Select the dataset or datasets you want to download.

  4. In the bottom action bar, select Download.

    Screenshot showing how to download a dataset in Studio (classic)

For the following data types, you must use the Convert to CSV module to download datasets:

  • SVMLight data (.svmlight)
  • Attribute Relation File Format (ARFF) data (.arff)
  • R object or workspace file (.RData)
  • Dataset type (.data). Dataset type is the Studio (classic) internal data type for module output.
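
The two lists above amount to a lookup from file extension to download route. The following is a minimal sketch of that lookup, not part of any Azure tooling; the function name `download_route` is hypothetical, and the `.xlsx` and `.xls` extensions are an assumption because the article only says "Excel file".

```python
from pathlib import PurePath

# Extensions the article lists as directly downloadable.
# ".xlsx" and ".xls" are assumptions: the article only says "Excel file".
DIRECT_DOWNLOAD = {".txt", ".csv", ".tsv", ".zip", ".xlsx", ".xls"}

# Types the article says must pass through the Convert to CSV module first.
NEEDS_CONVERT_TO_CSV = {".svmlight", ".arff", ".rdata", ".data"}


def download_route(filename: str) -> str:
    """Return 'direct', 'convert_to_csv', or 'unknown' for a dataset file name."""
    # Only the final suffix matters, so "iris.nh.csv" is treated as ".csv".
    suffix = PurePath(filename).suffix.lower()
    if suffix in DIRECT_DOWNLOAD:
        return "direct"
    if suffix in NEEDS_CONVERT_TO_CSV:
        return "convert_to_csv"
    return "unknown"
```

A call such as `download_route("model.RData")` indicates that the dataset needs the Convert to CSV steps described next.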

To convert your dataset to a CSV and download the results:

  1. Go to your Studio (classic) workspace (https://studio.azureml.net).

  2. Create a new experiment.

  3. Drag and drop the dataset you want to download onto the canvas.

  4. Add a Convert to CSV module.

  5. Connect the Convert to CSV input port to the output port of your dataset.

  6. Run the experiment.

  7. Right-click the Convert to CSV module.

  8. Select Results dataset > Download.

    Screenshot showing how to set up a Convert to CSV pipeline

Upload your dataset to Azure Machine Learning

After you download the data file, you can register the dataset in Azure Machine Learning:

  1. Go to Azure Machine Learning studio (ml.azure.com).

  2. In the left navigation pane, select the Datasets tab.

  3. Select Create dataset > From local files.

    Screenshot showing the datasets tab and the button for creating a local file

  4. Enter a name and description.

  5. For Dataset type, select Tabular.

    Note

    You can also upload ZIP files as datasets. To upload a ZIP file, select File for Dataset type.

  6. For Datastore and file selection, select the datastore you want to upload your dataset file to.

    By default, Azure Machine Learning stores the dataset in the default workspace blobstore. For more information on datastores, see Connect to storage services.

  7. Set the data parsing settings and schema for your dataset. Then, confirm your settings.
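
When you set the parsing settings in step 7, the Studio (classic) file name already tells you the delimiter and whether a header row is present. The sketch below derives both from the naming conventions listed earlier (.csv, .nh.csv, .tsv, .nh.tsv). The function name `parsing_settings` and the returned dictionary keys are hypothetical and do not correspond to any Azure Machine Learning API.

```python
def parsing_settings(filename: str) -> dict:
    """Guess the delimiter and header flag from a Studio (classic) file name."""
    name = filename.lower()
    if name.endswith(".tsv"):
        delimiter = "\t"
    elif name.endswith(".csv"):
        delimiter = ","
    else:
        raise ValueError(f"not a CSV or TSV file: {filename}")
    # Studio (classic) marks headerless files with ".nh" before the extension.
    has_header = not name.endswith((".nh.csv", ".nh.tsv"))
    return {"delimiter": delimiter, "has_header": has_header}
```

For example, a file downloaded as `adult.nh.tsv` would map to a tab delimiter with no header row, so the column names have to be supplied in the schema step.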

Import data from cloud sources

If your data is already in a cloud storage service and you want to keep it in its native location, you can use either of the following options:

Ingestion method | Description
Register an Azure Machine Learning dataset | Ingest data from local and online data sources (Blob, ADLS Gen1, ADLS Gen2, File share, SQL DB). Creates a reference to the data source, which is lazily evaluated at runtime. Use this option if you repeatedly access this dataset and want to enable advanced data features like data versioning and monitoring.
Import Data module | Ingest data from online data sources (Blob, ADLS Gen1, ADLS Gen2, File share, SQL DB). The dataset is only imported to the current designer pipeline run.

Note

Studio (classic) users should note that the following cloud sources are not natively supported in Azure Machine Learning:

  • Hive Query
  • Azure Table
  • Azure Cosmos DB
  • On-premises SQL Database

We recommend that users migrate their data to a supported storage service by using Azure Data Factory.
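
As a quick planning aid, the supported and unsupported sources named in this section can be captured in a small lookup. This is only an illustrative sketch: the function name `migration_advice` and the returned strings are hypothetical, and the two sets contain just the sources this article lists.

```python
# Sources the article lists for dataset registration and the Import Data module.
SUPPORTED = {"blob", "adls gen1", "adls gen2", "file share", "sql db"}

# Studio (classic) sources the article says are not natively supported.
UNSUPPORTED = {
    "hive query",
    "azure table",
    "azure cosmos db",
    "on-premises sql database",
}


def migration_advice(source: str) -> str:
    """Return a short recommendation for a cloud source name."""
    key = source.strip().lower()
    if key in SUPPORTED:
        return "register a dataset or use the Import Data module"
    if key in UNSUPPORTED:
        return "migrate the data to a supported storage service first"
    return "not covered by this article"
```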

Register an Azure Machine Learning dataset

Use the following steps to register a dataset to Azure Machine Learning from a cloud service:

  1. Create a datastore, which links the cloud storage service to your Azure Machine Learning workspace.

  2. Register a dataset. If you are migrating a Studio (classic) dataset, select the Tabular dataset setting.

After you register a dataset in Azure Machine Learning, you can use it in the designer:

  1. Create a new designer pipeline draft.
  2. In the module palette to the left, expand the Datasets section.
  3. Drag your registered dataset onto the canvas.
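
For readers who prefer the code-first route, the registration step can also be sketched with the Azure Machine Learning Python SDK v1. This is a rough sketch under several assumptions: the `azureml-core` package is installed, a `config.json` file for the workspace is available locally, the datastore already exists, and the file is a delimited text file. The function name and all argument values are hypothetical.

```python
def register_tabular_dataset(datastore_name: str, relative_path: str, dataset_name: str):
    """Register a tabular dataset that points at a delimited file in a datastore."""
    # Imported lazily so this module still loads where azureml-core is absent.
    from azureml.core import Dataset, Datastore, Workspace

    # Assumption: config.json for the target workspace is discoverable locally.
    ws = Workspace.from_config()

    # Look up the datastore that links the cloud storage to the workspace.
    datastore = Datastore.get(ws, datastore_name)

    # Build a dataset that references the file at (datastore, relative path).
    dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, relative_path)])

    # Register under the given name; repeated calls create a new version.
    return dataset.register(workspace=ws, name=dataset_name, create_new_version=True)
```

A call might look like `register_tabular_dataset("my_blob_store", "data/adult.csv", "adult-census")`, where all three values are placeholders. Once registered, the dataset appears in the Datasets section of the designer module palette in the same way as one registered through the studio UI.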

Use the Import Data module

Use the following steps to import data directly to your designer pipeline:

  1. Create a datastore, which links the cloud storage service to your Azure Machine Learning workspace.

After you create the datastore, you can use the Import Data module in the designer to ingest data from it:

  1. Create a new designer pipeline draft.
  2. In the module palette to the left, find the Import Data module and drag it to the canvas.
  3. Select the Import Data module, and use the settings in the right panel to configure your data source.

Next steps

In this article, you learned how to migrate a Studio (classic) dataset to Azure Machine Learning. The next step is to rebuild a Studio (classic) training pipeline.

See the other articles in the Studio (classic) migration series:

  1. Migration overview.
  2. Migrate datasets.
  3. Rebuild a Studio (classic) training pipeline.
  4. Rebuild a Studio (classic) web service.
  5. Integrate an Azure Machine Learning web service with client apps.
  6. Migrate Execute R Script.