
Tutorial: Create your first classification model with automated machine learning

APPLIES TO: Basic edition: no; Enterprise edition: yes

In this tutorial, you learn how to create your first automated machine learning experiment through Azure Machine Learning studio without writing a single line of code. This example creates a classification model to predict whether a client will subscribe to a fixed term deposit with a financial institution.

With automated machine learning, you can automate away time-intensive tasks. Automated machine learning rapidly iterates over many combinations of algorithms and hyperparameters to help you find the best model based on a success metric of your choosing.
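To make the idea concrete, here is a minimal Python sketch of that loop, assuming an exhaustive grid search for simplicity (the actual service uses its own strategy to decide which combinations to try). The algorithm names, hyperparameter grids, and the `toy_evaluate` scoring function are hypothetical placeholders; in a real run, each evaluation would train and validate a model.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical search space: algorithm name -> {hyperparameter: candidate values}.
SearchSpace = Dict[str, Dict[str, List[float]]]


def search_best_model(
    space: SearchSpace,
    evaluate: Callable[[str, Dict[str, float]], float],
) -> Tuple[str, Dict[str, float], float]:
    """Score every (algorithm, hyperparameters) combination and return the
    best one; higher scores are better, as with AUC_weighted."""
    best = None
    for algorithm, grid in space.items():
        names = list(grid)
        # product() yields every combination of the candidate values.
        for values in product(*(grid[name] for name in names)):
            params = dict(zip(names, values))
            score = evaluate(algorithm, params)
            if best is None or score > best[2]:
                best = (algorithm, params, score)
    if best is None:
        raise ValueError("search space is empty")
    return best


def toy_evaluate(algorithm: str, params: Dict[str, float]) -> float:
    # Made-up scores; a real run would train and validate a model here.
    if algorithm == "LogisticRegression":
        return 0.90 - abs(params["C"] - 1.0) * 0.01
    return 0.92 - abs(params["learning_rate"] - 0.1) - abs(params["num_leaves"] - 31) * 0.001


space = {
    "LogisticRegression": {"C": [0.1, 1.0, 10.0]},
    "LightGBM": {"learning_rate": [0.05, 0.1], "num_leaves": [15, 31]},
}
print(search_best_model(space, toy_evaluate))
# ('LightGBM', {'learning_rate': 0.1, 'num_leaves': 31}, 0.92)
```

Exhaustive enumeration grows multiplicatively with every hyperparameter added, which is one reason a managed service that prunes the search saves time.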

In this tutorial, you learn how to do the following tasks:

  • Create an Azure Machine Learning workspace.
  • Run an automated machine learning experiment.
  • View experiment details.
  • Deploy the model.

Prerequisites

  • An Azure subscription. If you don't have an Azure subscription, create a free account.

  • Download the bankmarketing_train.csv data file. The y column indicates whether a customer subscribed to a fixed term deposit; later in this tutorial, you identify it as the target column for predictions.

Create a workspace

An Azure Machine Learning workspace is a foundational resource in the cloud that you use to experiment, train, and deploy machine learning models. It ties your Azure subscription and resource group to an easily consumed object in the service.

You create a workspace through the Azure portal, a web-based console for managing your Azure resources.

  1. Sign in to the Azure portal by using the credentials for your Azure subscription.

  2. In the upper-left corner of the Azure portal, select + Create a resource.


  3. Use the search bar to find Machine Learning.

  4. Select Machine Learning.

  5. In the Machine Learning pane, select Create to begin.

  6. Provide the following information to configure your new workspace:

    Field | Description
    Workspace name | Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others.
    Subscription | Select the Azure subscription that you want to use.
    Resource group | Use an existing resource group in your subscription or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location | Select the location closest to your users and the data resources to create your workspace.
    Workspace edition | Select Enterprise. This tutorial requires the Enterprise edition. The Enterprise edition is in preview and does not currently add any extra costs.
  7. After you finish configuring the workspace, select Create.

    Warning

    It can take several minutes to create your workspace in the cloud.

    When the process is finished, a deployment success message appears.

  8. To view the new workspace, select Go to resource.

Important

Take note of your workspace and subscription. You'll need them to make sure you create your experiment in the right place.

Create and run the experiment

You complete the following experiment setup and run steps in Azure Machine Learning studio, a consolidated interface that includes machine learning tools for data science practitioners of all skill levels. The studio is not supported on Internet Explorer browsers.

  1. Sign in to Azure Machine Learning studio.

  2. Select your subscription and the workspace you created.

  3. Select Get started.

  4. In the left pane, under the Author section, select Automated ML.

    Because this is your first automated ML experiment, you'll see an empty list and links to documentation.


  5. Select New automated ML run.

  6. Create a new dataset by selecting From local files from the + Create dataset drop-down.

    1. Select Browse.

    2. Choose the bankmarketing_train.csv file on your local computer. This is the file you downloaded as a prerequisite.

    3. Select Tabular as your dataset type.

    4. Give your dataset a unique name and provide an optional description.

    5. Select Next on the bottom left to upload it to the default container that was automatically set up during workspace creation.

      When the upload is complete, the Settings and preview form is pre-populated based on the file type.

    6. Verify that the Settings and preview form is populated as follows, and then select Next.

      Field | Description | Value for tutorial
      File format | Defines the layout and type of data stored in a file. | Delimited
      Delimiter | One or more characters that specify the boundary between separate, independent regions in plain text or other data streams. | Comma
      Encoding | Identifies which bit-to-character schema table to use to read your dataset. | UTF-8
      Column headers | Indicates how the headers of the dataset, if any, are treated. | All files have same headers
      Skip rows | Indicates how many rows, if any, are skipped in the dataset. | None
    7. The Schema form allows further configuration of your data for this experiment. For this example, select the toggle switch for the day_of_week feature to exclude it from this experiment. Select Next.


    8. On the Confirm details form, verify that the information matches what was previously populated on the Basic info and Settings and preview forms.

    9. Select Create to complete the creation of your dataset.

    10. Select your dataset once it appears in the list.

    11. Review the Data preview to make sure you didn't include day_of_week, and then select OK.

    12. Select Next.

  7. Populate the Configure Run form as follows:

    1. Enter this experiment name: my-1st-automl-experiment

    2. Select y as the target column, which is what you want to predict. This column indicates whether the client subscribed to a term deposit.

    3. Select Create a new compute and configure your compute target. A compute target is a local or cloud-based resource environment used to run your training script or host your service deployment. For this experiment, we use cloud-based compute.

      Field | Description | Value for tutorial
      Compute name | A unique name that identifies your compute context. | automl-compute
      Virtual machine size | Select the virtual machine size for your compute. | Standard_DS12_V2
      Min / Max nodes (in Advanced Settings) | To profile data, you must specify 1 or more nodes. | Min nodes: 1; Max nodes: 6
      1. Select Create to get the compute target.

        This takes a couple of minutes to complete.

      2. After creation, select your new compute target from the drop-down list.

    4. Select Next.

  8. On the Task type and settings form, select Classification as the machine learning task type.

    1. Select View additional configuration settings and populate the fields as follows. These settings give you more control over the training job. Otherwise, defaults are applied based on experiment selection and data.

      Note

      In this tutorial, you won't set a metric score threshold or a maximum number of cores per iteration. You also won't block any algorithms from being tested.

      Additional configuration | Description | Value for tutorial
      Primary metric | Evaluation metric by which the machine learning algorithm is measured. | AUC_weighted
      Automatic featurization | Enables preprocessing, including automatic data cleansing, preparation, and transformation to generate synthetic features. | Enable
      Blocked algorithms | Algorithms you want to exclude from the training job. | None
      Exit criterion | When a criterion is met, the training job is stopped. | Training job time (hours): 1; Metric score threshold: None
      Validation | Choose a cross-validation type and number of tests. | Validation type: k-fold cross-validation; Number of validations: 2
      Concurrency | The maximum number of parallel iterations executed and cores used per iteration. | Max concurrent iterations: 5; Max cores per iteration: None

      Select OK.

  9. Select Create to run the experiment. The Run Detail screen opens, showing the Run status as the experiment preparation begins.
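If you want to sanity-check the file locally, the Settings and preview and Schema choices above (comma delimiter, UTF-8 encoding, a shared header row, no skipped rows, and day_of_week excluded) map roughly onto the following standard-library Python sketch. It is illustrative only and is not how the studio parses data; the `load_tabular` function and the four-column sample are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_tabular(
    source: TextIO,
    delimiter: str = ",",
    skip_rows: int = 0,
    excluded: Tuple[str, ...] = ("day_of_week",),
) -> List[Dict[str, str]]:
    """Read a delimited file whose first non-skipped line is the header,
    dropping any column listed in `excluded`."""
    for _ in range(skip_rows):
        source.readline()  # "Skip rows: None" in the tutorial means skip_rows=0
    reader = csv.DictReader(source, delimiter=delimiter)
    return [
        {col: val for col, val in row.items() if col not in excluded}
        for row in reader
    ]


# Tiny inline sample; for the real file you would pass
# open("bankmarketing_train.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    "age,job,day_of_week,y\n"
    "41,technician,mon,no\n"
    "29,student,fri,yes\n"
)
rows = load_tabular(sample)
print(rows[0])  # {'age': '41', 'job': 'technician', 'y': 'no'}
```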

Important

Preparing the experiment run takes 10 to 15 minutes. Once the run starts, each iteration takes another 2 to 3 minutes. Select Refresh periodically to see the status of the run as the experiment progresses.

In production, you'd likely walk away for a bit. But for this tutorial, we suggest you start exploring the tested algorithms on the Models tab as they complete while the others are still running.
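The Validation setting chosen in the run configuration (k-fold cross-validation with 2 validations) means each candidate model is trained and scored k = 2 times, each time holding out a different fold of the data, so that every row is used for validation exactly once; the per-fold scores are then typically averaged. A minimal sketch of the index splitting, assuming contiguous folds without shuffling for readability (a real implementation, including Azure's, may shuffle or stratify first); `k_fold_indices` is a hypothetical helper.

```python
from typing import List, Tuple


def k_fold_indices(n_rows: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n_rows-1 into k contiguous folds and return
    (train_indices, validation_indices) for each round."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        splits.append((train, validation))
        start += size
    return splits


# With "Number of validations: 2", each row is validated exactly once.
for train, validation in k_fold_indices(5, 2):
    print("train:", train, "validate:", validation)
# train: [3, 4] validate: [0, 1, 2]
# train: [0, 1, 2] validate: [3, 4]
```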

Explore models

Navigate to the Models tab to see the algorithms (models) tested. By default, the models are ordered by metric score as they complete. For this tutorial, the model that scores highest on the chosen AUC_weighted metric is at the top of the list.
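AUC_weighted is the area under the ROC curve computed for each class (one class versus the rest), averaged with weights equal to the number of true instances of each class. The sketch below computes it from predicted class probabilities using the pairwise (Mann-Whitney) form of AUC, then sorts models highest first, as the Models tab does. The labels, model names, and probabilities are made up; the service computes this metric for you.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Pairwise (Mann-Whitney) AUC: the chance a random positive row gets a
    higher score than a random negative row, counting ties as half."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative row")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_weighted(probs: List[Dict[str, float]], labels: List[str]) -> float:
    """One-vs-rest AUC for each class, averaged with weights equal to the
    number of true instances (support) of that class."""
    total = 0.0
    for cls in sorted(set(labels)):
        class_auc = binary_auc([row[cls] for row in probs], [y == cls for y in labels])
        total += labels.count(cls) * class_auc
    return total / len(labels)


# Hypothetical labels and P(y == "yes") from two made-up models.
labels = ["no", "no", "yes", "no", "yes"]
predictions = {
    "model_a": [0.1, 0.3, 0.8, 0.4, 0.7],
    "model_b": [0.6, 0.2, 0.5, 0.3, 0.9],
}
scores = {
    name: auc_weighted([{"yes": p, "no": 1 - p} for p in ps], labels)
    for name, ps in predictions.items()
}
# Highest AUC_weighted first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print([(name, round(score, 3)) for name, score in ranked])
# [('model_a', 1.0), ('model_b', 0.833)]
```

For a two-class problem like this one, when the class probabilities sum to 1 the two one-vs-rest AUCs are identical, so AUC_weighted reduces to the ordinary AUC.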

While you wait for all of the experiment models to finish, select the Algorithm name of a completed model to explore its performance details.

Explore the Model details and Visualizations tabs to view the selected model's properties, metrics, and performance charts.

Deploy the model

Automated machine learning in Azure Machine Learning studio allows you to deploy the best model as a web service in a few steps. Deployment integrates the model so that it can make predictions on new data and help identify potential areas of opportunity.

For this experiment, deployment to a web service means that the financial institution now has an iterative and scalable web solution for identifying potential fixed term deposit customers.

Once the run is complete, navigate back to the Run Detail page and select the Models tab. Select Refresh.

In this experiment context, VotingEnsemble is considered the best model, based on the AUC_weighted metric. We deploy this model, but be advised that deployment takes about 20 minutes to complete. The deployment process entails several steps, including registering the model, generating resources, and configuring them for the web service.

  1. Select the Deploy Best Model button in the bottom-left corner.

  2. Populate the Deploy a model pane as follows:

    Field | Value
    Deployment name | my-automl-deploy
    Deployment description | My first automated machine learning experiment deployment
    Compute type | Select Azure Container Instance (ACI)
    Enable authentication | Disable.
    Use custom deployments | Disable. Allows the default driver file (scoring script) and environment file to be autogenerated.

    For this example, we use the defaults provided in the Advanced menu.

  3. Select Deploy.

    A green success message appears at the top of the Run screen, and in the Recommended model pane, a status message appears under Deploy status. Select Refresh periodically to check the deployment status.

Now you have an operational web service to generate predictions.
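Client applications send rows to the deployed service's scoring URI over HTTP. The sketch below only builds such a request with Python's standard library and does not send it. The URI is a placeholder (copy the real one from the endpoint's details in the studio), and the `{"data": [...]}` body shape is an assumption based on typical autogenerated scoring scripts; check your deployment's scoring script for the exact input format. No key header is added because authentication was disabled in the deployment settings.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Placeholder: replace with your deployment's scoring URI (hypothetical value).
SCORING_URI = "http://<your-aci-endpoint>/score"


def build_scoring_request(rows: List[Dict[str, Any]], uri: str = SCORING_URI) -> urllib.request.Request:
    """Wrap feature rows in a JSON body and return an unsent POST request."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical, truncated row: a real request must include every feature column
# the model was trained on (all columns except y and the excluded day_of_week).
request = build_scoring_request([{"age": 41, "job": "technician", "marital": "married"}])

# Sending requires a live endpoint, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```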

Proceed to the Next steps to learn more about how to consume your new web service and test your predictions by using Power BI's built-in Azure Machine Learning support.

Clean up resources

Deployment files are larger than data and experiment files, so they cost more to store. If you want to keep your workspace and experiment files while minimizing costs to your account, delete only the deployment files. Otherwise, if you don't plan to use any of the files, delete the entire resource group.

Delete the deployment instance

If you want to keep the resource group and workspace for other tutorials and exploration, delete just the deployment instance from Azure Machine Learning studio.

  1. Go to Azure Machine Learning studio. Navigate to your workspace and, on the left under the Assets pane, select Endpoints.

  2. Select the deployment you want to delete and select Delete.

  3. Select Proceed.

Delete the resource group

Important

The resources you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to articles.

If you don't plan to use the resources you created, delete them so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the far left.


  2. From the list, select the resource group you created.

  3. Select Delete resource group.

  4. Enter the resource group name. Then select Delete.

Next steps

In this automated machine learning tutorial, you used Azure Machine Learning studio to create and deploy a classification model. See these articles for more information and next steps:

Note

This Bank Marketing dataset is made available under the Creative Commons (CC0: Public Domain) License. Any rights in individual contents of the database are licensed under the Database Contents License and available on Kaggle. This dataset was originally available in the UCI Machine Learning Repository.

[Moro et al., 2014] S. Moro, P. Cortez, and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.