Tutorial: Create a classification model with automated ML in Azure Machine Learning

In this tutorial, you learn how to use automated machine learning in Azure Machine Learning studio to create a simple classification model without writing a single line of code. The model predicts whether a client will subscribe to a fixed term deposit with a financial institution.

With automated machine learning, you can automate time-intensive tasks. Automated machine learning rapidly iterates over many combinations of algorithms and hyperparameters to help you find the best model based on a success metric of your choosing.
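
The loop that automated ML runs for you can be pictured with a toy version: try every algorithm/hyperparameter pair, score each one with the metric you chose, and keep the best. The sketch below is an exhaustive search over a tiny hypothetical search space with made-up data. It is not the search strategy automated ML actually uses, and it scores models on the same data they are built for, which a real run avoids through validation.

```python
# Toy labelled data: (feature value, label). Purely illustrative.
DATA = [(0.1, 0), (0.35, 0), (0.4, 1), (0.8, 1), (0.9, 1), (0.2, 0)]


def threshold_model(threshold):
    """Predict 1 when the feature is at or above `threshold`."""
    return lambda x: 1 if x >= threshold else 0


def majority_model(_unused):
    """Always predict the most common label in DATA (has no real hyperparameter)."""
    ones = sum(label for _, label in DATA)
    majority = 1 if ones * 2 >= len(DATA) else 0
    return lambda x: majority


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# Hypothetical search space: algorithm -> candidate hyperparameter values.
SEARCH_SPACE = {
    threshold_model: [0.3, 0.5, 0.7],
    majority_model: [None],
}


def search(metric=accuracy):
    """Score every (algorithm, hyperparameter) pair and return the best one."""
    results = []
    for algorithm, values in SEARCH_SPACE.items():
        for value in values:
            score = metric(algorithm(value), DATA)
            results.append((score, algorithm.__name__, value))
    # Highest score first; sort is stable, so ties keep their original order.
    results.sort(key=lambda r: r[0], reverse=True)
    return results[0]
```

Swapping `accuracy` for another function changes which pair wins, which is the role the success metric plays in automated ML.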

For a time-series forecasting example, see Tutorial: Demand forecasting & AutoML.

In this tutorial, you learn how to do the following tasks:

  • Create an Azure Machine Learning workspace.
  • Run an automated machine learning experiment.
  • View experiment details.
  • Deploy the model.

Prerequisites

  • An Azure subscription. If you don't have an Azure subscription, create a free account.

  • Download the bankmarketing_train.csv data file. The y column indicates whether a customer subscribed to a fixed term deposit; later in this tutorial, you identify it as the target column for predictions.
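
Before you upload the file, you may want to see how the values of the y column are distributed. The sketch below counts them with Python's standard csv module; it assumes only that the file has a header row containing a y column and makes no assumption about which values y holds.

```python
import csv
from collections import Counter
from typing import Iterable


def target_distribution(lines: Iterable[str], target: str = "y") -> Counter:
    """Count how often each value of the target column occurs in a CSV with a header row."""
    reader = csv.DictReader(lines)
    if target not in (reader.fieldnames or []):
        raise KeyError(f"target column {target!r} not found in header")
    return Counter(row[target] for row in reader)


# Usage with the downloaded file:
# with open("bankmarketing_train.csv", newline="", encoding="utf-8") as f:
#     print(target_distribution(f))
```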

Create a workspace

An Azure Machine Learning workspace is a foundational resource in the cloud that you use to experiment, train, and deploy machine learning models. It ties your Azure subscription and resource group to an easily consumed object in the service.

There are many ways to create a workspace. In this tutorial, you create a workspace via the Azure portal, a web-based console for managing your Azure resources.

  1. Sign in to the Azure portal by using the credentials for your Azure subscription.

  2. In the upper-left corner of the Azure portal, select + Create a resource.

  3. Use the search bar to find Machine Learning.

  4. Select Machine Learning.

  5. In the Machine Learning pane, select Create to begin.

  6. Provide the following information to configure your new workspace:

    Field | Description
    Workspace name | Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others.
    Subscription | Select the Azure subscription that you want to use.
    Resource group | Use an existing resource group in your subscription, or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location | Select the location closest to your users and the data resources to create your workspace.
    Workspace edition | Select Basic as the workspace type for this tutorial. The workspace type determines the features you have access to and the pricing. Everything in this tutorial can be performed with either a Basic or an Enterprise workspace.

  7. After you finish configuring the workspace, select Review + Create.

    Warning

    It can take several minutes to create your workspace in the cloud.

    When the process is finished, a deployment success message appears.

  8. To view the new workspace, select Go to resource.

Important

Take note of your workspace and subscription. You'll need them to make sure you create your experiment in the right place.
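
If you later want to reach this workspace from code, one way to keep these details is a small config.json file. The Azure Machine Learning Python SDK's `from_config` helpers look for a file with the `subscription_id`, `resource_group`, and `workspace_name` keys. The sketch below only builds that JSON text with the standard library and does not call the SDK; the subscription ID is a placeholder, and docs-aml and docs-ws are the example names used earlier in this tutorial.

```python
import json


def build_workspace_config(subscription_id: str, resource_group: str,
                           workspace_name: str) -> str:
    """Return config.json text that identifies a workspace."""
    return json.dumps(
        {
            "subscription_id": subscription_id,
            "resource_group": resource_group,
            "workspace_name": workspace_name,
        },
        indent=2,
    )


# Usage: replace the placeholder subscription ID with your own, then save the file.
# with open("config.json", "w", encoding="utf-8") as f:
#     f.write(build_workspace_config("<your-subscription-id>", "docs-aml", "docs-ws"))
```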

Get started in Azure Machine Learning studio

You complete the following experiment setup and run steps in Azure Machine Learning studio at https://ml.azure.com, a consolidated web interface that includes machine learning tools for data science practitioners of all skill levels. The studio isn't supported in Internet Explorer.

  1. Sign in to Azure Machine Learning studio.

  2. Select your subscription and the workspace you created.

  3. Select Get started.

  4. In the left pane, select Automated ML under the Author section.

    Since this is your first automated ML experiment, you'll see an empty list and links to documentation.

  5. Select +New automated ML run.

Create and load dataset

Before you configure your experiment, upload your data file to your workspace in the form of an Azure Machine Learning dataset. Doing so ensures that your data is formatted appropriately for your experiment.

  1. Create a new dataset by selecting From local files from the +Create dataset drop-down.

    1. On the Basic info form, give your dataset a name and provide an optional description. The automated ML interface currently supports only TabularDatasets, so the dataset type should default to Tabular.

    2. Select Next on the bottom left.

    3. On the Datastore and file selection form, select the default datastore that was automatically set up during your workspace creation, workspaceblobstore (Azure Blob Storage). This is where you upload your data file to make it available to your workspace.

    4. Select Browse.

    5. Choose the bankmarketing_train.csv file on your local computer. This is the file you downloaded as a prerequisite.

    6. Give your dataset a unique name and provide an optional description.

    7. Select Next on the bottom left to upload the file to the default container that was automatically set up during your workspace creation.

      When the upload is complete, the Settings and preview form is pre-populated based on the file type.

    8. Verify that the Settings and preview form is populated as follows, and then select Next.

      Field | Description | Value for tutorial
      File format | Defines the layout and type of data stored in a file. | Delimited
      Delimiter | One or more characters that specify the boundary between separate, independent regions in plain text or other data streams. | Comma
      Encoding | Identifies which bit-to-character schema table to use to read your dataset. | UTF-8
      Column headers | Indicates how the headers of the dataset, if any, are treated. | All files have same headers
      Skip rows | Indicates how many rows, if any, are skipped in the dataset. | None

    9. The Schema form allows further configuration of your data for this experiment. For this example, select the toggle switch for the day_of_week feature to exclude it from this experiment. Select Next.

    10. On the Confirm details form, verify that the information matches what you previously entered on the Basic info, Datastore and file selection, and Settings and preview forms.

    11. Select Create to complete the creation of your dataset.

    12. Select your dataset once it appears in the list.

    13. Review the Data preview to make sure you didn't include day_of_week, and then select OK.

    14. Select Next.
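
The Settings and preview and Schema choices above amount to a parsing recipe: comma-delimited, UTF-8, one header row, no skipped rows, and day_of_week dropped. The sketch below applies that same recipe locally with Python's csv module so you can check what the dataset will contain. It illustrates the settings only; it is not how Azure Machine Learning parses the file.

```python
import csv
from typing import Dict, Iterable, List

# Values from the Settings and preview and Schema forms in this tutorial.
DELIMITER = ","
SKIP_ROWS = 0                # "Skip rows: None"
EXCLUDED = {"day_of_week"}   # toggled off on the Schema form


def load_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse delimited text with a header row, dropping excluded columns."""
    iterator = iter(lines)
    for _ in range(SKIP_ROWS):
        next(iterator, None)
    reader = csv.DictReader(iterator, delimiter=DELIMITER)
    return [
        {col: val for col, val in row.items() if col not in EXCLUDED}
        for row in reader
    ]


# Usage (UTF-8 matches the Encoding setting):
# with open("bankmarketing_train.csv", newline="", encoding="utf-8") as f:
#     rows = load_rows(f)
```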

Configure experiment run

After you load and configure your data, you can set up your experiment. This setup includes experiment design tasks such as selecting the size of your compute environment and specifying which column you want to predict.

  1. Populate the Configure Run form as follows:

    1. Enter this experiment name: my-1st-automl-experiment

    2. Select y as the target column, which is what you want to predict. This column indicates whether the client subscribed to a term deposit.

    3. Select Create a new compute and configure your compute target. A compute target is a local or cloud-based resource environment used to run your training script or host your service deployment. For this experiment, we use a cloud-based compute.

      Field | Description | Value for tutorial
      Compute name | A unique name that identifies your compute context. | automl-compute
      Virtual machine type | Select the virtual machine type for your compute. | CPU (Central Processing Unit)
      Virtual machine size | Select the virtual machine size for your compute. | Standard_DS12_V2
      Min / Max nodes | To profile data, you must specify 1 or more nodes. | Min nodes: 1, Max nodes: 6
      Idle seconds before scale down | Idle time before the cluster is automatically scaled down to the minimum node count. | 120 (default)
      Advanced settings | Settings to configure and authorize a virtual network for your experiment. | None

      1. Select Create to get the compute target.

        This takes a couple of minutes to complete.

      2. After creation, select your new compute target from the drop-down list.

    4. Select Next.

  2. On the Task type and settings form, complete the setup for your automated ML experiment by specifying the machine learning task type and configuration settings.

    1. Select Classification as the machine learning task type.

    2. Select View additional configuration settings and populate the fields as follows. These settings give you more control over the training job. Otherwise, defaults are applied based on the experiment selection and data.

      Additional configurations | Description | Value for tutorial
      Primary metric | Evaluation metric by which the machine learning algorithm is measured. | AUC_weighted
      Explain best model | Automatically shows explainability on the best model created by automated ML. | Enable
      Blocked algorithms | Algorithms you want to exclude from the training job. | None
      Exit criterion | If a criterion is met, the training job is stopped. | Training job time (hours): 1; Metric score threshold: None
      Validation | Choose a cross-validation type and number of validations. | Validation type: k-fold cross-validation; Number of validations: 2
      Concurrency | The maximum number of iterations executed in parallel. | Max concurrent iterations: 5

      Select Save.

  3. Select Finish to run the experiment. The Run Detail screen opens, with the Run status at the top, as the experiment preparation begins.
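
With k-fold cross-validation and Number of validations set to 2, each candidate model is trained on one part of the data and scored on the held-out part, once per fold, and the fold scores are typically averaged. The sketch below shows only how row indices could be divided into k contiguous folds. It doesn't shuffle or stratify, and automated ML's own splitting may differ.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds; return (train, validation) pairs."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, validation))
        start += size
    return splits
```

With k=2, every row is used for validation exactly once and for training exactly once.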

Important

Preparing the experiment run takes 10-15 minutes. Once the run starts, each iteration takes another 2-3 minutes.
Select Refresh periodically to see the status of the run as the experiment progresses.
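
To get a rough sense of wall-clock time, the sketch below turns these figures into a back-of-the-envelope estimate. It assumes, as a simplification, that iterations run in batches of the Max concurrent iterations value (5 in this tutorial), that each batch takes 2-3 minutes, and that the 1-hour training job time exit criterion caps the time spent iterating. The number of iterations is a hypothetical input; automated ML decides how many it actually runs.

```python
import math


def estimate_minutes(iterations: int, concurrency: int = 5,
                     prep=(10, 15), per_iteration=(2, 3),
                     training_cap: int = 60):
    """Return a (low, high) wall-clock estimate in minutes.

    Simplifying assumptions: iterations run in equal-length batches of
    `concurrency`, and training time never exceeds `training_cap`
    (the 1-hour exit criterion used in this tutorial).
    """
    batches = math.ceil(iterations / concurrency)
    low = prep[0] + min(batches * per_iteration[0], training_cap)
    high = prep[1] + min(batches * per_iteration[1], training_cap)
    return low, high
```

For example, 20 iterations at a concurrency of 5 gives roughly 18 to 27 minutes, while running them one at a time would take 50 to 75 minutes.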

In production, you'd likely walk away for a bit. But for this tutorial, we suggest you start exploring the tested algorithms on the Models tab as they complete, while the others are still running.

Explore models

Navigate to the Models tab to see the algorithms (models) tested. By default, the models are ordered by metric score as they complete. For this tutorial, the model that scores highest on the chosen AUC_weighted metric is at the top of the list.
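
AUC_weighted is commonly defined as the one-vs-rest area under the ROC curve for each class, averaged using each class's share of the samples as its weight. The sketch below follows that definition (an assumption about the exact computation automated ML performs) and then orders hypothetical model scores from best to worst, the way the Models tab lists them. For a two-class target such as y, both per-class AUCs are equal, so the weighted average equals the ordinary AUC.

```python
from collections import Counter
from typing import Dict, List, Sequence


def binary_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def auc_weighted(labels: Sequence[str], probas: Sequence[Dict[str, float]]) -> float:
    """One-vs-rest AUC for each class, averaged with weights = class frequency."""
    counts = Counter(labels)
    total = len(labels)
    return sum(
        (n / total) * binary_auc([label == cls for label in labels],
                                 [p[cls] for p in probas])
        for cls, n in counts.items()
    )


def rank_models(scores: Dict[str, float]) -> List[str]:
    """Order model names from highest to lowest metric score."""
    return sorted(scores, key=scores.get, reverse=True)
```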

While you wait for all of the experiment models to finish, select the Algorithm name of a completed model to explore its performance details.

Use the Details and Metrics tabs to view the selected model's properties, metrics, and performance charts.

Deploy the best model

The automated machine learning interface allows you to deploy the best model as a web service in a few steps. Deployment integrates the model so that it can make predictions on new data and identify potential areas of opportunity.

For this experiment, deployment to a web service means that the financial institution now has an iterative and scalable web solution for identifying potential fixed term deposit customers.

Check whether your experiment run is complete. To do so, navigate back to the parent run page by selecting Run 1 at the top of your screen. A Completed status is shown on the top left of the screen.

Once the experiment run is complete, the Details page is populated with a Best model summary section. In this experiment context, VotingEnsemble is considered the best model, based on the AUC_weighted metric.
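
VotingEnsemble combines several of the models trained during the run. The sketch below shows weighted soft voting, which averages the members' predicted class probabilities and picks the most probable class; treating VotingEnsemble as a weighted soft vote is an assumption here, and the member probabilities and weights are hypothetical.

```python
from typing import Dict, Sequence


def soft_vote(member_probas: Sequence[Dict[str, float]],
              weights: Sequence[float]) -> Dict[str, float]:
    """Weighted average of class probabilities from several models."""
    if not weights or len(member_probas) != len(weights):
        raise ValueError("need one weight per member model")
    total = sum(weights)
    classes = member_probas[0].keys()
    return {c: sum(w * p[c] for w, p in zip(weights, member_probas)) / total
            for c in classes}


def predict(member_probas: Sequence[Dict[str, float]],
            weights: Sequence[float]) -> str:
    """Return the class with the highest averaged probability."""
    averaged = soft_vote(member_probas, weights)
    return max(averaged, key=averaged.get)
```

With equal weights and members `{"yes": 0.9}`, `{"yes": 0.3}`, and `{"yes": 0.4}` (the rest going to "no"), the confident first model tips the average to "yes" even though two of the three models lean "no"; that is what distinguishes soft voting from a simple majority vote.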

We deploy this model, but be advised that deployment takes about 20 minutes to complete. The deployment process entails several steps, including registering the model, generating resources, and configuring them for the web service.

  1. Select VotingEnsemble to open the model-specific page.

  2. Select the Deploy button in the top-left.

  3. Populate the Deploy a model pane as follows:

    Field | Value
    Deployment name | my-automl-deploy
    Deployment description | My first automated machine learning experiment deployment
    Compute type | Select Azure Container Instance (ACI)
    Enable authentication | Disable.
    Use custom deployments | Disable. Allows the default driver file (scoring script) and environment file to be autogenerated.

    For this example, we use the defaults provided in the Advanced menu.

  4. Select Deploy.

    A green success message appears at the top of the Run screen, and in the Model summary pane, a status message appears under Deploy status. Select Refresh periodically to check the deployment status.

Now you have an operational web service to generate predictions.
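
Once the deployment is running, you can send it rows to score over HTTP. The sketch below builds such a request with Python's standard library. The scoring URI is a hypothetical placeholder (copy the real REST endpoint from your deployment in the studio), and the `{"data": [...]}` body shape is an assumption about the autogenerated scoring script, so compare it with the sample input for your deployment. No key header is added because authentication was disabled for this deployment.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with your deployment's REST endpoint.
SCORING_URI = "http://<your-aci-endpoint>/score"


def build_request(records, uri=SCORING_URI):
    """Build a POST request that sends `records` (one dict per row) for scoring.

    Assumes the scoring script accepts {"data": [...]}; verify against your
    deployment's sample input before relying on it.
    """
    body = json.dumps({"data": records}).encode("utf-8")
    return urllib.request.Request(
        uri, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


# Usage (sends a real request, so it is left commented out):
# with urllib.request.urlopen(build_request([{"age": 41, "job": "services"}])) as resp:
#     print(json.loads(resp.read()))
```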

Proceed to the Next steps to learn more about how to consume your new web service and test your predictions by using Power BI's built-in Azure Machine Learning support.

Clean up resources

Deployment files are larger than data and experiment files, so they cost more to store. If you want to keep your workspace and experiment files, delete only the deployment files to minimize costs to your account. Otherwise, if you don't plan to use any of the files, delete the entire resource group.

Delete the deployment instance

If you want to keep the resource group and workspace for other tutorials and exploration, delete just the deployment instance from Azure Machine Learning at https://ml.azure.com/.

  1. Go to Azure Machine Learning. Navigate to your workspace and, on the left under the Assets pane, select Endpoints.

  2. Select the deployment you want to delete and select Delete.

  3. Select Proceed.

Delete the resource group

Important

The resources that you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to articles.

If you don't plan to use the resources that you created, delete them so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the far left.

  2. From the list, select the resource group that you created.

  3. Select Delete resource group.

  4. Enter the resource group name. Then select Delete.

Next steps

In this automated machine learning tutorial, you used Azure Machine Learning's automated ML interface to create and deploy a classification model. See these articles for more information and next steps:

Note

This Bank Marketing dataset is made available under the Creative Commons (CC0: Public Domain) License. Any rights in individual contents of the database are licensed under the Database Contents License and available on Kaggle. This dataset was originally available within the UCI Machine Learning Database.

[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.