Tutorial: Create your first classification model with automated machine learning

In this tutorial, you learn how to create your first automated machine learning experiment in the Azure portal (preview) without writing a single line of code. This example creates a classification model to predict whether a client will subscribe to a fixed term deposit with a financial institution.

With automated machine learning, you can automate time-intensive tasks. Automated machine learning rapidly iterates over many combinations of algorithms and hyperparameters to help you find the best model based on a success metric of your choosing.

In this tutorial, you learn how to do the following tasks:

  • Create an Azure Machine Learning service workspace.
  • Run an automated machine learning experiment.
  • View experiment details.
  • Deploy the model.

Prerequisites

  • An Azure subscription. If you don't have an Azure subscription, create a free account.

  • Download the bankmarketing_train.csv data file. The y column indicates whether a customer subscribed to a fixed term deposit; it is later identified as the target column for predictions in this tutorial.
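
Before uploading the file, it can help to check how the labels in the y column are distributed. The following is a minimal sketch using only the Python standard library. It assumes the file is comma-separated with a header row; the function name `target_distribution` and the three sample rows are made up for illustration and are not taken from the real dataset.

```python
import csv
import io
from collections import Counter


def target_distribution(csv_text: str, target: str = "y") -> Counter:
    """Count how often each label appears in the target column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Accessing fieldnames reads the header row; it is None for empty input.
    if reader.fieldnames is None or target not in reader.fieldnames:
        raise KeyError(f"column {target!r} not found in header")
    return Counter(row[target] for row in reader)


# Tiny illustrative sample (hypothetical rows, not real data).
sample = "age,job,y\n35,technician,no\n41,admin.,yes\n29,services,no\n"
counts = target_distribution(sample)
print(counts)
```

To run it against the downloaded file, pass the file's text instead of the sample, for example `target_distribution(open("bankmarketing_train.csv", newline="").read())`.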

Create a workspace

  1. Sign in to the Azure portal by using the credentials for your Azure subscription.

  2. In the upper-left corner of the Azure portal, select + Create a resource.

    [Image: Create a new resource]

  3. Use the search bar to find Machine Learning service workspace.

  4. Select Machine Learning service workspace.

  5. In the Machine Learning service workspace pane, select Create to begin.

  6. Provide the following information to configure your new workspace:

    Field | Description
    Workspace name | Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others.
    Subscription | Select the Azure subscription that you want to use.
    Resource group | Use an existing resource group in your subscription or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location | Select the location closest to your users and the data resources to create your workspace.
  7. After you finish configuring the workspace, select Create.

    Warning

    It can take several minutes to create your workspace in the cloud.

    When the process is finished, a deployment success message appears.

  8. To view the new workspace, select Go to resource.
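
For readers who later want to script the same setup, the portal steps above map roughly onto the Azure Machine Learning SDK for Python. The following is a minimal sketch and not part of the no-code walkthrough. It assumes the azureml-core package is installed when `create_workspace` is called; the `location` value and the `subscription_id` argument are placeholders, and only the names docs-ws and docs-aml come from this tutorial.

```python
# Values from the table above; "location" is a placeholder (assumption).
WORKSPACE_SETTINGS = {
    "name": "docs-ws",
    "resource_group": "docs-aml",
    "location": "eastus2",  # hypothetical: pick the region closest to you
}


def create_workspace(subscription_id: str, settings: dict = WORKSPACE_SETTINGS):
    """Create the workspace with the SDK instead of the portal."""
    # Imported lazily so this sketch loads even without the SDK installed.
    from azureml.core import Workspace

    return Workspace.create(
        name=settings["name"],
        subscription_id=subscription_id,
        resource_group=settings["resource_group"],
        location=settings["location"],
        create_resource_group=True,  # create docs-aml if it does not exist
    )
```

Calling `create_workspace("<your-subscription-id>")` starts the same provisioning that step 7 triggers in the portal, so it can also take several minutes.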

Create and run the experiment

These steps walk you through experiment setup, from data selection to choosing your primary metric and model type.

  1. Go to the left pane of your workspace. Select Automated machine learning under the Authoring (Preview) section. You'll see the Welcome to Automated Machine Learning screen, because this is your first experiment with automated machine learning.

    [Image: Azure portal navigation pane]

  2. Select Create experiment. Then enter my-1st-automl-experiment as the experiment name.

  3. Select Create a new compute and configure your compute context for this experiment.

    Field | Value
    Compute name | Enter a unique name that identifies your compute context. For this example, we use automl-compute.
    Virtual machine size | Select the virtual machine size for your compute. We use Standard_DS12_V2.
    Additional settings | Min node: 1. To enable data profiling, you must have one or more nodes. Max node: 6.

    To create your new compute, select Create. This takes a few moments.

    When creation is complete, select your new compute from the drop-down list, and then select Next.

    Note

    For this tutorial, we use the default storage account and container created with your new compute. They are populated in the form automatically.

  4. Select Upload and choose the bankmarketing_train.csv file from your local computer to upload it to the default container. Public preview supports only local file uploads and Azure Blob storage accounts. When the upload is complete, select the file from the list.

  5. The Preview tab lets us further configure the data for this experiment.

    On the Preview tab, indicate that the data includes headers. The service defaults to including all of the features (columns) for training. For this example, scroll to the right and select Ignore for the day_of_week feature.

    [Image: Preview tab configuration]

    Note

    Data profiling is not available with computes that have zero minimum nodes.

  6. Select Classification as the prediction task.

  7. Select y as the target column, the column we want to predict. This column indicates whether the client subscribed to a term deposit.

  8. Expand Advanced Settings and populate the fields as follows.

    Advanced settings | Value
    Primary metric | AUC_weighted
    Exit criteria | When any of these criteria are met, the training job ends before full completion: Training job time (minutes): 5. Max number of iterations: 10.
    Preprocessing | Enables preprocessing done by automated machine learning. This includes automatic data cleansing, preparing, and transformation to generate synthetic features.
    Validation | Select K-fold cross-validation and 2 for the number of cross-validations.
    Concurrency | Select 5 for the number of max concurrent iterations.

    Note

    For this experiment, we don't set a metric threshold or a threshold for max cores per iteration. We also don't block any algorithms from being tested.

  9. Select Start to run the experiment.

    When the experiment starts, you see a blank Run Detail screen with the following status at the top.

The experiment preparation process takes a couple of minutes. When the process finishes, the status message changes to Run is Running.
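
The choices made in steps 2 through 8 can be summarized as a small data structure, which also makes the exit rule explicit: the training job ends as soon as either limit is reached. This is a sketch for illustration only. The class `ExperimentSettings` and its method names are hypothetical and do not correspond to any Azure API; only the default values come from this tutorial.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExperimentSettings:
    """Hypothetical container for the values entered in the portal."""
    experiment_name: str = "my-1st-automl-experiment"
    compute_name: str = "automl-compute"
    vm_size: str = "Standard_DS12_V2"
    min_nodes: int = 1
    max_nodes: int = 6
    task: str = "classification"
    target_column: str = "y"
    ignored_columns: Tuple[str, ...] = ("day_of_week",)
    primary_metric: str = "AUC_weighted"
    training_job_minutes: float = 5
    max_iterations: int = 10
    n_cross_validations: int = 2
    max_concurrent_iterations: int = 5

    def supports_data_profiling(self) -> bool:
        # Data profiling needs at least one minimum node.
        return self.min_nodes >= 1

    def should_stop(self, elapsed_minutes: float, iterations_done: int) -> bool:
        # Exit criteria: stop when ANY limit is met, whichever comes first.
        return (
            elapsed_minutes >= self.training_job_minutes
            or iterations_done >= self.max_iterations
        )


settings = ExperimentSettings()
print(settings.should_stop(elapsed_minutes=2, iterations_done=3))
print(settings.should_stop(elapsed_minutes=5, iterations_done=3))
```

With these values, a run that has used 2 minutes and 3 iterations keeps going, while one that reaches 5 minutes stops even though fewer than 10 iterations have finished.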

View experiment details

As the experiment progresses, the Run Detail screen updates the iteration chart and list with the different iterations (models) that are run. The iterations list is ordered by metric score. By default, the model that scores highest on our AUC_weighted metric is at the top of the list.
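
The ordering described above amounts to sorting iterations by the primary metric in descending order. Here is a minimal sketch of that idea. The `Iteration` type, the pipeline names, and the scores are all invented for illustration; they are not results from this experiment.

```python
from typing import List, NamedTuple


class Iteration(NamedTuple):
    number: int
    pipeline: str
    auc_weighted: float


def rank_iterations(iterations: List[Iteration]) -> List[Iteration]:
    """Return iterations with the highest AUC_weighted score first."""
    return sorted(iterations, key=lambda it: it.auc_weighted, reverse=True)


# Hypothetical scores, for illustration only.
runs = [
    Iteration(0, "pipeline-a", 0.91),
    Iteration(1, "pipeline-b", 0.94),
    Iteration(2, "pipeline-c", 0.89),
]
best = rank_iterations(runs)[0]
print(best.pipeline)
```

The first element of the ranked list plays the role of the model shown at the top of the iterations list.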

Tip

Training jobs take several minutes for each pipeline to finish running.

[Image: Run details dashboard]

Deploy the model

By using automated machine learning in the Azure portal, we can deploy the best model as a web service to predict on new data and identify potential areas of opportunity. For this experiment, deployment means that the financial institution now has an iterative and scalable solution for identifying potential fixed term deposit customers.

In this experiment, VotingEnsemble is considered the best model, based on the AUC_weighted metric. We deploy this model, but be advised that deployment takes about 20 minutes to complete.

  1. On the Run Detail page, select the Deploy Best Model button.

  2. Populate the Deploy Best Model pane as follows:

    Field | Value
    Deployment name | my-automl-deploy
    Deployment description | My first automated machine learning experiment deployment
    Scoring script | Autogenerate
    Environment script | Autogenerate
  3. Select Deploy.

    A deployment complete message appears when deployment successfully finishes.

Now you have an operational web service to generate predictions.
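
A deployed web service is typically called with an HTTP POST that carries the new rows as JSON. The sketch below only builds such a request with the Python standard library and does not send it. Several parts are assumptions not stated in this tutorial: the scoring URL is a placeholder, the `{"data": [...]}` payload shape depends on the autogenerated scoring script, and the two feature fields are an illustrative subset of the columns.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, rows: list) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a scoring endpoint."""
    # Assumed payload shape; check the generated scoring script for the real one.
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_scoring_request(
    "http://example.invalid/score",  # placeholder, not a real endpoint
    [{"age": 35, "job": "technician"}],  # hypothetical, incomplete feature row
)
print(request.get_method(), request.full_url)
```

To actually send it, replace the placeholder with the scoring URI of your deployment and pass the request to `urllib.request.urlopen`.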

Clean up resources

Deployment files are larger than data and experiment files, so they cost more to store. To minimize costs while keeping your workspace and experiment files, delete only the deployment files. Otherwise, if you don't plan to use any of the files, delete the entire resource group.

Delete the deployment instance

If you want to keep the resource group and workspace for other tutorials and exploration, delete just the deployment instance from the Azure portal.

  1. Go to the Assets pane on the left and select Deployments.

  2. Select the deployment you want to delete and select Delete.

  3. Select Proceed.

Delete the resource group

Important

The resources you created can be used as prerequisites to other Azure Machine Learning service tutorials and how-to articles.

If you don't plan to use the resources you created, delete them so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the far left.

    [Image: Delete in the Azure portal]

  2. From the list, select the resource group you created.

  3. Select Delete resource group.

  4. Enter the resource group name. Then select Delete.

Next steps

In this automated machine learning tutorial, you used the Azure portal to create and deploy a classification model. See these articles for more information and next steps:

Note

This Bank Marketing dataset is made available under the Creative Commons (CC0: Public Domain) License. Any rights in individual contents of the database are licensed under the Database Contents License and available on Kaggle. This dataset was originally available within the UCI Machine Learning Database.

Please cite the following work:
[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.