Create, review, and deploy automated machine learning models with Azure Machine Learning

In this article, you learn how to create, explore, and deploy automated machine learning models in Azure Machine Learning studio without writing a single line of code.

Automated machine learning is a process that selects the best machine learning algorithm for your specific data. This process enables you to generate machine learning models quickly. Learn more about automated machine learning.

For an end-to-end example, try the tutorial for creating a classification model with Azure Machine Learning's automated ML interface.

For a Python code-based experience, configure your automated machine learning experiments with the Azure Machine Learning SDK.

Prerequisites

Get started

  1. Sign in to Azure Machine Learning at https://ml.azure.com.

  2. Select your subscription and workspace.

  3. Navigate to the left pane. Select Automated ML under the Author section.

Azure Machine Learning studio navigation pane

If this is your first time running any experiments, you'll see an empty list and links to documentation.

Otherwise, you'll see a list of your recent automated machine learning experiments, including those created with the SDK.

Create and run experiment

  1. Select + New automated ML run and populate the form.

  2. Select a dataset from your storage container, or create a new dataset. Datasets can be created from local files, web URLs, datastores, or Azure Open Datasets. Learn more about dataset creation.

    Important

    Requirements for training data:

    • Data must be in tabular form.
    • The value you want to predict (target column) must be present in the data.

    1. To create a new dataset from a file on your local computer, select + Create dataset and then select From local file.

    2. In the Basic info form, give your dataset a unique name and provide an optional description.

    3. Select Next to open the Datastore and file selection form. On this form, you select where to upload your dataset: either the default storage container that's automatically created with your workspace, or another storage container that you want to use for the experiment.

      1. If your data is behind a virtual network, you need to enable the skip validation function to ensure that the workspace can access your data. For more information, see Use Azure Machine Learning studio in an Azure virtual network.

    4. Select Browse to upload the data file for your dataset.

    5. Review the Settings and preview form for accuracy. The form is intelligently populated based on the file type.

      Field | Description
      File format | Defines the layout and type of data stored in a file.
      Delimiter | One or more characters for specifying the boundary between separate, independent regions in plain text or other data streams.
      Encoding | Identifies which bit-to-character schema table to use to read your dataset.
      Column headers | Indicates how the headers of the dataset, if any, will be treated.
      Skip rows | Indicates how many, if any, rows are skipped in the dataset.

      Select Next.

    6. The Schema form is intelligently populated based on the selections in the Settings and preview form. Here, configure the data type for each column, review the column names, and select which columns to exclude from your experiment (Not include).

      Select Next.

    7. The Confirm details form is a summary of the information previously populated in the Basic info and Settings and preview forms. You also have the option to create a data profile for your dataset using a profiling-enabled compute. Learn more about data profiling.

      Select Next.

  3. Select your newly created dataset once it appears. You can also view a preview of the dataset and sample statistics.

  4. On the Configure run form, enter a unique experiment name.

  5. Select a target column; this is the column that you want to predict.

  6. Select a compute for the data profiling and training job. A list of your existing computes is available in the dropdown. To create a new compute, follow the instructions in step 7.

  7. Select Create a new compute to configure your compute context for this experiment.

    Field | Description
    Compute name | Enter a unique name that identifies your compute context.
    Virtual machine priority | Low priority virtual machines are cheaper but don't guarantee the compute nodes.
    Virtual machine type | Select CPU or GPU for virtual machine type.
    Virtual machine size | Select the virtual machine size for your compute.
    Min / Max nodes | To profile data, you must specify 1 or more nodes. Enter the maximum number of nodes for your compute. The default is 6 nodes for an AML Compute.
    Advanced settings | These settings allow you to configure a user account and existing virtual network for your experiment.

    Select Create. Creation of a new compute can take a few minutes.

    Note

    Your compute name indicates whether the compute you select or create is profiling enabled. (See the data profiling section for more details.)

    Select Next.

  8. On the Task type and settings form, select the task type: classification, regression, or forecasting. See supported task types for more information.

    1. For classification, you can also enable deep learning.

      If deep learning is enabled, validation is limited to train_validation split. Learn more about validation options.

    2. For forecasting, you can:

      1. Enable deep learning.

      2. Select time column: This column contains the time data to be used.

      3. Select forecast horizon: Indicate how many time units (minutes/hours/days/weeks/months/years) the model will be able to predict into the future. The further the model is required to predict into the future, the less accurate it becomes. Learn more about forecasting and forecast horizon.

  9. (Optional) View additional configuration settings: additional settings you can use to better control the training job. Otherwise, defaults are applied based on experiment selection and data.

    Additional configurations | Description
    Primary metric | Main metric used for scoring your model. Learn more about model metrics.
    Explain best model | Select to enable or disable, in order to show explanations for the recommended best model. This functionality is not currently available for certain forecasting algorithms.
    Blocked algorithm | Select algorithms you want to exclude from the training job. Allowing algorithms is only available for SDK experiments. See the supported models for each task type.
    Exit criterion | When any of these criteria are met, the training job is stopped. Training job time (hours): How long to allow the training job to run. Metric score threshold: Minimum metric score for all pipelines. This ensures that if you have a defined target metric you want to reach, you do not spend more time on the training job than necessary.
    Validation | Select one of the cross validation options to use in the training job. Learn more about cross validation. Forecasting only supports k-fold cross validation.
    Concurrency | Max concurrent iterations: Maximum number of pipelines (iterations) to test in the training job. The job will not run more than the specified number of iterations. Learn more about how automated ML performs multiple child runs on clusters.

  10. (Optional) View featurization settings: if you choose to enable Automatic featurization in the Additional configuration settings form, default featurization techniques are applied. In View featurization settings, you can change these defaults and customize accordingly. Learn how to customize featurizations.

    Screenshot shows the Select task type dialog box with View featurization settings called out.
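The two training data requirements listed in step 2 (tabular form, target column present) can be checked locally before you upload a file. The following is a minimal sketch using only the Python standard library. The function name `check_training_data` and the sample column names are hypothetical, and the checks are an illustration of the stated requirements, not the validation that the studio itself performs.

```python
import csv
import io
from typing import List


def check_training_data(csv_text: str, target_column: str) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    # Tabular form: every data row has the same number of fields as the header.
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_number} has {len(row)} fields, expected {len(header)}"
            )
    # The value to predict (target column) must be present in the data.
    if target_column not in header:
        problems.append(f"target column '{target_column}' not found")
    return problems


# Hypothetical sample data for illustration only.
sample = "age,income,churned\n34,52000,no\n45,61000,yes\n"
print(check_training_data(sample, "churned"))  # []
print(check_training_data(sample, "label"))
```

An empty result means the file is at least structurally suitable; it says nothing about data quality or whether the studio will accept the file's encoding or delimiter settings.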

Customize featurization

In the Featurization form, you can enable or disable automatic featurization and customize the automatic featurization settings for your experiment. To open this form, see step 10 in the Create and run experiment section.

The following table summarizes the customizations currently available via the studio.

Column | Customization
Included | Specifies which columns to include for training.
Feature type | Change the value type for the selected column.
Impute with | Select what value to impute missing values with in your data.
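To make the Impute with customization concrete, here is a minimal sketch of what imputing missing values means. The source does not list which imputation options the studio offers, so the two strategies shown (column mean and a fixed constant) are illustrative assumptions, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def impute_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: column has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_with_constant(values: List[Optional[float]], constant: float) -> List[float]:
    """Replace missing entries (None) with a fixed value."""
    return [constant if v is None else v for v in values]


# Hypothetical column with two missing values.
ages = [30.0, None, 50.0, None]
print(impute_with_mean(ages))           # [30.0, 40.0, 50.0, 40.0]
print(impute_with_constant(ages, 0.0))  # [30.0, 0.0, 50.0, 0.0]
```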

Azure Machine Learning studio custom featurization

Run experiment and view results

Select Finish to run your experiment. The experiment preparation process can take up to 10 minutes. Training jobs can take an additional 2-3 minutes for each pipeline to finish running.

Note

The algorithms automated ML employs have inherent randomness that can cause slight variation in a recommended model's final metrics score, like accuracy. Automated ML also performs operations on data such as train-test split, train-validation split, or cross-validation when necessary. So if you run an experiment with the same configuration settings and primary metric multiple times, you'll likely see variation in each experiment's final metrics score due to these factors.
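One source of the variation described in the note is the data split itself. The sketch below shows a generic seeded train-validation split in plain Python. It is not how automated ML implements its splits; the function name and parameters are hypothetical and only illustrate that a different random seed places different rows in the validation set, which in turn changes the metric computed on it.

```python
import random
from typing import List, Tuple


def train_validation_split(
    rows: List[int], validation_fraction: float, seed: int
) -> Tuple[List[int], List[int]]:
    """Shuffle a copy of the rows with the given seed, then split off a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


rows = list(range(20))
for seed in (0, 1, 2):
    _, validation = train_validation_split(rows, 0.2, seed)
    # Each seed typically selects a different set of validation rows.
    print(seed, sorted(validation))
```

Repeating a call with the same seed reproduces the same split, which is why fixing seeds is the usual way to make experiments comparable when a tool exposes that option.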

View experiment details

The Run Detail screen opens to the Details tab. This screen shows you a summary of the experiment run, including a status bar at the top next to the run number.

The Models tab contains a list of the models created, ordered by metric score. By default, the model that scores the highest based on the chosen metric is at the top of the list. As the training job tries out more models, they are added to the list. Use this list to get a quick comparison of the metrics for the models produced so far.

Run details dashboard

View training run details

Drill down on any of the completed models to see training run details, like a model summary on the Model tab or performance metric charts on the Metrics tab. Learn more about charts.

Iteration details

Deploy your model

Once you have the best model at hand, it is time to deploy it as a web service to predict on new data.

Automated ML helps you deploy the model without writing code:

  1. You have a couple of options for deployment.

    • Option 1: Deploy the best model, according to the metric criteria you defined.

      1. After the experiment is complete, navigate to the parent run page by selecting Run 1 at the top of the screen.
      2. Select the model listed in the Best model summary section.
      3. Select Deploy on the top left of the window.
    • Option 2: Deploy a specific model iteration from this experiment.

      1. Select the desired model from the Models tab.
      2. Select Deploy on the top left of the window.
  2. Populate the Deploy model pane.

    Field | Value
    Name | Enter a unique name for your deployment.
    Description | Enter a description to better identify what this deployment is for.
    Compute type | Select the type of endpoint you want to deploy: Azure Kubernetes Service (AKS) or Azure Container Instance (ACI).
    Compute name | Applies to AKS only: Select the name of the AKS cluster you wish to deploy to.
    Enable authentication | Select to allow for token-based or key-based authentication.
    Use custom deployment assets | Enable this feature if you want to upload your own scoring script and environment file. Learn more about scoring scripts.

    Important

    File names must be under 32 characters and must begin and end with alphanumerics. They may include dashes, underscores, dots, and alphanumerics in between. Spaces are not allowed.

    The Advanced menu offers default deployment features such as data collection and resource utilization settings. If you wish to override these defaults, do so in this menu.

  3. Select Deploy. Deployment can take about 20 minutes to complete. Once deployment begins, the Model summary tab appears. See the deployment progress under the Deploy status section.
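The file naming rule in the Important note of step 2 can be checked before you upload custom deployment assets. This is a minimal sketch that encodes the rule as stated; it assumes "alphanumerics" means ASCII letters and digits, and the function name `is_valid_file_name` is hypothetical rather than part of any Azure tooling.

```python
import re

# First and last characters must be alphanumeric; dashes, underscores,
# and dots are allowed only in between. Assumes ASCII letters and digits.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")


def is_valid_file_name(name: str) -> bool:
    """Return True if the name is under 32 characters and matches the pattern."""
    return len(name) < 32 and _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_file_name("score.py"))    # True
print(is_valid_file_name("my env.yml"))  # False: contains a space
print(is_valid_file_name("_score.py"))   # False: starts with an underscore
print(is_valid_file_name("a" * 32))      # False: 32 characters is not under 32
```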

Now you have an operational web service to generate predictions! You can test the predictions by querying the service from Power BI's built-in Azure Machine Learning support.
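Besides Power BI, a deployed web service can generally be queried over HTTP. The sketch below only builds such a request with the Python standard library and does not send it. The scoring URI, the key, the input column names, the `{"data": [...]}` payload shape, and the `Bearer` authorization header are all assumptions for illustration; check your deployment's details for the actual endpoint, schema, and authentication scheme.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_scoring_request(
    scoring_uri: str,
    rows: List[Dict[str, Any]],
    auth_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a scoring endpoint."""
    # Assumed payload shape: a JSON object with the input rows under "data".
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if auth_key is not None:
        # Assumed scheme for key-based or token-based authentication.
        headers["Authorization"] = f"Bearer {auth_key}"
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


# Placeholder values; replace with your service's scoring URI and key.
request = build_scoring_request(
    "https://example.invalid/score",
    [{"age": 34, "income": 52000}],
    auth_key="<your-key>",
)
print(request.get_method(), request.full_url)

# To send the request against a real endpoint:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```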

Next steps