Choose parameters to optimize your algorithms in Machine Learning Studio (classic)

APPLIES TO: Machine Learning Studio (classic). Does not apply to: Azure Machine Learning.

This topic describes how to choose the right hyperparameter set for an algorithm in Azure Machine Learning Studio (classic). Most machine learning algorithms have parameters to set. When you train a model, you need to provide values for those parameters. The efficacy of the trained model depends on the parameter values you choose. The process of finding the optimal set of parameters is known as model selection.

There are various ways to do model selection. In machine learning, cross-validation is one of the most widely used methods for model selection, and it is the default model selection mechanism in Azure Machine Learning Studio (classic). Because Azure Machine Learning Studio (classic) supports both R and Python, you can also implement your own model selection mechanism in either language.

There are four steps in finding the best parameter set:

  1. Define the parameter space: For the algorithm, first decide the exact parameter values you want to consider.
  2. Define the cross-validation settings: Decide how to assign cross-validation folds to the dataset.
  3. Define the metric: Decide which metric determines the best set of parameters, such as accuracy, root mean squared error, precision, recall, or F-score.
  4. Train, evaluate, and compare: For each unique combination of parameter values, run cross-validation and score the result with the error metric you defined. After evaluation and comparison, choose the best-performing model.
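The four steps can be sketched outside Studio in plain Python. This is a minimal illustration, not Studio's implementation: the learner is a hypothetical 1-D k-nearest-neighbors classifier whose only hyperparameter is `k`, the fold assignment deals shuffled rows round-robin into five folds, and the metric is accuracy. All function names and the toy data are assumptions made for the example.

```python
import itertools
import random

# Step 1 (hypothetical parameter space): the only hyperparameter is k.
PARAM_SPACE = {"k": [1, 3, 5, 7]}

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature, 0/1 labels)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > len(nearest) else 0

# Step 2: shuffle row indices and deal them into n_folds folds.
def assign_folds(n_rows, n_folds, seed=0):
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    folds = [0] * n_rows
    for position, row_index in enumerate(order):
        folds[row_index] = position % n_folds
    return folds

# Step 3: the metric used to compare parameter sets (higher is better).
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Step 4: cross-validate every parameter combination and keep the best one.
def grid_search(data, param_space, n_folds=5, seed=0):
    folds = assign_folds(len(data), n_folds, seed)
    names = list(param_space)
    results = []
    for values in itertools.product(*(param_space[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for f in range(n_folds):
            train = [row for row, fold in zip(data, folds) if fold != f]
            test = [row for row, fold in zip(data, folds) if fold == f]
            preds = [knn_predict(train, x, **params) for x, _ in test]
            fold_scores.append(accuracy([y for _, y in test], preds))
        results.append((params, sum(fold_scores) / n_folds))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results

# Illustrative data: the label is 1 when x >= 10.
data = [(x, int(x >= 10)) for x in range(20)]
best_params, best_score, results = grid_search(data, PARAM_SPACE)
print(best_params, round(best_score, 3))
```

Adding a second hyperparameter to `PARAM_SPACE` would multiply the number of combinations, because `itertools.product` enumerates the full grid.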

The following image illustrates how to do this in Azure Machine Learning Studio (classic).

Image: Find the best parameter set

Define the parameter space

You define the parameter set at the model initialization step. The parameter pane of every machine learning algorithm has two trainer modes: Single Parameter and Parameter Range. Choose Parameter Range mode, which lets you enter multiple values for each parameter as comma-separated values in the text box.

Image: Two-Class Boosted Decision Tree, single parameter

Alternatively, you can use Use Range Builder to define the minimum and maximum points of the grid and the total number of points to generate. By default, the parameter values are generated on a linear scale. If Log Scale is checked, the values are generated on a log scale (that is, the ratio of adjacent points, rather than their difference, is constant). For integer parameters, you can define a range by using a hyphen. For example, "1-10" means that all integers from 1 to 10 (inclusive) form the parameter set. A mixed mode is also supported. For example, the parameter set "1-10, 20, 50" includes the integers 1-10, 20, and 50.
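To make the range syntax concrete, here is a small Python sketch that expands an integer spec such as "1-10, 20, 50" and generates linear or log-scale grids the way the paragraph describes. The helper names `expand_integer_spec` and `range_builder` are hypothetical; this is not Studio's parser, and it assumes non-negative integers (a leading minus sign would be misread as a range).

```python
def expand_integer_spec(spec):
    """Expand a spec like "1-10, 20, 50" into a list of integers (ranges are inclusive)."""
    values = []
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            lo, hi = (int(part) for part in token.split("-"))
            values.extend(range(lo, hi + 1))
        else:
            values.append(int(token))
    return values

def range_builder(lo, hi, count, log_scale=False):
    """Return `count` points from lo to hi on a linear or log scale."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [lo]
    if log_scale:
        if lo <= 0 or hi <= 0:
            raise ValueError("log scale needs positive endpoints")
        ratio = (hi / lo) ** (1 / (count - 1))  # constant ratio between neighbors
        return [lo * ratio ** i for i in range(count)]
    step = (hi - lo) / (count - 1)  # constant difference between neighbors
    return [lo + step * i for i in range(count)]

print(expand_integer_spec("1-10, 20, 50"))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50]
print(range_builder(0.001, 1, 4, log_scale=True))
```

The log-scale call yields values close to 0.001, 0.01, 0.1, and 1, which is why log spacing is usually preferred for parameters such as learning rates that span several orders of magnitude.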

Image: Two-Class Boosted Decision Tree, parameter range

Define cross-validation folds

The Partition and Sample module can be used to randomly assign folds to the data. In the following sample configuration for the module, we define five folds and randomly assign a fold number to each sample instance.
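A rough Python equivalent of random fold assignment is shown below. It is a sketch, not the module's algorithm: the name `assign_folds` is hypothetical, and this variant shuffles rows and deals them round-robin so that fold sizes differ by at most one. Whether Partition and Sample balances fold sizes this way is not stated here.

```python
import random
from collections import Counter

def assign_folds(n_rows, n_folds=5, seed=42):
    """Return a fold number (0..n_folds-1) for each row, chosen at random but balanced."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)  # fixed seed makes the assignment reproducible
    folds = [0] * n_rows
    for position, row_index in enumerate(order):
        folds[row_index] = position % n_folds
    return folds

folds = assign_folds(23, n_folds=5)
print(sorted(Counter(folds).items()))  # [(0, 5), (1, 5), (2, 5), (3, 4), (4, 4)]
```

Which row lands in which fold depends on the seed, but the per-fold counts are fixed by the round-robin deal.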

Image: Partition and Sample

Define the metric

The Tune Model Hyperparameters module supports empirically choosing the best set of parameters for a given algorithm and dataset. Along with other settings for training the model, the Properties pane of this module includes the metric used to determine the best parameter set. It has two separate drop-down list boxes, one for classification algorithms and one for regression algorithms. If the algorithm under consideration is a classification algorithm, the regression metric is ignored, and vice versa. In this example, the metric is Accuracy.
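The sketch below mirrors the two-drop-down behavior in Python: one metric is configured for classifiers and one for regressors, and only the one that matches the learner is used. The metric keys and the `active_metric` helper are hypothetical illustrations, not Studio's option names. Each entry also records whether a higher value is better, because accuracy is maximized while root mean squared error is minimized.

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def _counts(y_true, y_pred, positive=1):
    """True positives, false positives, and false negatives for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn

def precision(y_true, y_pred):
    tp, fp, _ = _counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, _, fn = _counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def f_score(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# name -> (metric function, True if a higher value is better)
CLASSIFICATION_METRICS = {
    "accuracy": (accuracy, True),
    "precision": (precision, True),
    "recall": (recall, True),
    "f_score": (f_score, True),
}
REGRESSION_METRICS = {"rmse": (rmse, False)}

def active_metric(is_classifier, classification_metric, regression_metric):
    """Return the metric for this learner; the setting for the other task is ignored."""
    if is_classifier:
        return CLASSIFICATION_METRICS[classification_metric]
    return REGRESSION_METRICS[regression_metric]

metric, higher_is_better = active_metric(True, "accuracy", "rmse")
print(metric([1, 0, 1, 1], [1, 0, 0, 1]), higher_is_better)  # 0.75 True
```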

Image: Sweep parameters

Train, evaluate, and compare

The same Tune Model Hyperparameters module trains all the models that correspond to the parameter set, evaluates various metrics, and then creates the best-trained model based on the metric you choose. This module has two mandatory inputs:

  • The untrained learner
  • The dataset

The module also has an optional dataset input. Connect the dataset that has fold information to the mandatory dataset input. If the dataset has no fold information assigned, 10-fold cross-validation runs automatically by default. If no folds are assigned and a validation dataset is provided at the optional dataset port, train-test mode is used instead: the first dataset trains the model for each parameter combination.
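The rules for picking an evaluation mode can be summarized as a small Python function that returns the (train, test) splits to use. This is a sketch of the described behavior only: rows are assumed to be dicts, the `fold` column name and `make_splits` helper are hypothetical, and the case where both fold information and a validation dataset are supplied is not covered by the text above, so the sketch simply assumes fold information wins.

```python
import random

def make_splits(rows, validation=None, fold_key="fold", default_folds=10, seed=0):
    """Return a list of (train_rows, test_rows) pairs."""
    # Case 1: every row already has a fold number -> cross-validate over those folds.
    # (Assumption: this takes precedence if a validation set is also given.)
    if rows and all(fold_key in row for row in rows):
        fold_ids = sorted({row[fold_key] for row in rows})
        return [
            ([r for r in rows if r[fold_key] != f], [r for r in rows if r[fold_key] == f])
            for f in fold_ids
        ]
    # Case 2: no fold numbers, but a validation set -> a single train-test split.
    if validation is not None:
        return [(rows, validation)]
    # Case 3: neither -> assign default_folds folds at random and cross-validate.
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    fold_of = {row_index: pos % default_folds for pos, row_index in enumerate(order)}
    return [
        (
            [r for i, r in enumerate(rows) if fold_of[i] != f],
            [r for i, r in enumerate(rows) if fold_of[i] == f],
        )
        for f in range(default_folds)
    ]

rows = [{"x": i} for i in range(30)]
print(len(make_splits(rows)))                            # 10
print(len(make_splits(rows, validation=[{"x": 99}])))    # 1
print(len(make_splits([{"x": i, "fold": i % 5} for i in range(30)])))  # 5
```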

Image: Boosted decision tree classifier

The model is then evaluated on the validation dataset. The left output port of the module shows the metrics as functions of the parameter values. The right output port gives the trained model that performs best according to the chosen metric (Accuracy in this case).
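The two output ports can be imitated in Python with a train-test sweep: a table of metric values per parameter value (like the left port) and the best row (like the right port). This is a hypothetical sketch: the learner is a 1-D k-nearest-neighbors regressor, the metric is root mean squared error (so lower is better), and the quadratic toy data is made up for illustration. For k-NN the "trained model" is just the training data plus the chosen `k`, so the sketch returns the winning parameter row rather than a model object.

```python
import math

def knn_regress(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def sweep_train_test(train, validation, k_values):
    """Score each k on the validation set; return the full table and the best row."""
    table = []
    for k in k_values:
        preds = [knn_regress(train, x, k) for x, _ in validation]
        table.append({"k": k, "rmse": rmse([y for _, y in validation], preds)})
    best = min(table, key=lambda row: row["rmse"])  # lower RMSE is better
    return table, best

# Illustrative data: y = x**2; even x values train, odd x values validate.
train = [(x, x * x) for x in range(0, 20, 2)]
validation = [(x, x * x) for x in range(1, 20, 2)]
table, best = sweep_train_test(train, validation, [1, 2, 4, 8])
for row in table:
    print(row)
print("best:", best)
```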

Image: Validation dataset

You can see the exact parameters chosen by visualizing the right output port. After you save it as a trained model, this model can be used to score a test set or in an operationalized web service.