Tutorial: Categorize support issues using multiclass classification with ML.NET

This sample tutorial uses ML.NET to create a GitHub issue classifier: you train a model that classifies and predicts the Area label for a GitHub issue, in a .NET Core console application written in C# with Visual Studio.

In this tutorial, you learn how to:

  • Prepare your data
  • Transform the data
  • Train the model
  • Evaluate the model
  • Predict with the trained model
  • Deploy and predict with a loaded model

You can find the source code for this tutorial at the dotnet/samples repository.

Prerequisites

Create a console application

Create a project

  1. Open Visual Studio 2017. Select File > New > Project from the menu bar. In the New Project dialog, select the Visual C# node followed by the .NET Core node. Then select the Console App (.NET Core) project template. In the Name text box, type "GitHubIssueClassification" and then select the OK button.

  2. Create a directory named Data in your project to save your data set files:

    In Solution Explorer, right-click on your project and select Add > New Folder. Type "Data" and press Enter.

  3. Create a directory named Models in your project to save your model:

    In Solution Explorer, right-click on your project and select Add > New Folder. Type "Models" and press Enter.

  4. Install the Microsoft.ML NuGet package:

    In Solution Explorer, right-click on your project and select Manage NuGet Packages. Choose "nuget.org" as the Package source, select the Browse tab, search for Microsoft.ML, select the v1.0.0 package in the list, and select the Install button. Select the OK button on the Preview Changes dialog and then select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed.

Prepare your data

  1. Download the issues_train.tsv and the issues_test.tsv data sets and save them to the Data folder previously created. The first dataset trains the machine learning model and the second can be used to evaluate how accurate your model is.

  2. In Solution Explorer, right-click each of the *.tsv files and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.

Create classes and define paths

Add the following additional using statements to the top of the Program.cs file:

using System;
using System.IO;
using System.Linq;
using Microsoft.ML;

Create three global fields to hold the paths to the data files and the saved model, and global variables for the MLContext, IDataView, and PredictionEngine:

  • _trainDataPath has the path to the dataset used to train the model.
  • _testDataPath has the path to the dataset used to evaluate the model.
  • _modelPath has the path where the trained model is saved.
  • _mlContext is the MLContext that provides processing context.
  • _trainingDataView is the IDataView used to process the training dataset.
  • _predEngine is the PredictionEngine<TSrc,TDst> used for single predictions.

Add the following code to the line right above the Main method to specify those paths and the other variables:

private static string _appPath => Path.GetDirectoryName(Environment.GetCommandLineArgs()[0]);
private static string _trainDataPath => Path.Combine(_appPath, "..", "..", "..", "Data", "issues_train.tsv");
private static string _testDataPath => Path.Combine(_appPath, "..", "..", "..", "Data", "issues_test.tsv");
private static string _modelPath => Path.Combine(_appPath, "..", "..", "..", "Models", "model.zip");

private static MLContext _mlContext;
private static PredictionEngine<GitHubIssue, IssuePrediction> _predEngine;
private static ITransformer _trainedModel;
private static IDataView _trainingDataView;
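The three ".." segments in the path fields climb from the build output folder back to the project folder, where Data and Models live. The following is a minimal sketch of that resolution using only System.IO; the PathSketch class, its method name, and the example output folder (bin/Debug/netcoreapp2.1) are assumptions made for illustration, not part of the tutorial's code.

```csharp
using System;
using System.IO;

public static class PathSketch
{
    // Hypothetical helper: starts at the folder that contains the executable,
    // climbs three levels with "..", then descends into a project folder.
    public static string ResolveFromOutputDir(string appPath, string folder, string fileName)
    {
        string relative = Path.Combine(appPath, "..", "..", "..", folder, fileName);

        // GetFullPath collapses the ".." segments into a normalized absolute path.
        return Path.GetFullPath(relative);
    }
}
```

For an output folder such as <project>/bin/Debug/netcoreapp2.1, calling PathSketch.ResolveFromOutputDir(appPath, "Data", "issues_train.tsv") would point at <project>/Data/issues_train.tsv. If your build output sits at a different depth, the number of ".." segments has to change accordingly.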

Create some classes for your input data and predictions. Add a new class to your project:

  1. In Solution Explorer, right-click the project, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to GitHubIssueData.cs. Then, select the Add button.

    The GitHubIssueData.cs file opens in the code editor. Add the following using statement to the top of GitHubIssueData.cs:

using Microsoft.ML.Data;

Remove the existing class definition and add the following code, which has two classes, GitHubIssue and IssuePrediction, to the GitHubIssueData.cs file:

public class GitHubIssue
{
    [LoadColumn(0)]
    public string ID { get; set; }
    [LoadColumn(1)]
    public string Area { get; set; }
    [LoadColumn(2)]
    public string Title { get; set; }
    [LoadColumn(3)]
    public string Description { get; set; }
}

public class IssuePrediction
{
    [ColumnName("PredictedLabel")]
    public string Area;
}

The label is the column you want to predict. The identified Features are the inputs you give the model to predict the Label.

Use the LoadColumnAttribute to specify the indices of the source columns in the data set.

GitHubIssue is the input dataset class and has the following String fields:

  • the first column, ID (GitHub issue ID)
  • the second column, Area (the label to predict, supplied for training)
  • the third column, Title (GitHub issue title), is the first feature used for predicting the Area
  • the fourth column, Description, is the second feature used for predicting the Area
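To make the column indices concrete, here is a minimal sketch that splits one tab-separated line into the four fields by position, mirroring the LoadColumn(0) through LoadColumn(3) attributes. It only illustrates what the indices mean; it is not how ML.NET's text loader is implemented, and the TsvSketch and IssueRow names are hypothetical. A real loader also deals with headers, quoting, and missing values.

```csharp
using System;

public static class TsvSketch
{
    // Hypothetical stand-in for the tutorial's GitHubIssue class.
    public sealed class IssueRow
    {
        public string ID { get; set; }
        public string Area { get; set; }
        public string Title { get; set; }
        public string Description { get; set; }
    }

    // Maps tab-separated columns 0..3 to ID, Area, Title, and Description.
    public static IssueRow ParseLine(string line)
    {
        string[] columns = line.Split('\t');
        if (columns.Length < 4)
        {
            throw new FormatException("Expected at least 4 tab-separated columns.");
        }

        return new IssueRow
        {
            ID = columns[0],
            Area = columns[1],
            Title = columns[2],
            Description = columns[3]
        };
    }
}
```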

IssuePrediction is the class used for prediction after the model has been trained. It has a single string (Area) and a PredictedLabel ColumnName attribute. The PredictedLabel is used during prediction and evaluation. For evaluation, an input with training data, the predicted values, and the model are used.

All ML.NET operations start in the MLContext class. Initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It's similar, conceptually, to DbContext in Entity Framework.

Initialize variables in Main

Initialize the _mlContext global variable with a new instance of MLContext with a random seed (seed: 0) for repeatable/deterministic results across multiple trainings. Replace the Console.WriteLine("Hello World!") line with the following code in the Main method:

_mlContext = new MLContext(seed: 0);

Load the data

ML.NET uses the IDataView interface as a flexible, efficient way of describing numeric or text tabular data. An IDataView can be loaded from text files or from other sources in real time (for example, a SQL database or log files).

To initialize and load the _trainingDataView global variable in order to use it for the pipeline, add the following code after the _mlContext initialization:

_trainingDataView = _mlContext.Data.LoadFromTextFile<GitHubIssue>(_trainDataPath, hasHeader: true);

LoadFromTextFile() defines the data schema and reads in the file. It takes in the data path variable and returns an IDataView.

Add the following as the next line of code in the Main method:

var pipeline = ProcessData();

The ProcessData method executes the following tasks:

  • Extracts and transforms the data.
  • Returns the processing pipeline.

Create the ProcessData method, just after the Main method, using the following code:

public static IEstimator<ITransformer> ProcessData()
{

}

Extract Features and transform the data

Because you want to predict the Area GitHub label for a GitHubIssue, use the MapValueToKey() method to transform the Area column into a numeric key type Label column (a format accepted by classification algorithms) and add it as a new dataset column:

var pipeline = _mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: "Area", outputColumnName: "Label")
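Conceptually, a value-to-key mapping assigns each distinct label string a small integer, and the reverse mapping turns a predicted integer back into text. The sketch below shows that idea with a plain dictionary. It is an illustration only: the 1-based numbering in order of first appearance is an assumption, the KeyMappingSketch class is hypothetical, and ML.NET's own key type has details that are not reproduced here.

```csharp
using System.Collections.Generic;

public static class KeyMappingSketch
{
    // Assigns each distinct label a 1-based key in order of first appearance (assumed ordering).
    public static Dictionary<string, uint> BuildKeyMap(IEnumerable<string> labels)
    {
        var map = new Dictionary<string, uint>();
        foreach (string label in labels)
        {
            if (!map.ContainsKey(label))
            {
                map[label] = (uint)map.Count + 1;
            }
        }
        return map;
    }

    // Reverse lookup: returns the label for a key, or null when the key is unknown.
    public static string KeyToValue(Dictionary<string, uint> map, uint key)
    {
        foreach (KeyValuePair<string, uint> pair in map)
        {
            if (pair.Value == key)
            {
                return pair.Key;
            }
        }
        return null;
    }
}
```

The reverse lookup corresponds to the role MapKeyToValue plays later in the pipeline, where the predicted key is converted back to a readable Area string.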

Next, call _mlContext.Transforms.Text.FeaturizeText, which transforms the text columns (Title and Description) into numeric vectors named TitleFeaturized and DescriptionFeaturized, respectively. Append the featurization for both columns to the pipeline with the following code:

.Append(_mlContext.Transforms.Text.FeaturizeText(inputColumnName: "Title", outputColumnName: "TitleFeaturized"))
.Append(_mlContext.Transforms.Text.FeaturizeText(inputColumnName: "Description", outputColumnName: "DescriptionFeaturized"))

The last step in data preparation combines all of the feature columns into the Features column using the Concatenate() method. By default, a learning algorithm processes only features from the Features column. Append this transformation to the pipeline with the following code:

.Append(_mlContext.Transforms.Concatenate("Features", "TitleFeaturized", "DescriptionFeaturized"))
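The two steps above, turning text into numbers and joining the resulting vectors, can be pictured with a toy example. The sketch below counts occurrences of a small fixed vocabulary and then concatenates two count vectors end to end. This is a deliberately simplified stand-in: FeaturizeText does considerably more than word counting, and the FeaturizeSketch class, the vocabulary, and the tokenization rules are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FeaturizeSketch
{
    private static readonly char[] Separators = { ' ', '\t', '.', ',', ';', ':', '!', '?' };

    // Counts how often each vocabulary word occurs in the text (case-insensitive).
    public static float[] WordCounts(string text, IReadOnlyList<string> vocabulary)
    {
        string[] tokens = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        var counts = new float[vocabulary.Count];
        for (int i = 0; i < vocabulary.Count; i++)
        {
            string word = vocabulary[i];
            counts[i] = tokens.Count(token => token == word);
        }
        return counts;
    }

    // Joins several feature vectors end to end into a single vector.
    public static float[] Concatenate(params float[][] vectors)
    {
        return vectors.SelectMany(vector => vector).ToArray();
    }
}
```

With a vocabulary of three words, a title vector and a description vector of length 3 each combine into one Features vector of length 6, which is the shape of input a trainer would then consume.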

Next, append AppendCacheCheckpoint to cache the DataView, so that when you iterate over the data multiple times, using the cache might give better performance, as with the following code:

.AppendCacheCheckpoint(_mlContext);

Warning

Use AppendCacheCheckpoint for small/medium datasets to lower training time. Do NOT use it (remove .AppendCacheCheckpoint()) when handling very large datasets.

Return the pipeline at the end of the ProcessData method.

return pipeline;

This step handles preprocessing/featurization. Using additional components available in ML.NET can enable better results with your model.

Build and train the model

Add the following call to the BuildAndTrainModel method as the next line of code in the Main method:

var trainingPipeline = BuildAndTrainModel(_trainingDataView, pipeline);

The BuildAndTrainModel method executes the following tasks:

  • Creates the training algorithm class.
  • Trains the model.
  • Predicts area based on training data.
  • Returns the model.

Create the BuildAndTrainModel method, just after the Main method, using the following code:

public static IEstimator<ITransformer> BuildAndTrainModel(IDataView trainingDataView, IEstimator<ITransformer> pipeline)
{

}

About the classification task

Classification is a machine learning task that uses data to determine the category, type, or class of an item or row of data and is frequently one of the following types:

  • Binary: either A or B.
  • Multiclass: multiple categories that can be predicted by using a single model.

For this type of problem, use a multiclass classification learning algorithm, since your issue category prediction can be one of multiple categories (multiclass) rather than just two (binary).

Append the machine learning algorithm to the data transformation definitions by adding the following as the first line of code in BuildAndTrainModel():

var trainingPipeline = pipeline.Append(_mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
        .Append(_mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

SdcaMaximumEntropy is your multiclass classification training algorithm. It is appended to the pipeline and accepts the featurized Title and Description (Features) and the Label input parameters to learn from the historic data.
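A multiclass model of this kind typically produces one score per class, converts the scores to probabilities, and predicts the class with the highest probability. The sketch below shows a softmax followed by an arg-max over an array of scores. It illustrates the general maximum-entropy idea only; the MulticlassSketch class is hypothetical, and how the ML.NET trainer computes and exposes its scores internally is not covered by this tutorial.

```csharp
using System;
using System.Linq;

public static class MulticlassSketch
{
    // Converts raw per-class scores into probabilities that sum to 1.
    public static double[] Softmax(double[] scores)
    {
        // Subtracting the maximum keeps Math.Exp from overflowing on large scores.
        double max = scores.Max();
        double[] exps = scores.Select(score => Math.Exp(score - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(value => value / sum).ToArray();
    }

    // Returns the index of the largest value, which is the predicted class.
    public static int ArgMax(double[] values)
    {
        int best = 0;
        for (int i = 1; i < values.Length; i++)
        {
            if (values[i] > values[best])
            {
                best = i;
            }
        }
        return best;
    }
}
```

The index returned by ArgMax plays the role of the predicted key; mapping it back to a label string is what the MapKeyToValue("PredictedLabel") step is for.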

Train the model

Fit the model to the training data (trainingDataView) and return the trained model by adding the following as the next line of code in the BuildAndTrainModel() method:

_trainedModel = trainingPipeline.Fit(trainingDataView);

The Fit() method trains your model by transforming the dataset and applying the training.

The PredictionEngine is a convenience API, which allows you to pass in and then perform a prediction on a single instance of data. Add this as the next line in the BuildAndTrainModel() method:

_predEngine = _mlContext.Model.CreatePredictionEngine<GitHubIssue, IssuePrediction>(_trainedModel);

Predict with the trained model

Add a GitHub issue to test the trained model's prediction, still in the BuildAndTrainModel method, by creating an instance of GitHubIssue:

GitHubIssue issue = new GitHubIssue() {
    Title = "WebSockets communication is slow in my machine",
    Description = "The WebSockets communication used under the covers by SignalR looks like is going slow in my development machine.."
};

Use the Predict() function to make a prediction on a single row of data:

var prediction = _predEngine.Predict(issue);

Using the model: prediction results

Display the GitHubIssue and the corresponding Area label prediction in order to share the results and act on them accordingly. Create a display for the results using the following Console.WriteLine() code:

Console.WriteLine($"=============== Single Prediction just-trained-model - Result: {prediction.Area} ===============");

Return the model trained to use for evaluation

Return the training pipeline at the end of the BuildAndTrainModel method. The trained model itself stays in the _trainedModel global variable for evaluation.

return trainingPipeline;

Evaluate the model

Now that you've created and trained the model, you need to evaluate it with a different dataset for quality assurance and validation. The Evaluate method evaluates the model created in BuildAndTrainModel, which it reads from the _trainedModel global variable. Create the Evaluate method, just after BuildAndTrainModel, as in the following code:

public static void Evaluate(DataViewSchema trainingDataViewSchema)
{

}

The Evaluate method executes the following tasks:

  • Loads the test dataset.
  • Creates the multiclass evaluator.
  • Evaluates the model and creates metrics.
  • Displays the metrics.

Add a call to the new method from the Main method, right under the BuildAndTrainModel method call, using the following code:

Evaluate(_trainingDataView.Schema);

As you did previously with the training dataset, load the test dataset by adding the following code to the Evaluate method:

var testDataView = _mlContext.Data.LoadFromTextFile<GitHubIssue>(_testDataPath, hasHeader: true);

The Evaluate() method computes the quality metrics for the model using the specified dataset. It returns a MulticlassClassificationMetrics object that contains the overall metrics computed by multiclass classification evaluators. To display the metrics to determine the quality of the model, you need to get them first. Notice the use of the Transform() method of the machine learning _trainedModel global variable (an ITransformer) to input the features and return predictions. Add the following code to the Evaluate method as the next line:

var testMetrics = _mlContext.MulticlassClassification.Evaluate(_trainedModel.Transform(testDataView));

The following metrics are evaluated for multiclass classification:

  • Micro Accuracy - Every sample-class pair contributes equally to the accuracy metric. You want Micro Accuracy to be as close to 1 as possible.

  • Macro Accuracy - Every class contributes equally to the accuracy metric. Minority classes are given equal weight as the larger classes. You want Macro Accuracy to be as close to 1 as possible.

  • Log-loss - see Log Loss. You want Log-loss to be as close to zero as possible.

  • Log-loss reduction - Ranges from (-inf, 1], where 1 is perfect predictions and 0 indicates mean predictions. You want Log-loss reduction to be as close to 1 as possible.
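The difference between these metrics is easiest to see on a small, imbalanced example. The sketch below computes them from plain arrays using the common textbook definitions: micro accuracy over all examples, macro accuracy as the unweighted mean of per-class accuracy, log-loss as the mean negative natural log of the probability given to the true class, and log-loss reduction relative to a baseline log-loss. These definitions and the MetricsSketch class are assumptions made for illustration; ML.NET's evaluator may differ in details.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MetricsSketch
{
    // Micro accuracy: fraction of all examples that were predicted correctly.
    public static double MicroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        int correct = 0;
        for (int i = 0; i < actual.Count; i++)
        {
            if (actual[i] == predicted[i])
            {
                correct++;
            }
        }
        return (double)correct / actual.Count;
    }

    // Macro accuracy: accuracy per actual class, averaged with equal weight per class.
    public static double MacroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        return Enumerable.Range(0, actual.Count)
            .GroupBy(i => actual[i])
            .Select(group => group.Count(i => actual[i] == predicted[i]) / (double)group.Count())
            .Average();
    }

    // Log-loss: mean of -ln(p), where p is the probability assigned to the true class.
    public static double LogLoss(IReadOnlyList<double> probabilityOfTrueClass)
    {
        return probabilityOfTrueClass.Select(p => -Math.Log(p)).Average();
    }

    // Reduction relative to a baseline log-loss: 1 is perfect, 0 is no better than the baseline.
    public static double LogLossReduction(double logLoss, double baselineLogLoss)
    {
        return (baselineLogLoss - logLoss) / baselineLogLoss;
    }
}
```

With four examples of class A and one of class B, a model that always predicts A scores 0.8 on micro accuracy but only 0.5 on macro accuracy, because the missed minority class counts as much as the majority class.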

Displaying the metrics for model validation

Use the following code to display the metrics, share the results, and then act on them:

Console.WriteLine($"*************************************************************************************************************");
Console.WriteLine($"*       Metrics for Multi-class Classification model - Test Data     ");
Console.WriteLine($"*------------------------------------------------------------------------------------------------------------");
Console.WriteLine($"*       MicroAccuracy:    {testMetrics.MicroAccuracy:0.###}");
Console.WriteLine($"*       MacroAccuracy:    {testMetrics.MacroAccuracy:0.###}");
Console.WriteLine($"*       LogLoss:          {testMetrics.LogLoss:#.###}");
Console.WriteLine($"*       LogLossReduction: {testMetrics.LogLossReduction:#.###}");
Console.WriteLine($"*************************************************************************************************************");
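The accuracy lines use the format 0.### while the log-loss lines use #.###, which is why values below 1 print with and without a leading zero. The sketch below shows the difference between the two custom numeric format strings. The FormatSketch class is hypothetical, and it passes CultureInfo.InvariantCulture so the decimal separator is always a period, whereas the interpolated strings in the tutorial follow the current culture.

```csharp
using System.Globalization;

public static class FormatSketch
{
    // Formats a value with a custom numeric format string, independent of the machine's culture.
    public static string Format(double value, string format)
    {
        return value.ToString(format, CultureInfo.InvariantCulture);
    }
}
```

FormatSketch.Format(0.738, "0.###") gives "0.738", because the 0 placeholder always emits a digit. FormatSketch.Format(0.919, "#.###") gives ".919", because the # placeholder omits the zero before the decimal point.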

Deploy and predict with a model

Add a call to the new method from the Main method, right under the Evaluate method call, using the following code:

PredictIssue();

Create the PredictIssue method, just after the Evaluate method (and just before the SaveModelAsFile method), using the following code:

private static void PredictIssue()
{

}

The PredictIssue method executes the following tasks:

  • Creates a single issue of test data.
  • Predicts Area based on test data.
  • Combines test data and predictions for reporting.
  • Displays the predicted results.

Add a GitHub issue to test the trained model's prediction in the PredictIssue method by creating an instance of GitHubIssue:

GitHubIssue singleIssue = new GitHubIssue() { Title = "Entity Framework crashes", Description = "When connecting to the database, EF is crashing" };

As you did previously, create a PredictionEngine instance, this time from the loaded model (loadedModel), with the following code:

_predEngine = _mlContext.Model.CreatePredictionEngine<GitHubIssue, IssuePrediction>(loadedModel);

Use the PredictionEngine to predict the Area GitHub label by adding the following code to the PredictIssue method:

var prediction = _predEngine.Predict(singleIssue);

Using the loaded model for prediction

Display Area in order to categorize the issue and act on it accordingly. Create a display for the results using the following Console.WriteLine() code:

Console.WriteLine($"=============== Single Prediction - Result: {prediction.Area} ===============");

結果Results

Your results should be similar to the following. As the pipeline processes, it displays messages. You may see warnings or processing messages. These messages have been removed from the following results for clarity.

=============== Single Prediction just-trained-model - Result: area-System.Net ===============
*************************************************************************************************************
*       Metrics for Multi-class Classification model - Test Data
*------------------------------------------------------------------------------------------------------------
*       MicroAccuracy:    0.738
*       MacroAccuracy:    0.668
*       LogLoss:          .919
*       LogLossReduction: .643
*************************************************************************************************************
=============== Single Prediction - Result: area-System.Data ===============

Congratulations! You've now successfully built a machine learning model for classifying and predicting an Area label for a GitHub issue. You can find the source code for this tutorial at the dotnet/samples repository.

Next steps

In this tutorial, you learned how to:

  • Prepare your data
  • Transform the data
  • Train the model
  • Evaluate the model
  • Predict with the trained model
  • Deploy and predict with a loaded model

Advance to the next tutorial to learn more