Tutorial: Build a movie recommender using matrix factorization with ML.NET

This tutorial shows you how to build a movie recommender with ML.NET in a .NET Core console application. The steps use C# and Visual Studio 2019.

In this tutorial, you learn how to:

  • Select a machine learning algorithm
  • Prepare and load your data
  • Build and train a model
  • Evaluate a model
  • Deploy and consume a model

You can find the source code for this tutorial at the dotnet/samples repository.

Machine learning workflow

You use the following steps to accomplish this task, as well as any other ML.NET task:

  1. Load your data
  2. Build and train your model
  3. Evaluate your model
  4. Use your model

Prerequisites

Select the appropriate machine learning task

There are several ways to approach recommendation problems, such as recommending a list of movies or a list of related products. In this case, you predict the rating (1-5) that a user will give to a particular movie and recommend that movie if the predicted rating is higher than a defined threshold (the higher the rating, the more likely the user is to like the movie).

Create a console application

Create a project

  1. Open Visual Studio. Select File > New > Project from the menu bar. In the New Project dialog, select the Visual C# node followed by the .NET Core node. Then select the Console App (.NET Core) project template. In the Name text box, type "MovieRecommender" and then select the OK button.

  2. Create a directory named Data in your project to store the data set:

    In Solution Explorer, right-click the project and select Add > New Folder. Type "Data" and press Enter.

  3. Install the Microsoft.ML and Microsoft.ML.Recommender NuGet packages:

    In Solution Explorer, right-click the project and select Manage NuGet Packages. Choose "nuget.org" as the Package source, select the Browse tab, search for Microsoft.ML, select the package in the list, and select the Install button. Select the OK button on the Preview Changes dialog and then select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed. Repeat these steps for Microsoft.ML.Recommender.

  4. Add the following using statements at the top of your Program.cs file:

    using System;
    using System.IO;
    using Microsoft.ML;
    using Microsoft.ML.Trainers;
    

Download your data

  1. Download the two datasets and save them to the Data folder you previously created:

    • Right-click recommendation-ratings-train.csv and select "Save Link (or Target) As..."

    • Right-click recommendation-ratings-test.csv and select "Save Link (or Target) As..."

      Make sure you either save the *.csv files to the Data folder or, if you save them elsewhere, move the *.csv files to the Data folder afterward.

  2. In Solution Explorer, right-click each of the *.csv files and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.

    [Image: Copy if newer setting in Visual Studio]

Load your data

The first step in the ML.NET process is to prepare and load your model training and testing data.

The recommendation ratings data is split into Train and Test datasets. The Train data is used to fit your model. The Test data is used to make predictions with your trained model and evaluate model performance. It's common to have an 80/20 split between Train and Test data.
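The datasets in this tutorial are already split for you. If you start from a single set of rows instead, the 80/20 idea can be sketched as follows. This is a minimal illustration using only the base class library; the class name SplitSketch, the method name, and the default seed are assumptions made for this example and are not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSketch
{
    // Shuffles the rows with a fixed seed, then cuts them into a Train part and a Test part.
    public static (List<T> Train, List<T> Test) TrainTestSplit<T>(
        IEnumerable<T> rows, double testFraction = 0.2, int seed = 0)
    {
        var rng = new Random(seed);
        // Ordering by a random key is a simple (not the fastest) way to shuffle.
        List<T> shuffled = rows.OrderBy(row => rng.Next()).ToList();
        int testCount = (int)Math.Round(shuffled.Count * testFraction);
        List<T> test = shuffled.Take(testCount).ToList();   // first 20% by default
        List<T> train = shuffled.Skip(testCount).ToList();  // remaining 80%
        return (train, test);
    }
}
```

Shuffling before cutting matters because rating files are often sorted by user, and an unshuffled cut would put entire users only in the Test data.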

Below is a preview of the data from your *.csv files:

[Image: data preview]

In the *.csv files, there are four columns:

  • userId
  • movieId
  • rating
  • timestamp

In machine learning, the columns that are used to make a prediction are called Features, and the column with the returned prediction is called the Label.

You want to predict movie ratings, so the rating column is the Label. The other three columns, userId, movieId, and timestamp, are all Features used to predict the Label.

Features     Label
userId       rating
movieId
timestamp

It's up to you to decide which Features are used to predict the Label. You can also use methods like Feature Permutation Importance to help select the best Features.

In this case, you should eliminate the timestamp column as a Feature because the timestamp does not really affect how a user rates a given movie and thus would not contribute to making a more accurate prediction:

Features     Label
userId       rating
movieId

Next, you must define your data structure for the input class.

Add a new class to your project:

  1. In Solution Explorer, right-click the project, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to MovieRatingData.cs. Then, select the Add button.

The MovieRatingData.cs file opens in the code editor. Add the following using statement to the top of MovieRatingData.cs:

using Microsoft.ML.Data;

Create a class called MovieRating by removing the existing class definition and adding the following code in MovieRatingData.cs:

public class MovieRating
{
    [LoadColumn(0)]
    public float userId;
    [LoadColumn(1)]
    public float movieId;
    [LoadColumn(2)]
    public float Label;
}

MovieRating specifies an input data class. The LoadColumn attribute specifies which columns (by column index) in the dataset should be loaded. The userId and movieId columns are your Features (the inputs you give the model to predict the Label), and the rating column, loaded into the Label field, is the Label that you predict (the output of the model).
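To make the column-index mapping concrete, the following hand-written sketch does by hand what the LoadColumn indices express: field 0 becomes userId, field 1 becomes movieId, field 2 becomes the Label, and field 3 (timestamp) is never read. It is an illustration only and is not how ML.NET parses files internally; LoadColumnSketch and ParseRow are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class LoadColumnSketch
{
    // Hand-rolled stand-in for the idea behind [LoadColumn(n)]: take field n of each row.
    public static (float UserId, float MovieId, float Label) ParseRow(string csvLine)
    {
        string[] fields = csvLine.Split(',');
        float userId = float.Parse(fields[0], CultureInfo.InvariantCulture);  // LoadColumn(0)
        float movieId = float.Parse(fields[1], CultureInfo.InvariantCulture); // LoadColumn(1)
        float label = float.Parse(fields[2], CultureInfo.InvariantCulture);   // LoadColumn(2), the rating
        // fields[3] (timestamp) has no LoadColumn attribute in MovieRating, so it is skipped.
        return (userId, movieId, label);
    }
}
```

For example, a row such as "1,1,4,964982703" (the timestamp value here is made up) would yield userId 1, movieId 1, and Label 4.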

Create another class, MovieRatingPrediction, to represent predicted results by adding the following code after the MovieRating class in MovieRatingData.cs:

public class MovieRatingPrediction
{
    public float Label;
    public float Score;
}

In Program.cs, replace the Console.WriteLine("Hello World!") statement with the following code inside Main():

MLContext mlContext = new MLContext();

The MLContext class is a starting point for all ML.NET operations, and initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It's similar, conceptually, to DbContext in Entity Framework.

After Main(), create a method called LoadData():

public static (IDataView training, IDataView test) LoadData(MLContext mlContext)
{

}

Note

This method will give you an error until you add a return statement in the following steps.

Initialize your data path variables, load the data from the *.csv files, and return the Train and Test data as IDataView objects by adding the following as the next lines of code in LoadData():

var trainingDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "recommendation-ratings-train.csv");
var testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "recommendation-ratings-test.csv");

IDataView trainingDataView = mlContext.Data.LoadFromTextFile<MovieRating>(trainingDataPath, hasHeader: true, separatorChar: ',');
IDataView testDataView = mlContext.Data.LoadFromTextFile<MovieRating>(testDataPath, hasHeader: true, separatorChar: ',');

return (trainingDataView, testDataView);

Data in ML.NET is represented as an IDataView interface. IDataView is a flexible, efficient way of describing tabular data (numeric and text). Data can be loaded from a text file or in real time (for example, from a SQL database or log files) to an IDataView object.

The LoadFromTextFile() method defines the data schema and reads in the file. It takes in the data path variables and returns an IDataView. In this case, you provide the path for your Test and Train files and indicate both that the text file has a header row (so it can use the column names properly) and that the comma character is the data separator (the default separator is a tab).

Add the following as the next line of code in the Main() method to call your LoadData() method once and return the Train and Test data:

(IDataView trainingDataView, IDataView testDataView) = LoadData(mlContext);

Build and train your model

There are three major concepts in ML.NET: Data, Transformers, and Estimators.

Machine learning training algorithms require data in a certain format. Transformers are used to transform tabular data to a compatible format.

[Image: transformer diagram]

You create Transformers in ML.NET by creating Estimators. Estimators take in data and return Transformers.

[Image: estimator diagram]

The recommendation training algorithm you will use for training your model is an example of an Estimator.

Build an Estimator with the following steps:

Create the BuildAndTrainModel() method, just after the LoadData() method, using the following code:

public static ITransformer BuildAndTrainModel(MLContext mlContext, IDataView trainingDataView)
{

}

Note

This method will give you an error until you add a return statement in the following steps.

Define the data transformations by adding the following code to BuildAndTrainModel():

IEstimator<ITransformer> estimator = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "userIdEncoded", inputColumnName: "userId")
    .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "movieIdEncoded", inputColumnName: "movieId"));

Since userId and movieId identify users and movies rather than represent real numeric values, you use the MapValueToKey() method to transform each userId and each movieId into a numeric key type Feature column (a format accepted by recommendation algorithms) and add them as new dataset columns:

userId    movieId    Label    userIdEncoded    movieIdEncoded
1         1          4        userKey1         movieKey1
1         3          4        userKey1         movieKey2
1         6          4        userKey1         movieKey3
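The idea behind the key mapping in the table above can be sketched with a plain dictionary: every distinct raw value gets its own small integer key, and repeated values reuse the key they already have. This is an illustration of the concept only. How ML.NET orders and stores its keys is an implementation detail not shown here, and KeyMappingSketch and BuildKeyMap are hypothetical names.

```csharp
using System.Collections.Generic;

public static class KeyMappingSketch
{
    // Assigns the keys 1, 2, 3, ... to distinct values in order of first appearance.
    public static Dictionary<float, uint> BuildKeyMap(IEnumerable<float> values)
    {
        var map = new Dictionary<float, uint>();
        foreach (float value in values)
        {
            if (!map.ContainsKey(value))
            {
                // A new value gets the next unused key; a repeated value keeps its key.
                map[value] = (uint)map.Count + 1;
            }
        }
        return map;
    }
}
```

Applied to the movieId column of the table (1, 3, 6), this sketch produces three distinct keys, which play the role of movieKey1, movieKey2, and movieKey3.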

Choose the machine learning algorithm and append it to the data transformation definitions by adding the following as the next lines of code in BuildAndTrainModel():

var options = new MatrixFactorizationTrainer.Options
{
    MatrixColumnIndexColumnName = "userIdEncoded",
    MatrixRowIndexColumnName = "movieIdEncoded", 
    LabelColumnName = "Label",
    NumberOfIterations = 20,
    ApproximationRank = 100
};

var trainerEstimator = estimator.Append(mlContext.Recommendation().Trainers.MatrixFactorization(options));

The MatrixFactorizationTrainer is your recommendation training algorithm. Matrix Factorization is a common approach to recommendation when you have data on how users have rated products in the past, which is the case for the datasets in this tutorial. There are other recommendation algorithms for when you have different data available (see the Other Recommendation Algorithms section below to learn more).

In this case, the Matrix Factorization algorithm uses a method called "collaborative filtering", which assumes that if User 1 has the same opinion as User 2 on a certain issue, then User 1 is more likely to feel the same way as User 2 about a different issue.

For instance, if User 1 and User 2 rate movies similarly, then User 2 is more likely to enjoy a movie that User 1 has watched and rated highly:

          Incredibles 2 (2018)       The Avengers (2012)        Guardians of the Galaxy (2014)
User 1    Watched and liked movie    Watched and liked movie    Watched and liked movie
User 2    Watched and liked movie    Watched and liked movie    Has not watched -- RECOMMEND movie
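As a rough mental model, matrix factorization learns one short numeric vector per user and one per movie, and predicts a rating from how well the two vectors line up. The sketch below shows only that final scoring step as a dot product. It is a simplification under stated assumptions: the vectors are supplied by hand rather than learned, the vector length corresponds conceptually to ApproximationRank, and LatentFactorSketch and PredictRating are hypothetical names, not ML.NET APIs.

```csharp
using System;

public static class LatentFactorSketch
{
    // Predicted rating = dot product of the user's and the movie's latent vectors.
    public static float PredictRating(float[] userFactors, float[] movieFactors)
    {
        if (userFactors.Length != movieFactors.Length)
        {
            throw new ArgumentException("Both vectors must have the same length (rank).");
        }

        float score = 0f;
        for (int i = 0; i < userFactors.Length; i++)
        {
            score += userFactors[i] * movieFactors[i];
        }
        return score;
    }
}
```

Users with similar vectors get similar predicted ratings for the same movie, which is how User 2 in the table can be matched with a movie that only User 1 has rated.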

The Matrix Factorization trainer has several Options, which you can read more about in the Algorithm hyperparameters section below.

Fit the model to the Train data and return the trained model by adding the following as the next lines of code in the BuildAndTrainModel() method:

Console.WriteLine("=============== Training the model ===============");
ITransformer model = trainerEstimator.Fit(trainingDataView);

return model;

The Fit() method trains your model with the provided training dataset. Technically, it executes the Estimator definitions by transforming the data and applying the training, and it returns the trained model, which is a Transformer.

Add the following as the next line of code in the Main() method to call your BuildAndTrainModel() method and return the trained model:

ITransformer model = BuildAndTrainModel(mlContext, trainingDataView);

Evaluate your model

Once you have trained your model, use your test data to evaluate how your model is performing.

Create the EvaluateModel() method, just after the BuildAndTrainModel() method, using the following code:

public static void EvaluateModel(MLContext mlContext, IDataView testDataView, ITransformer model)
{

}

Transform the Test data by adding the following code to EvaluateModel():

Console.WriteLine("=============== Evaluating the model ===============");
var prediction = model.Transform(testDataView);

The Transform() method makes predictions for multiple provided input rows of a test dataset.

Evaluate the model by adding the following as the next line of code in the EvaluateModel() method:

var metrics = mlContext.Regression.Evaluate(prediction, labelColumnName: "Label", scoreColumnName: "Score");

Once you have the prediction set, the Evaluate() method assesses the model by comparing the predicted values with the actual Labels in the test dataset, and returns metrics on how the model is performing.

Print your evaluation metrics to the console by adding the following as the next lines of code in the EvaluateModel() method:

Console.WriteLine("Root Mean Squared Error : " + metrics.RootMeanSquaredError.ToString());
Console.WriteLine("RSquared: " + metrics.RSquared.ToString());

Add the following as the next line of code in the Main() method to call your EvaluateModel() method:

EvaluateModel(mlContext, testDataView, model);

The output so far should look similar to the following text:

=============== Training the model ===============
iter      tr_rmse          obj
   0       1.5403   3.1262e+05
   1       0.9221   1.6030e+05
   2       0.8687   1.5046e+05
   3       0.8416   1.4584e+05
   4       0.8142   1.4209e+05
   5       0.7849   1.3907e+05
   6       0.7544   1.3594e+05
   7       0.7266   1.3361e+05
   8       0.6987   1.3110e+05
   9       0.6751   1.2948e+05
  10       0.6530   1.2766e+05
  11       0.6350   1.2644e+05
  12       0.6197   1.2541e+05
  13       0.6067   1.2470e+05
  14       0.5953   1.2382e+05
  15       0.5871   1.2342e+05
  16       0.5781   1.2279e+05
  17       0.5713   1.2240e+05
  18       0.5660   1.2230e+05
  19       0.5592   1.2179e+05
=============== Evaluating the model ===============
Root Mean Squared Error : 0.994051469730769
RSquared: 0.412556298844873

In this output, there are 20 iterations. In each iteration, the measure of error decreases and converges closer and closer to 0.

The root mean squared error (RMS or RMSE) measures the differences between the model's predicted values and the observed values in the test dataset. Technically, it's the square root of the average of the squared errors. The lower it is, the better the model is.
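The definition above translates directly into a few lines of code. The following sketch computes the RMSE for two arrays of actual and predicted ratings. It is meant to illustrate the formula, not to replace the metric that Evaluate() returns; RmseSketch and Rmse are hypothetical names.

```csharp
using System;
using System.Linq;

public static class RmseSketch
{
    // Square root of the average of the squared differences between actual and predicted values.
    public static double Rmse(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
        {
            throw new ArgumentException("Both arrays must be non-empty and of equal length.");
        }

        double meanSquaredError = actual
            .Zip(predicted, (a, p) => (a - p) * (a - p)) // squared error per row
            .Average();                                   // mean of the squared errors
        return Math.Sqrt(meanSquaredError);
    }
}
```

Because the errors are squared before averaging, one prediction that is off by 2 stars weighs as much as four predictions that are each off by 1 star.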

R Squared indicates how well the data fits the model. It typically ranges from 0 to 1. A value of 0 means that the data is random or otherwise can't be fit to the model. A value of 1 means that the model exactly matches the data. You want your R Squared score to be as close to 1 as possible.
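A common textbook definition of R Squared is 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. The sketch below implements that definition. Treat it as an assumption that this matches the value ML.NET reports in every detail; RSquaredSketch and RSquared are hypothetical names.

```csharp
using System;
using System.Linq;

public static class RSquaredSketch
{
    // R^2 = 1 - (sum of squared residuals) / (total sum of squares around the mean).
    public static double RSquared(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
        {
            throw new ArgumentException("Both arrays must be non-empty and of equal length.");
        }

        double mean = actual.Average();
        double residualSum = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
        double totalSum = actual.Sum(a => (a - mean) * (a - mean));
        if (totalSum == 0.0)
        {
            throw new ArgumentException("R Squared is undefined when all actual values are equal.");
        }
        return 1.0 - residualSum / totalSum;
    }
}
```

With this definition, perfect predictions give 1, and a model that always predicts the mean rating gives 0.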

Building successful models is an iterative process. This model has lower initial quality because the tutorial uses small datasets to provide quick model training. If you aren't satisfied with the model quality, you can try to improve it by providing larger training datasets or by choosing different training algorithms with different hyperparameters for each algorithm. For more information, check out the Improve your model section below.

Use your model

Now you can use your trained model to make predictions on new data.

Create the UseModelForSinglePrediction() method, just after the EvaluateModel() method, using the following code:

public static void UseModelForSinglePrediction(MLContext mlContext, ITransformer model)
{

}

Use the PredictionEngine to predict the rating by adding the following code to UseModelForSinglePrediction():

Console.WriteLine("=============== Making a prediction ===============");
var predictionEngine = mlContext.Model.CreatePredictionEngine<MovieRating, MovieRatingPrediction>(model);

The PredictionEngine class is a convenience API, which allows you to pass a single instance of data and then perform a prediction on this single instance of data.

Create an instance of MovieRating called testInput and pass it to the Prediction Engine by adding the following as the next lines of code in the UseModelForSinglePrediction() method:

var testInput = new MovieRating { userId = 6, movieId = 10 };

var movieRatingPrediction = predictionEngine.Predict(testInput);

The Predict() function makes a prediction on a single row of data.

You can then use the Score, or the predicted rating, to determine whether you want to recommend the movie with movieId 10 to user 6. The higher the Score, the higher the likelihood of a user liking a particular movie. In this case, let's say that you recommend movies with a predicted rating greater than 3.5.

To print the results, add the following as the next lines of code in the UseModelForSinglePrediction() method:

if (Math.Round(movieRatingPrediction.Score, 1) > 3.5)
{
    Console.WriteLine("Movie " + testInput.movieId + " is recommended for user " + testInput.userId);
}
else
{
    Console.WriteLine("Movie " + testInput.movieId + " is not recommended for user " + testInput.userId);
}
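The same threshold rule extends naturally from one movie to a list of candidates. The sketch below filters and ranks several movies for one user. It assumes a scoring delegate, predictScore, that stands in for a call such as predictionEngine.Predict(...).Score; RecommendSketch, RecommendedMovies, and the delegate are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecommendSketch
{
    // Returns the candidate movies whose predicted score exceeds the threshold, best first.
    public static List<float> RecommendedMovies(
        float userId,
        IEnumerable<float> candidateMovieIds,
        Func<float, float, float> predictScore, // (userId, movieId) => predicted rating
        float threshold = 3.5f)
    {
        return candidateMovieIds
            .Select(movieId => (MovieId: movieId, Score: predictScore(userId, movieId)))
            .Where(candidate => candidate.Score > threshold)
            .OrderByDescending(candidate => candidate.Score)
            .Select(candidate => candidate.MovieId)
            .ToList();
    }
}
```

Passing the scorer as a delegate keeps the sketch independent of ML.NET, so you can try it with made-up scores before wiring it to a real prediction engine.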

Add the following as the next line of code in the Main() method to call your UseModelForSinglePrediction() method:

UseModelForSinglePrediction(mlContext, model);

The output of this method should look similar to the following text:

=============== Making a prediction ===============
Movie 10 is recommended for user 6

Save your model

To use your model to make predictions in end-user applications, you must first save the model.

Create the SaveModel() method, just after the UseModelForSinglePrediction() method, using the following code:

public static void SaveModel(MLContext mlContext, DataViewSchema trainingDataViewSchema, ITransformer model)
{

}

Save your trained model by adding the following code in the SaveModel() method:

var modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "MovieRecommenderModel.zip");

Console.WriteLine("=============== Saving the model to a file ===============");
mlContext.Model.Save(model, trainingDataViewSchema, modelPath);

This method saves your trained model to a .zip file (in the "Data" folder), which can then be used in other .NET applications to make predictions.

Add the following as the next line of code in the Main() method to call your SaveModel() method:

SaveModel(mlContext, trainingDataView.Schema, model);

Use your saved model

Once you have saved your trained model, you can consume the model in different environments (see the how-to guide to learn how to operationalize a trained machine learning model in apps).

Results

After following the steps above, run your console app (Ctrl + F5). Your results from the single prediction above should be similar to the following. You may see warnings or processing messages, but these messages have been removed from the following results for clarity.

=============== Training the model ===============
iter      tr_rmse          obj
   0       1.5382   3.1213e+05
   1       0.9223   1.6051e+05
   2       0.8691   1.5050e+05
   3       0.8413   1.4576e+05
   4       0.8145   1.4208e+05
   5       0.7848   1.3895e+05
   6       0.7552   1.3613e+05
   7       0.7259   1.3357e+05
   8       0.6987   1.3121e+05
   9       0.6747   1.2949e+05
  10       0.6533   1.2766e+05
  11       0.6353   1.2636e+05
  12       0.6209   1.2561e+05
  13       0.6072   1.2462e+05
  14       0.5965   1.2394e+05
  15       0.5868   1.2352e+05
  16       0.5782   1.2279e+05
  17       0.5713   1.2227e+05
  18       0.5637   1.2190e+05
  19       0.5604   1.2178e+05
=============== Evaluating the model ===============
Root Mean Squared Error : 0.977175077487166
RSquared: 0.43233349213192
=============== Making a prediction ===============
Movie 10 is recommended for user 6
=============== Saving the model to a file ===============

Congratulations! You've now successfully built a machine learning model for recommending movies. You can find the source code for this tutorial at the dotnet/samples repository.

Improve your model

There are several ways that you can improve the performance of your model so that you can get more accurate predictions.

Data

Adding more training data that has enough samples for each user and movie id can help improve the quality of the recommendation model.

Cross validation is a technique for evaluating models that randomly splits up data into subsets (instead of extracting out test data from the dataset like you did in this tutorial) and takes some of the groups as train data and some of the groups as test data. This method often gives a more reliable estimate of model quality than a single train-test split.
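The splitting step of cross validation can be sketched without any ML library: shuffle the row indices once, deal them into k folds, and let each fold take one turn as the test set while the others form the training set. This is an illustration of the general k-fold idea, not of ML.NET's own cross-validation API; KFoldSketch, KFold, and the default of 5 folds are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KFoldSketch
{
    // Produces one (Train, Test) pair of row indices per fold.
    public static List<(List<int> Train, List<int> Test)> KFold(
        int rowCount, int folds = 5, int seed = 0)
    {
        var rng = new Random(seed);
        int[] shuffled = Enumerable.Range(0, rowCount).OrderBy(index => rng.Next()).ToArray();

        var result = new List<(List<int> Train, List<int> Test)>();
        for (int fold = 0; fold < folds; fold++)
        {
            // Every folds-th position belongs to the current test fold; the rest is training data.
            List<int> test = shuffled.Where((index, position) => position % folds == fold).ToList();
            List<int> train = shuffled.Where((index, position) => position % folds != fold).ToList();
            result.Add((train, test));
        }
        return result;
    }
}
```

You would then train and evaluate once per pair and average the metrics, so that every row is used for testing exactly once.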

Features

In this tutorial, you use only the columns provided by the dataset: userId and movieId as Features, and rating as the Label.

While this is a good start, in reality you might want to add other attributes or Features (for example, age, gender, or geo-location) if they are included in the dataset. Adding more relevant Features can help improve the performance of your recommendation model.

If you are unsure about which Features might be the most relevant for your machine learning task, you can also make use of Feature Contribution Calculation (FCC) and Feature Permutation Importance, which ML.NET provides to discover the most influential Features.

Algorithm hyperparameters

While ML.NET provides good default training algorithms, you can further fine-tune performance by changing the algorithm's hyperparameters.

For Matrix Factorization, you can experiment with hyperparameters such as NumberOfIterations and ApproximationRank to see if that gives you better results.

For instance, in this tutorial the algorithm options are:

var options = new MatrixFactorizationTrainer.Options
{
    MatrixColumnIndexColumnName = "userIdEncoded",
    MatrixRowIndexColumnName = "movieIdEncoded",
    LabelColumnName = "Label",
    NumberOfIterations = 20,
    ApproximationRank = 100
};
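One simple way to experiment is to try a small grid of values and keep the combination with the lowest RMSE on the test data. The sketch below shows only that search loop. It assumes a delegate, trainAndGetRmse, that you would implement yourself by training with the given NumberOfIterations and ApproximationRank and returning the resulting RMSE; GridSketch, BestSetting, and the delegate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class GridSketch
{
    // Tries every (iterations, rank) pair and returns the one with the lowest RMSE.
    public static (int Iterations, int Rank, double Rmse) BestSetting(
        IReadOnlyList<int> iterationChoices,
        IReadOnlyList<int> rankChoices,
        Func<int, int, double> trainAndGetRmse) // (iterations, rank) => RMSE on test data
    {
        var best = (Iterations: 0, Rank: 0, Rmse: double.MaxValue);
        foreach (int iterations in iterationChoices)
        {
            foreach (int rank in rankChoices)
            {
                double rmse = trainAndGetRmse(iterations, rank);
                if (rmse < best.Rmse)
                {
                    best = (iterations, rank, rmse);
                }
            }
        }
        return best;
    }
}
```

Each grid point means a full training run, so it is usually worth starting with only a few values per hyperparameter.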

Other Recommendation Algorithms

The matrix factorization algorithm with collaborative filtering is only one approach for performing movie recommendations. In many cases, you may not have the ratings data available and only have movie history available from users. In other cases, you may have more than just the user's rating data.

Algorithm: One Class Matrix Factorization
Scenario: Use this when you only have userId and movieId. This style of recommendation is based upon the co-purchase scenario, or products frequently bought together, which means it recommends to customers a set of products based upon their own purchase order history.
Sample: Try it out

Algorithm: Field Aware Factorization Machines
Scenario: Use this to make recommendations when you have more Features beyond userId, productId, and rating (such as product description or product price). This method also uses a collaborative filtering approach.
Sample: Try it out

New user scenario

One common problem in collaborative filtering is the cold start problem, which is when you have a new user with no previous data to draw inferences from. This problem is often solved by asking new users to create a profile and, for instance, rate movies they have seen in the past. While this method puts some burden on the user, it provides some starting data for new users with no rating history.

Resources

The data used in this tutorial is derived from the MovieLens Dataset.

Next steps

In this tutorial, you learned how to:

  • Select a machine learning algorithm
  • Prepare and load your data
  • Build and train a model
  • Evaluate a model
  • Deploy and consume a model

Advance to the next tutorial to learn more