Tutorial: Analyze sentiment of movie reviews using a pre-trained TensorFlow model in ML.NET

Article
05/10/2023

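This tutorial shows you how to use a pre-trained TensorFlow model to classify sentiment in website comments. The binary sentiment classifier is a C# console application developed using Visual Studio.

The TensorFlow model used in this tutorial was trained on movie reviews from the IMDB database. Once you have finished developing the application, you can supply movie review text and the application tells you whether the review has positive or negative sentiment.

In this tutorial, you learn how to:

- Load a pre-trained TensorFlow model
- Transform website comment text into features suitable for the model
- Use the model to make a prediction

You can find the source code for this tutorial in the dotnet/samples repository.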

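Prerequisites

Visual Studio 2022 with the ".NET Desktop Development" workload installed.

Setup

Create the application

Create a C# Console Application called "TextClassificationTF". Click the Next button.
Choose .NET 6 as the framework to use. Click the Create button.
Create a directory named Data in your project to save your data set files.
Install the Microsoft.ML NuGet package: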

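Note

This sample uses the latest stable version of the NuGet packages mentioned unless otherwise stated.

In Solution Explorer, right-click your project and select Manage NuGet Packages. Choose "nuget.org" as the package source, and then select the Browse tab. Search for Microsoft.ML, select the package you want, and then select the Install button. Agree to the license terms for the package you chose to proceed with the installation. Repeat these steps for Microsoft.ML.TensorFlow, Microsoft.ML.SampleUtils, and SciSharp.TensorFlow.Redist.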

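Add the TensorFlow model to the project

Note

The model for this tutorial is from the dotnet/machinelearning-testdata GitHub repository. The model is in TensorFlow SavedModel format.

Download the sentiment_model zip file, and unzip it.

The zip file contains:
- saved_model.pb: the TensorFlow model itself. The model takes a fixed-length (size 600) integer array of features representing the text in an IMDB review string, and outputs two probabilities that sum to 1: the probability that the input review has positive sentiment, and the probability that the input review has negative sentiment.
- imdb_word_index.csv: a mapping from individual words to integer values. The mapping is used to generate the input features for the TensorFlow model.

Copy the contents of the innermost sentiment_model directory into the sentiment_model directory of your TextClassificationTF project. This directory contains the model and the additional support files needed for this tutorial.
In Solution Explorer, right-click each of the files in the sentiment_model directory and its subdirectories and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.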

Add using directives and global variables

Add the following additional using directives to the top of the Program.cs file:

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

Create a global variable right after the using directives to hold the saved model file path.
```
string _modelPath = Path.Combine(Environment.CurrentDirectory, "sentiment_model");
```
- _modelPath is the file path of the trained model.

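Model the data

Movie reviews are free-form text. Your application converts the text into the input format expected by the model in a number of discrete stages.

The first stage splits the text into separate words and uses the provided mapping file to map each word onto an integer encoding. The result of this transformation is a variable-length integer array whose length corresponds to the number of words in the sentence.

Property	Value	Type
ReviewText	this film is really good	string
VariableLengthFeatures	14,22,9,66,78,...	int[]

The variable-length feature array is then resized to a fixed length of 600. This is the length that the TensorFlow model expects.

Property	Value	Type
ReviewText	this film is really good	string
VariableLengthFeatures	14,22,9,66,78,...	int[]
Features	14,22,9,66,78,...	int[600]

Create a class for your input data at the bottom of the Program.cs file: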

/// <summary>
/// Class to hold original sentiment data.
/// </summary>
public class MovieReview
{
    public string? ReviewText { get; set; }
}

The input data class, MovieReview, has a string property for user comments (ReviewText).

Create a class for the variable-length features after the MovieReview class:

/// <summary>
/// Class to hold the variable length feature vector. Used to define the
/// column names used as input to the custom mapping action.
/// </summary>
public class VariableLength
{
    /// <summary>
    /// This is a variable length vector designated by VectorType attribute.
    /// Variable length vectors are produced by applying operations such as 'TokenizeWords' on strings
    /// resulting in vectors of tokens of variable lengths.
    /// </summary>
    [VectorType]
    public int[]? VariableLengthFeatures { get; set; }
}

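The VariableLengthFeatures property has a VectorType attribute to designate it as a vector. All of the vector elements must be of the same type. In data sets with a large number of columns, loading multiple columns as a single vector reduces the number of data passes when you apply data transformations.

This class is used in the ResizeFeatures action. The names of its properties (in this case only one) indicate which columns in the DataView can be used as the input to the custom mapping action.

Create a class for the fixed-length features after the VariableLength class: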

/// <summary>
/// Class to hold the fixed length feature vector. Used to define the
/// column names used as output from the custom mapping action.
/// </summary>
public class FixedLength
{
    /// <summary>
    /// This is a fixed length vector designated by VectorType attribute.
    /// </summary>
    [VectorType(Config.FeatureLength)]
    public int[]? Features { get; set; }
}

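This class is used in the ResizeFeatures action. The names of its properties (in this case only one) indicate which columns in the DataView can be used as the output of the custom mapping action.

Note that the name of the Features property is determined by the TensorFlow model. You cannot change this property name.

Create a class for the prediction after the FixedLength class: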
```
/// <summary>
/// Class to contain the output values from the transformation.
/// </summary>
public class MovieReviewSentimentPrediction
{
    [VectorType(2)]
    public float[]? Prediction { get; set; }
}
```
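MovieReviewSentimentPrediction is the prediction class used after model training. MovieReviewSentimentPrediction has a single float array (Prediction) with a VectorType attribute.
Create another class to hold configuration values, such as the feature vector length: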
```
static class Config
{
    public const int FeatureLength = 600;
}
```

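Create the MLContext, lookup dictionary, and action to resize features

The MLContext class is the starting point for all ML.NET operations. Initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. Conceptually, it is similar to DbContext in Entity Framework.

Replace the Console.WriteLine("Hello World!") line with the following code to declare and initialize the mlContext variable: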
```
MLContext mlContext = new MLContext();
```

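Create a dictionary to encode words as integers by using the LoadFromTextFile method to load the mapping data from a file, as seen in the following table:

Word	Index
kids	362
want	181
wrong	355
effects	302
feeling	547

Add the following code to create the lookup map: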

var lookupMap = mlContext.Data.LoadFromTextFile(Path.Combine(_modelPath, "imdb_word_index.csv"),
    columns: new[]
        {
            new TextLoader.Column("Words", DataKind.String, 0),
            new TextLoader.Column("Ids", DataKind.Int32, 1),
        },
    separatorChar: ','
    );
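
The lookup map behaves like a dictionary from words to integers. As a rough plain-C# illustration of what tokenizing and mapping do to a review, the sketch below splits text on spaces and looks each word up in a dictionary. It is not how ML.NET implements its transforms; the names `WordEncodingSketch` and `Encode`, and the choice to encode unknown words as 0, are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class WordEncodingSketch
{
    // Splits the review on spaces and maps each word through wordIndex.
    // Assumption for this sketch: words missing from the index become 0.
    public static int[] Encode(string reviewText, IReadOnlyDictionary<string, int> wordIndex)
    {
        string[] tokens = reviewText.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var encoded = new int[tokens.Length];

        for (int i = 0; i < tokens.Length; i++)
        {
            encoded[i] = wordIndex.TryGetValue(tokens[i], out int id) ? id : 0;
        }

        // The result has one integer per word, so its length varies with the review.
        return encoded;
    }
}
```

With a dictionary holding the sample entries kids = 362, want = 181, and effects = 302, `WordEncodingSketch.Encode("kids want effects", wordIndex)` returns 362, 181, 302.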

Add an Action to resize the variable-length word integer array to an integer array of fixed size, with the next lines of code:

Action<VariableLength, FixedLength> ResizeFeaturesAction = (s, f) =>
{
    var features = s.VariableLengthFeatures;
    Array.Resize(ref features, Config.FeatureLength);
    f.Features = features;
};
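
The action relies on Array.Resize, which pads a shorter array with trailing zeros and truncates a longer one. The standalone sketch below shows that behavior with a small target length instead of 600; `ResizeSketch` and `ToFixedLength` are hypothetical names used only for illustration.

```csharp
using System;

public static class ResizeSketch
{
    // Returns an array with exactly 'length' elements.
    // Shorter inputs gain trailing zeros, longer inputs are cut off,
    // and a null input produces an array of zeros.
    public static int[] ToFixedLength(int[]? source, int length)
    {
        int[]? resized = source;
        // Array.Resize allocates a new array when the size differs;
        // the caller's original array is left unchanged.
        Array.Resize(ref resized, length);
        return resized;
    }
}
```

For example, `ResizeSketch.ToFixedLength(new[] { 14, 22, 9 }, 5)` returns 14, 22, 9, 0, 0. A review with more words than the target length loses its trailing words.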

Load the pre-trained TensorFlow model

Add code to load the TensorFlow model:

TensorFlowModel tensorFlowModel = mlContext.Model.LoadTensorFlowModel(_modelPath);

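Once the model is loaded, you can extract its input and output schema. The schemas are displayed for interest and learning only. The final application does not need this code to function: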

DataViewSchema schema = tensorFlowModel.GetModelSchema();
Console.WriteLine(" =============== TensorFlow Model Schema =============== ");
var featuresType = (VectorDataViewType)schema["Features"].Type;
Console.WriteLine($"Name: Features, Type: {featuresType.ItemType.RawType}, Size: ({featuresType.Dimensions[0]})");
var predictionType = (VectorDataViewType)schema["Prediction/Softmax"].Type;
Console.WriteLine($"Name: Prediction/Softmax, Type: {predictionType.ItemType.RawType}, Size: ({predictionType.Dimensions[0]})");

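The input schema is the fixed-length array of integer-encoded words. The output schema is a float array of probabilities indicating whether a review's sentiment is negative or positive. These values sum to 1, because the probability of the sentiment being positive is the complement of the probability of it being negative.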
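
To see why two such probabilities always sum to 1, the sketch below applies a softmax to two raw scores. It assumes the model's final layer is a standard softmax, which the output name Prediction/Softmax suggests but this tutorial does not otherwise confirm. `SoftmaxSketch` is a hypothetical helper and not part of ML.NET.

```csharp
using System;
using System.Linq;

public static class SoftmaxSketch
{
    // Converts raw scores into probabilities that sum to 1.
    public static float[] Softmax(float[] scores)
    {
        // Subtracting the maximum keeps Math.Exp from overflowing
        // without changing the resulting probabilities.
        float max = scores.Max();
        double[] exps = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }
}
```

For any two scores, the second probability equals one minus the first (up to floating-point rounding), which is why the tutorial only needs to inspect one element of the output.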

Create the ML.NET pipeline

Create the pipeline and split the input text into words using the TokenizeIntoWords transform, as shown in the next lines of code:
```
IEstimator<ITransformer> pipeline =
    // Split the text into individual words
    mlContext.Transforms.Text.TokenizeIntoWords("TokenizedWords", "ReviewText")
```
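The TokenizeIntoWords transform uses spaces to parse the text into words. It creates a new column in which each input string is split into a vector of substrings based on the user-defined separator.

Map the words onto their integer encoding using the lookup table that you declared above: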

// Map each word to an integer value. The array of integer makes up the input features.
.Append(mlContext.Transforms.Conversion.MapValue("VariableLengthFeatures", lookupMap,
    lookupMap.Schema["Words"], lookupMap.Schema["Ids"], "TokenizedWords"))

Resize the variable-length integer encodings to the fixed length required by the model:

// Resize variable length vector to fixed length vector.
.Append(mlContext.Transforms.CustomMapping(ResizeFeaturesAction, "Resize"))

Classify the input with the loaded TensorFlow model:
```
// Passes the data to TensorFlow for scoring
.Append(tensorFlowModel.ScoreTensorFlowModel("Prediction/Softmax", "Features"))
```
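The TensorFlow model output is called Prediction/Softmax. Note that the name Prediction/Softmax is determined by the TensorFlow model. You cannot change this name.
Create a new column for the output prediction: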
```
// Retrieves the 'Prediction' from TensorFlow and copies to a column
.Append(mlContext.Transforms.CopyColumns("Prediction", "Prediction/Softmax"));
```
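You need to copy the Prediction/Softmax column into one with a name that can be used as a property in a C# class: Prediction. The / character is not allowed in a C# property name.

Create the ML.NET model from the pipeline

Add the code to create the model from the pipeline: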
```
// Create an executable model from the estimator pipeline
IDataView dataView = mlContext.Data.LoadFromEnumerable(new List<MovieReview>());
ITransformer model = pipeline.Fit(dataView);
```
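An ML.NET model is created from the chain of estimators in the pipeline by calling the Fit method. In this case, no data is fitted to create the model, because the TensorFlow model has already been trained. An empty data view object is supplied to satisfy the requirements of the Fit method.

Use the model to make a prediction

Add the PredictSentiment method above the MovieReview class: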

void PredictSentiment(MLContext mlContext, ITransformer model)
{

}

Add the following code to create the PredictionEngine as the first line in the PredictSentiment() method:
```
var engine = mlContext.Model.CreatePredictionEngine<MovieReview, MovieReviewSentimentPrediction>(model);
```
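The PredictionEngine is a convenience API that lets you perform a prediction on a single instance of data. PredictionEngine is not thread-safe. It is acceptable to use in single-threaded or prototype environments. For improved performance and thread safety, use the PredictionEnginePool service, which creates an ObjectPool of PredictionEngine objects for use throughout your application. See the guide on how to use PredictionEnginePool in an ASP.NET Core Web API.

Note

The PredictionEnginePool service extension is currently in preview.

Add a comment to test the trained model's prediction in the PredictSentiment() method by creating an instance of MovieReview: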
```
var review = new MovieReview()
{
    ReviewText = "this film is really good"
};
```
Pass the test comment data to the PredictionEngine by adding the next line of code in the PredictSentiment() method:
```
var sentimentPrediction = engine.Predict(review);
```
The Predict() function makes a prediction on a single row of data:

Property	Value	Type
Prediction	[0.5459937, 0.454006255]	float[]

Display the sentiment prediction using the following code:

Console.WriteLine($"Number of classes: {sentimentPrediction.Prediction?.Length}");
Console.WriteLine($"Is sentiment/review positive? {(sentimentPrediction.Prediction?[1] > 0.5 ? "Yes." : "No.")}");
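
The display code treats index 1 of the Prediction array as the probability of positive sentiment and compares it with 0.5. The sketch below expresses the same check as a small helper with explicit handling for a missing or too-short array. `SentimentSketch`, `IsPositive`, and the `threshold` parameter are hypothetical names introduced here, and the positions follow the tutorial's display code rather than any separate model documentation.

```csharp
public static class SentimentSketch
{
    // Returns true when the score at index 1 exceeds the threshold.
    // A null or too-short array is reported as not positive, which matches
    // how the null-conditional comparison in the display code evaluates.
    public static bool IsPositive(float[]? prediction, float threshold = 0.5f)
    {
        if (prediction is null || prediction.Length < 2)
        {
            return false;
        }

        return prediction[1] > threshold;
    }
}
```

Inside PredictSentiment(), a call such as `SentimentSketch.IsPositive(sentimentPrediction.Prediction)` could stand in for the inline comparison.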

Add a call to PredictSentiment after the call to the Fit() method:
```
PredictSentiment(mlContext, model);
```

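Results

Build and run your application.

Your results should be similar to the following. During processing, messages are displayed. You may see warnings or processing messages. These messages have been removed from the following results for clarity.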

Number of classes: 2
Is sentiment/review positive? Yes.

Congratulations! You've now successfully built a machine learning model for classifying and predicting message sentiment by reusing a pre-trained TensorFlow model in ML.NET.