Tutorial: Analyze sentiment of website comments in a web application using ML.NET Model Builder

Learn how to analyze sentiment from comments in real time inside a web application.

This tutorial shows you how to create an ASP.NET Core Razor Pages application that classifies sentiment from website comments in real time.

In this tutorial, you learn how to:

  • Create an ASP.NET Core Razor Pages application
  • Prepare and understand the data
  • Choose a scenario
  • Load the data
  • Train the model
  • Evaluate the model
  • Use the model for predictions

Note

Model Builder is currently in Preview.

You can find the source code for this tutorial at the dotnet/machinelearning-samples repository.

Prerequisites

For a list of prerequisites and installation instructions, visit the Model Builder installation guide.

Create a Razor Pages application

  1. Create an ASP.NET Core Razor Pages application.

    1. Open Visual Studio and select File > New > Project from the menu bar.
    2. In the New Project dialog, select the Visual C# node followed by the Web node.
    3. Then select the ASP.NET Core Web Application project template.
    4. In the Name text box, type "SentimentRazor".
    5. Make sure Place solution and project in the same directory is unchecked (VS 2019), or Create directory for solution is checked (VS 2017).
    6. Select the OK button.
    7. In the window that displays the different types of ASP.NET Core projects, choose Web Application, and then select the OK button.

Prepare and understand the data

Download the Wikipedia detox dataset. When the webpage opens, right-click on the page, select Save As, and save the file anywhere on your computer.

Each row in the wikipedia-detox-250-line-data.tsv dataset represents a different review left by a user on Wikipedia. The first column represents the sentiment of the text (0 is non-toxic, 1 is toxic), and the second column represents the comment left by the user. The columns are separated by tabs. The data looks like the following:

Sentiment   SentimentText
1           ==RUDE== Dude, you are rude upload that carl picture back, or else.
1           == OK! ==  IM GOING TO VANDALIZE WILD ONES WIKI THEN!!!
0           I hope this helps.
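To make the file layout concrete, here is a minimal JavaScript sketch (JavaScript being the language of this tutorial's site.js) that reads rows in the format described above. It assumes a header row followed by lines of the form label, tab, comment. The parseDetoxTsv name is hypothetical and is not part of Model Builder, which loads the file for you.

```javascript
// Hypothetical helper: parse the detox TSV layout described above.
// Assumes the first non-empty line is the header (Sentiment, SentimentText)
// and every following line is "<0 or 1>\t<comment text>".
function parseDetoxTsv(tsvText) {
  const lines = tsvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  return lines.slice(1).map((line) => {
    const tab = line.indexOf("\t");
    if (tab < 0) throw new Error(`Missing tab separator: ${line}`);
    const label = Number(line.slice(0, tab));
    if (label !== 0 && label !== 1) throw new Error(`Unexpected label: ${line}`);
    // 1 means toxic, 0 means non-toxic; the rest of the line is the comment.
    return { sentiment: label, toxic: label === 1, text: line.slice(tab + 1) };
  });
}
```

For example, parseDetoxTsv("Sentiment\tSentimentText\n0\tI hope this helps.") returns one row whose toxic field is false.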

Choose a scenario

(Image: Model Builder wizard in Visual Studio)

To train your model, you need to select from the list of available machine learning scenarios provided by Model Builder.

  1. In Solution Explorer, right-click the SentimentRazor project, and select Add > Machine Learning.
  2. For this sample, the scenario is sentiment analysis. In the scenario step of the Model Builder tool, select the Sentiment Analysis scenario.

Load the data

Model Builder accepts data from two sources: a SQL Server database or a local file in csv or tsv format.

  1. In the data step of the Model Builder tool, select File from the data source dropdown.
  2. Select the button next to the Select a file text box and use File Explorer to browse to and select the wikipedia-detox-250-line-data.tsv file.
  3. Choose Sentiment in the Column to Predict (Label) dropdown.
  4. Leave the default values for the Input Columns (Features) dropdown.
  5. Select the Train link to move to the next step in the Model Builder tool.

Train the model

The machine learning task used to train the sentiment analysis model in this tutorial is binary classification. During the model training process, Model Builder trains separate models using different binary classification algorithms and settings to find the best performing model for your dataset.

The time required to train the model is proportional to the amount of data. Model Builder automatically selects a default value for Time to train (seconds) based on the size of your data source.
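The following simplified sketch illustrates why a larger time budget can find a better model: candidate trainers are tried one after another until the budget runs out, and the best score seen so far is kept. It is a hypothetical JavaScript illustration, not Model Builder's actual search algorithm; the candidate objects, their trainAndScore method, and the injectable clock are assumptions made for the example.

```javascript
// Hypothetical sketch of a time-budgeted model search (not Model Builder's algorithm).
// Each candidate is { name, trainAndScore() }, where trainAndScore trains one
// model and returns a score such as accuracy (higher is better).
function searchBestModel(candidates, budgetMs, clock = () => Date.now()) {
  const deadline = clock() + budgetMs;
  let best = null;
  for (const candidate of candidates) {
    // Stop exploring once the time budget is spent.
    if (clock() >= deadline) break;
    const score = candidate.trainAndScore();
    if (best === null || score > best.score) {
      best = { name: candidate.name, score };
    }
  }
  return best;
}
```

With a fake clock in which each candidate costs 10 ms, a 15 ms budget only reaches the first two candidates, while a 100 ms budget explores all of them and can pick a better one.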

  1. Although Model Builder sets the value of Time to train (seconds) to 10 seconds, increase it to 30 seconds. Training for a longer period of time allows Model Builder to explore a larger number of algorithms and combinations of parameters in search of the best model.

  2. Select Start Training.

    Throughout the training process, progress data is displayed in the Progress section of the train step.

    • Status displays the completion status of the training process.
    • Best accuracy displays the accuracy of the best performing model found by Model Builder so far. Higher accuracy means the model made more correct predictions on the test data.
    • Best algorithm displays the name of the best performing algorithm found by Model Builder so far.
    • Last algorithm displays the name of the algorithm most recently used by Model Builder to train the model.
  3. Once training is complete, select the Evaluate link to move to the next step.
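Best accuracy is the fraction of test examples whose predicted label matches the true label. A minimal JavaScript sketch of that calculation, assuming the predicted and actual labels are given as parallel arrays of 0/1 values (the accuracy function is illustrative, not a Model Builder API):

```javascript
// Accuracy = number of correct predictions / number of test examples.
function accuracy(predicted, actual) {
  if (predicted.length !== actual.length || actual.length === 0) {
    throw new Error("predicted and actual must be non-empty and the same length");
  }
  let correct = 0;
  for (let i = 0; i < actual.length; i++) {
    if (predicted[i] === actual[i]) correct++;
  }
  return correct / actual.length;
}
```

For example, accuracy([1, 0, 1, 1], [1, 0, 0, 1]) is 0.75, because three of the four predictions are correct.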

Evaluate the model

The result of the training step is the single model that had the best performance. In the evaluate step of the Model Builder tool, the output section contains the algorithm used by the best performing model in the Best Model entry, along with its metrics in Best Model Accuracy. It also contains a summary table of the top five models and their metrics.
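The summary table amounts to ranking every trained model by its metric and keeping the first five. A minimal JavaScript sketch, assuming each trial is a hypothetical { algorithm, accuracy } record:

```javascript
// Rank trials by accuracy (highest first) and keep the top k, like the
// "top five models" summary table. The array is copied so the input
// order is left unchanged.
function topModels(trials, k = 5) {
  return [...trials].sort((a, b) => b.accuracy - a.accuracy).slice(0, k);
}
```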

If you're not satisfied with your accuracy metrics, some easy ways to try to improve model accuracy are to increase the time to train the model or to use more data. Otherwise, select the Code link to move to the final step in the Model Builder tool.

Add the code to make predictions

Two projects are created as a result of the training process.

Reference the trained model

  1. In the code step of the Model Builder tool, select Add Projects to add the autogenerated projects to the solution.

    The following projects should appear in Solution Explorer:

    • SentimentRazorML.ConsoleApp: A .NET Core console application that contains the model training and prediction code.
    • SentimentRazorML.Model: A .NET Standard class library containing the data models that define the schema of the input and output model data, as well as the saved version of the best performing model from training.

    For this tutorial, only the SentimentRazorML.Model project is used, because predictions are made in the SentimentRazor web application rather than in the console. Although SentimentRazorML.ConsoleApp won't be used for scoring, it can be used to retrain the model with new data at a later time. Retraining, however, is outside the scope of this tutorial.

Configure the PredictionEngine pool

To make a single prediction, you have to create a PredictionEngine. PredictionEngine is not thread-safe. Additionally, you have to create an instance of it everywhere it is needed within your application. As your application grows, this process can become unmanageable. For improved performance and thread safety, use a combination of dependency injection and the PredictionEnginePool service, which creates an ObjectPool of PredictionEngine objects for use throughout your application.
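The idea behind the pool can be sketched in a few lines of JavaScript: expensive objects are created on demand, lent out for one operation, and returned for reuse instead of being rebuilt for every call. This is a generic illustration with a hypothetical SimplePool class, not the implementation of Microsoft.Extensions.ObjectPool or PredictionEnginePool. Because JavaScript runs on a single thread, it shows the reuse aspect rather than thread safety.

```javascript
// Generic object-pool sketch (hypothetical, not the .NET ObjectPool code).
class SimplePool {
  constructor(create) {
    this.create = create;   // factory for a new, expensive object
    this.available = [];    // objects returned to the pool, ready for reuse
  }

  // Reuse a pooled object if one is free, otherwise create a new one.
  get() {
    return this.available.length > 0 ? this.available.pop() : this.create();
  }

  release(item) {
    this.available.push(item);
  }

  // Borrow an object, run fn with it, and always hand it back to the pool.
  use(fn) {
    const item = this.get();
    try {
      return fn(item);
    } finally {
      this.release(item);
    }
  }
}
```

Conceptually, PredictionEnginePool.Predict performs the same borrow, predict, and return cycle for you, which is what use does in this sketch.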

  1. Install the Microsoft.Extensions.ML NuGet package in the SentimentRazor project:

    1. In Solution Explorer, right-click the SentimentRazor project and select Manage NuGet Packages.
    2. Choose "nuget.org" as the Package source.
    3. Select the Browse tab and search for Microsoft.Extensions.ML.
    4. Select the package in the list, and select the Install button.
    5. Select the OK button on the Preview Changes dialog.
    6. Select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed.
  2. Open the Startup.cs file in the SentimentRazor project.

  3. Add the following using statements to reference the Microsoft.Extensions.ML NuGet package and the SentimentRazorML.Model project (System.IO is needed by the file-path helper in a later step):

    using System.IO;
    using Microsoft.Extensions.ML;
    using SentimentRazorML.Model;
    
  4. Create a private field in the Startup class to store the location of the trained model file.

    private readonly string _modelPath;
    
  5. The model file is stored in the build directory alongside the assembly files of your application. To make it easier to access, create a helper method called GetAbsolutePath after the Configure method:

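    public static string GetAbsolutePath(string relativePath)
    {
        // Folder that contains the application's main assembly (the build output directory).
        FileInfo dataRoot = new FileInfo(typeof(Program).Assembly.Location);
        string assemblyFolderPath = dataRoot.Directory.FullName;
    
        // Combine that folder with the relative path, for example "MLModel.zip".
        string fullPath = Path.Combine(assemblyFolderPath, relativePath);
        return fullPath;
    }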
    
  6. Use the GetAbsolutePath method in the Startup class constructor to set _modelPath.

    _modelPath = GetAbsolutePath("MLModel.zip");
    
  7. Configure the PredictionEnginePool for your application in the ConfigureServices method:

    services.AddPredictionEnginePool<ModelInput, ModelOutput>()
            .FromFile(_modelPath);
    

Create sentiment analysis handler

Predictions are made inside the main page of the application. Therefore, you need to add a method that takes the user input and uses the PredictionEnginePool to return a prediction.

  1. Open the Index.cshtml.cs file located in the Pages directory and add the following using statements:

    using Microsoft.Extensions.ML;
    using SentimentRazorML.Model;
    

    To use the PredictionEnginePool configured in the Startup class, you have to inject it into the constructor of the model where you want to use it.

  2. Add a private field to reference the PredictionEnginePool inside the IndexModel class.

    private readonly PredictionEnginePool<ModelInput, ModelOutput> _predictionEnginePool;
    
  3. Create a constructor in the IndexModel class and inject the PredictionEnginePool service into it.

    public IndexModel(PredictionEnginePool<ModelInput, ModelOutput> predictionEnginePool)
    {
        _predictionEnginePool = predictionEnginePool;
    }
    
  4. Create a handler method that uses the PredictionEnginePool to make predictions from user input received from the web page.

    1. Below the OnGet method, create a new method called OnGetAnalyzeSentiment.

      public IActionResult OnGetAnalyzeSentiment([FromQuery] string text)
      {
          // Add the code from the following steps here.
      }
      
    2. Inside the OnGetAnalyzeSentiment method, return Neutral sentiment if the input from the user is null, empty, or only whitespace.

      if (string.IsNullOrWhiteSpace(text)) return Content("Neutral");
      
    3. Given a valid input, create a new instance of ModelInput.

      var input = new ModelInput { SentimentText = text };
      
    4. Use the PredictionEnginePool to predict sentiment.

      var prediction = _predictionEnginePool.Predict(input);
      
    5. Convert the predicted bool value into "Toxic" or "Not Toxic" with the following code.

      var sentiment = Convert.ToBoolean(prediction.Prediction) ? "Toxic" : "Not Toxic";
      
    6. Finally, return the sentiment to the web page.

      return Content(sentiment);
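To make the handler's contract explicit, here is a hypothetical JavaScript mirror of the logic above that can stand in for the server when testing the page's script. The predictToxic callback is an assumption that replaces the ML.NET model; only the three response strings come from the handler.

```javascript
// Hypothetical client-side stand-in for OnGetAnalyzeSentiment: same input,
// same three response strings, with the model replaced by a callback.
function analyzeSentiment(text, predictToxic) {
  // Blank or missing input: return "Neutral" without calling the model.
  if (text == null || text.trim() === "") return "Neutral";
  // predictToxic plays the role of the model's boolean Prediction value.
  return predictToxic(text) ? "Toxic" : "Not Toxic";
}
```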
      

Configure the web page

The results returned by the OnGetAnalyzeSentiment handler are displayed dynamically on the Index web page.

  1. Open the Index.cshtml file in the Pages directory and replace its contents with the following code:

    @page
    @model IndexModel
    @{
        ViewData["Title"] = "Home page";
    }
    
    <div class="text-center">
        <h2>Live Sentiment</h2>
    
        <p><textarea id="Message" cols="45" placeholder="Type any text like a short review"></textarea></p>
    
        <div class="sentiment">
            <h4>Your sentiment is...</h4>
            <p>😡 😐 😍</p>
    
            <div class="marker">
                <div id="markerPosition" style="left: 45%;">
                    <div>▲</div>
                    <label id="markerValue">Neutral</label>
                </div>
            </div>
        </div>
    </div>
    
  2. Next, add CSS styling code to the end of the site.css file in the wwwroot\css directory:

    /* Style for sentiment display */
    
    .sentiment {
        background-color: #eee;
        position: relative;
        display: inline-block;
        padding: 1rem;
        padding-bottom: 0;
        border-radius: 1rem;
    }
    
    .sentiment h4 {
        font-size: 16px;
        text-align: center;
        margin: 0;
        padding: 0;
    }
    
    .sentiment p {
        font-size: 50px;
    }
    
    .sentiment .marker {
        position: relative;
        left: 22px;
        width: calc(100% - 68px);
    }
    
    .sentiment .marker > div {
        transition: 0.3s ease-in-out;
        position: absolute;
        margin-left: -30px;
        text-align: center;
    }
    
    .sentiment .marker > div > div {
        font-size: 50px;
        line-height: 20px;
        color: green;
    }
    
    .sentiment .marker > div label {
        font-size: 30px;
        color: gray;
    }
    
  3. After that, add code to send inputs from the web page to the OnGetAnalyzeSentiment handler.

    1. In the site.js file located in the wwwroot\js directory, create a function called getSentiment that sends an HTTP GET request containing the user input to the OnGetAnalyzeSentiment handler.

      function getSentiment(userInput) {
          // Encode the input so characters such as &, # and spaces reach the handler intact.
          const url = `Index?handler=AnalyzeSentiment&text=${encodeURIComponent(userInput)}`;
          return fetch(url)
              .then((response) => response.text());
      }
      
    2. Below that, add another function called updateMarker to dynamically update the position of the marker on the web page as sentiment is predicted.

      function updateMarker(markerPosition, sentiment) {
          $("#markerPosition").attr("style", `left:${markerPosition}%`);
          $("#markerValue").text(sentiment);
      }
      
    3. Create an event handler function called updateSentiment that gets the input from the user, sends it to the OnGetAnalyzeSentiment handler using the getSentiment function, and updates the marker with the updateMarker function.

      function updateSentiment() {
      
          var userInput = $("#Message").val();
      
          getSentiment(userInput)
              .then((sentiment) => {
                  switch (sentiment) {
                      case "Not Toxic":
                          updateMarker(100.0,sentiment);
                          break;
                      case "Toxic":
                          updateMarker(0.0,sentiment);
                          break;
                      default:
                          updateMarker(45.0, "Neutral");
                  }
              });
      }
      
    4. Finally, register the event handler and bind it to the textarea element with the id=Message attribute.

      $("#Message").on('change input paste', updateSentiment);
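Because the handler is bound to the input event, every keystroke sends a request to the server. One optional refinement, not part of the original tutorial, is to debounce the handler so that a burst of typing produces a single request. A minimal sketch with a hypothetical debounce helper; the 300 ms delay in the usage comment is an arbitrary choice:

```javascript
// Returns a wrapper that postpones calling fn until waitMs milliseconds have
// passed without another call; only the last call's arguments are used.
function debounce(fn, waitMs) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    // The arrow function keeps `this` (e.g. the element jQuery passes) for fn.
    timer = setTimeout(() => fn.apply(this, args), waitMs);
  };
}

// Usage in site.js, instead of binding updateSentiment directly:
// $("#Message").on('change input paste', debounce(updateSentiment, 300));
```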
      

Run the application

Now that your application is set up, run it. The application should launch in your browser.

When the application launches, enter Model Builder is cool! into the text area. The predicted sentiment displayed should be Not Toxic.

(Image: Running application window showing the predicted sentiment)
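While you type, requests can also complete out of order: a slow response for older text may arrive after the response for newer text and move the marker back. A minimal sketch of a hypothetical latestOnly wrapper that discards superseded results by resolving them to null, again an optional refinement rather than part of the tutorial:

```javascript
// Wraps a promise-returning function so that only the most recent call's
// result is delivered; results of older, superseded calls resolve to null.
function latestOnly(fn) {
  let latestId = 0;
  return (...args) => {
    const id = ++latestId;
    return Promise.resolve(fn(...args)).then((value) =>
      id === latestId ? value : null
    );
  };
}

// Usage sketch for site.js (create the wrapper once, outside updateSentiment):
// const getLatestSentiment = latestOnly(getSentiment);
// getLatestSentiment(userInput).then((sentiment) => {
//   if (sentiment === null) return; // a newer request superseded this one
//   // ...update the marker as in updateSentiment
// });
```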

If you need to reference the Model Builder generated projects at a later time inside another solution, you can find them in the C:\Users\%USERNAME%\AppData\Local\Temp\MLVSTools directory.

Next steps

In this tutorial, you learned how to:

  • Create an ASP.NET Core Razor Pages application
  • Prepare and understand the data
  • Choose a scenario
  • Load the data
  • Train the model
  • Evaluate the model
  • Use the model for predictions

Additional Resources

To learn more about topics mentioned in this tutorial, visit the following resources: