Tutorial: Detect objects using ONNX in ML.NET

Learn how to use a pre-trained ONNX model in ML.NET to detect objects in images.

Training an object detection model from scratch requires setting millions of parameters, a large amount of labeled training data, and a vast amount of compute resources (hundreds of GPU hours). Using a pre-trained model lets you skip most of that training process.

In this tutorial, you learn how to:

  • Understand the problem
  • Learn what ONNX is and how it works with ML.NET
  • Understand the model
  • Reuse the pre-trained model
  • Detect objects with a loaded model

Prerequisites

ONNX object detection sample overview

This sample creates a .NET Core console application that detects objects within an image using a pre-trained deep learning ONNX model. The code for this sample can be found in the dotnet/machinelearning-samples repository on GitHub.

What is object detection?

Object detection is a computer vision problem. While closely related to image classification, object detection performs image classification at a more granular scale. Object detection both locates and categorizes entities within images. Use object detection when images contain multiple objects of different types.

[Figure: Screenshot showing image classification versus object classification.]

Some use cases for object detection include:

  • Self-driving cars
  • Robotics
  • Face detection
  • Workplace safety
  • Object counting
  • Activity recognition

Select a deep learning model

Deep learning is a subset of machine learning. To train deep learning models, large quantities of data are required. Patterns in the data are represented by a series of layers. The relationships in the data are encoded as connections between the layers containing weights. The higher the weight, the stronger the relationship. Collectively, this series of layers and connections is known as an artificial neural network. The more layers in a network, the "deeper" it is, making it a deep neural network.

There are different types of neural networks, the most common being Multi-Layered Perceptron (MLP), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). The most basic is the MLP, which maps a set of inputs to a set of outputs. This neural network is a good fit when the data does not have a spatial or time component. The CNN makes use of convolutional layers to process spatial information contained in the data. A good use case for CNNs is image processing to detect the presence of a feature in a region of an image (for example, is there a nose in the center of an image?). Finally, RNNs allow for the persistence of state or memory to be used as input. RNNs are used for time-series analysis, where the sequential ordering and context of events is important.

Understand the model

Object detection is an image-processing task. Therefore, most deep learning models trained to solve this problem are CNNs. The model used in this tutorial is the Tiny YOLOv2 model, a more compact version of the YOLOv2 model described in the paper "YOLO9000: Better, Faster, Stronger" by Redmon and Farhadi. Tiny YOLOv2 is trained on the Pascal VOC dataset and is made up of 15 layers that can predict 20 different classes of objects. Because Tiny YOLOv2 is a condensed version of the original YOLOv2 model, it trades some accuracy for speed. The different layers that make up the model can be visualized using tools like Netron. Inspecting the model yields a mapping of the connections between all the layers that make up the neural network, where each layer contains the name of the layer along with the dimensions of its input and output.

The data structures used to describe the inputs and outputs of the model are known as tensors. Tensors can be thought of as containers that store data in N dimensions. In the case of Tiny YOLOv2, the name of the input layer is image and it expects a tensor of dimensions 3 x 416 x 416. The name of the output layer is grid and it generates an output tensor of dimensions 125 x 13 x 13.

[Figure: Input layer being split into hidden layers, then the output layer.]

The YOLO model takes a 3 (RGB) x 416px x 416px image as input. The model takes this input and passes it through the different layers to produce an output. The output divides the input image into a 13 x 13 grid, with each cell in the grid consisting of 125 values.

What is an ONNX model?

The Open Neural Network Exchange (ONNX) is an open source format for AI models. ONNX supports interoperability between frameworks. This means you can train a model in one of the many popular machine learning frameworks like PyTorch, convert it into ONNX format, and consume the ONNX model in a different framework like ML.NET. To learn more, visit the ONNX website.

[Figure: Diagram of ONNX supported formats being used.]

The pre-trained Tiny YOLOv2 model is stored in ONNX format, a serialized representation of the layers and the learned patterns of those layers. In ML.NET, interoperability with ONNX is achieved with the ImageAnalytics and OnnxTransformer NuGet packages. The ImageAnalytics package contains a series of transforms that take an image and encode it into numerical values that can be used as input into a prediction or training pipeline. The OnnxTransformer package leverages the ONNX Runtime to load an ONNX model and use it to make predictions based on the input provided.

[Figure: Data flow of an ONNX file into the ONNX Runtime.]

Set up the .NET Core project

Now that you have a general understanding of what ONNX is and how Tiny YOLOv2 works, it's time to build the application.

Create a console application

  1. Create a .NET Core Console Application called "ObjectDetection".

  2. Install the Microsoft.ML NuGet Package:

    • In Solution Explorer, right-click on your project and select Manage NuGet Packages.
    • Choose "nuget.org" as the Package source, select the Browse tab, and search for Microsoft.ML.
    • Select the Install button.
    • Select the OK button on the Preview Changes dialog and then select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed.
    • Repeat these steps for Microsoft.ML.ImageAnalytics and Microsoft.ML.OnnxTransformer.

Prepare your data and pre-trained model

  1. Download the project assets directory zip file and unzip it.

  2. Copy the assets directory into your ObjectDetection project directory. This directory and its subdirectories contain the image files needed for this tutorial. The Tiny YOLOv2 model is not included; you download and add it in the next step.

  3. Download the Tiny YOLOv2 model from the ONNX Model Zoo, and unzip it.

    Open the command prompt and enter the following command:

    tar -xvzf tiny_yolov2.tar.gz
    
  4. Copy the extracted model.onnx file from the directory you just unzipped into your ObjectDetection project's assets\Model directory and rename it to TinyYolo2_model.onnx. This directory contains the model needed for this tutorial.

  5. In Solution Explorer, right-click each of the files in the assets directory and subdirectories and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.

Create classes and define paths

Open the Program.cs file and add the following additional using statements to the top of the file:

using System;
using System.IO;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Linq;
using Microsoft.ML;

Next, define the paths of the various assets.

  1. First, add the GetAbsolutePath method below the Main method in the Program class.

    public static string GetAbsolutePath(string relativePath)
    {
        FileInfo _dataRoot = new FileInfo(typeof(Program).Assembly.Location);
        string assemblyFolderPath = _dataRoot.Directory.FullName;
    
        string fullPath = Path.Combine(assemblyFolderPath, relativePath);
    
        return fullPath;
    }
    
  2. Then, inside the Main method, create variables to store the location of your assets.

    var assetsRelativePath = @"../../../assets";
    string assetsPath = GetAbsolutePath(assetsRelativePath);
    var modelFilePath = Path.Combine(assetsPath, "Model", "TinyYolo2_model.onnx");
    var imagesFolder = Path.Combine(assetsPath, "images");
    var outputFolder = Path.Combine(assetsPath, "images", "output");
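
As a hedged illustration of what the relative path resolves to, the sketch below combines a hypothetical build output folder with "../../../assets" and normalizes the result. The PathDemo class, the Resolve helper, and the folder names are assumptions made for this example. GetAbsolutePath itself does not call Path.GetFullPath; the un-normalized path it returns still points at the same directory.

```csharp
using System;
using System.IO;

public static class PathDemo
{
    // Hypothetical helper: combines a base folder with a relative path
    // and collapses the ".." segments so the final location is easy to read.
    public static string Resolve(string baseFolder, string relativePath)
    {
        return Path.GetFullPath(Path.Combine(baseFolder, relativePath));
    }

    public static void Main()
    {
        // Assumed layout: <project>/bin/Debug/<framework> is where the assembly runs.
        string baseFolder = Path.Combine(
            Path.GetTempPath(), "ObjectDetection", "bin", "Debug", "netcoreapp3.1");

        // Three ".." segments walk back up to the project directory.
        Console.WriteLine(Resolve(baseFolder, "../../../assets"));
    }
}
```

Going up three levels from the output folder is what lets the application find the assets directory that sits next to the project file.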
    

Add a new directory to your project to store your input data and prediction classes.

In Solution Explorer, right-click the project, and then select Add > New Folder. When the new folder appears in the Solution Explorer, name it "DataStructures".

Create your input data class in the newly created DataStructures directory.

  1. In Solution Explorer, right-click the DataStructures directory, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to ImageNetData.cs. Then, select the Add button.

    The ImageNetData.cs file opens in the code editor. Add the following using statements to the top of ImageNetData.cs:

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using Microsoft.ML.Data;
    

    Remove the existing class definition and add the following code for the ImageNetData class to the ImageNetData.cs file:

    public class ImageNetData
    {
        [LoadColumn(0)]
        public string ImagePath;
    
        [LoadColumn(1)]
        public string Label;
    
        public static IEnumerable<ImageNetData> ReadFromFile(string imageFolder)
        {
            return Directory
                .GetFiles(imageFolder)
                .Where(filePath => Path.GetExtension(filePath) != ".md")
                .Select(filePath => new ImageNetData { ImagePath = filePath, Label = Path.GetFileName(filePath) });
        }
    }
    

    ImageNetData is the input image data class and has the following String fields:

    • ImagePath contains the path where the image is stored.
    • Label contains the name of the file.

    Additionally, ImageNetData contains a method ReadFromFile that loads multiple image files stored in the specified imageFolder path and returns them as a collection of ImageNetData objects.

Create your prediction class in the DataStructures directory.

  1. In Solution Explorer, right-click the DataStructures directory, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to ImageNetPrediction.cs. Then, select the Add button.

    The ImageNetPrediction.cs file opens in the code editor. Add the following using statement to the top of ImageNetPrediction.cs:

    using Microsoft.ML.Data;
    

    Remove the existing class definition and add the following code for the ImageNetPrediction class to the ImageNetPrediction.cs file:

    public class ImageNetPrediction
    {
        [ColumnName("grid")]
        public float[] PredictedLabels;
    }
    

    ImageNetPrediction is the prediction data class and has the following float[] field:

    • PredictedLabels contains the dimensions, objectness score, and class probabilities for each of the bounding boxes detected in an image.

Initialize variables in Main

The MLContext class is a starting point for all ML.NET operations, and initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It's similar, conceptually, to DbContext in Entity Framework.

Initialize the mlContext variable with a new instance of MLContext by adding the following line to the Main method of Program.cs below the outputFolder variable.

MLContext mlContext = new MLContext();

Create a parser to post-process model outputs

The model segments an image into a 13 x 13 grid, where each grid cell is 32px x 32px. Each grid cell contains 5 potential object bounding boxes.

[Figure: Grid sample on the left, and bounding box sample on the right.]

A bounding box has 25 elements:

  • x: the x position of the bounding box center relative to the grid cell it's associated with.
  • y: the y position of the bounding box center relative to the grid cell it's associated with.
  • w: the width of the bounding box.
  • h: the height of the bounding box.
  • o: the confidence value that an object exists within the bounding box, also known as the objectness score.
  • p1-p20: class probabilities for each of the 20 classes predicted by the model.

In total, the 25 elements describing each of the 5 bounding boxes make up the 125 elements contained in each grid cell.

The output generated by the pre-trained ONNX model is a float array of length 21125, representing the elements of a tensor with dimensions 125 x 13 x 13. To interpret this flat array as a tensor and turn the predictions into bounding boxes, some post-processing work is required. To do so, create a set of classes to help parse the output.
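
The arithmetic above can be checked with a short sketch. The TensorLayoutDemo class and its member names are hypothetical, and the FlatIndex helper assumes the 125 x 13 x 13 tensor is flattened channel by channel, then row by row.

```csharp
using System;

public static class TensorLayoutDemo
{
    public const int RowCount = 13;
    public const int ColCount = 13;
    public const int BoxesPerCell = 5;
    public const int BoxFeatureCount = 5;   // x, y, w, h, o
    public const int ClassCount = 20;       // p1-p20

    // Values per grid cell: 5 boxes x (5 features + 20 class scores) = 125.
    public static int ChannelCount => BoxesPerCell * (BoxFeatureCount + ClassCount);

    // Total length of the flat output array: 125 x 13 x 13 = 21125.
    public static int OutputLength => ChannelCount * RowCount * ColCount;

    // Assumed layout: index of element (channel, y, x) in the flat array.
    public static int FlatIndex(int channel, int y, int x)
    {
        return channel * RowCount * ColCount + y * ColCount + x;
    }

    public static void Main()
    {
        Console.WriteLine(ChannelCount);             // 125
        Console.WriteLine(OutputLength);             // 21125
        Console.WriteLine(FlatIndex(4, 3, 2));       // 717
        Console.WriteLine(FlatIndex(124, 12, 12));   // 21124, the last element
    }
}
```

Under this layout, the 25 values of one bounding box are not adjacent in memory: consecutive channels for the same cell sit 169 (13 x 13) elements apart.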

Add a new directory to your project to organize the set of parser classes.

  1. In Solution Explorer, right-click the project, and then select Add > New Folder. When the new folder appears in the Solution Explorer, name it "YoloParser".

Create bounding boxes and dimensions

The data output by the model contains coordinates and dimensions of the bounding boxes of objects within the image. Create a base class for dimensions.

  1. In Solution Explorer, right-click the YoloParser directory, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to DimensionsBase.cs. Then, select the Add button.

    The DimensionsBase.cs file opens in the code editor. Remove all using statements and the existing class definition.

    Add the following code for the DimensionsBase class to the DimensionsBase.cs file:

    public class DimensionsBase
    {
        public float X { get; set; }
        public float Y { get; set; }
        public float Height { get; set; }
        public float Width { get; set; }
    }
    

    DimensionsBase has the following float properties:

    • X contains the position of the object along the x-axis.
    • Y contains the position of the object along the y-axis.
    • Height contains the height of the object.
    • Width contains the width of the object.

Next, create a class for your bounding boxes.

  1. In Solution Explorer, right-click the YoloParser directory, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to YoloBoundingBox.cs. Then, select the Add button.

    The YoloBoundingBox.cs file opens in the code editor. Add the following using statement to the top of YoloBoundingBox.cs:

    using System.Drawing;
    

    Just above the existing class definition, add a new class definition called BoundingBoxDimensions that inherits from the DimensionsBase class to contain the dimensions of the respective bounding box.

    public class BoundingBoxDimensions : DimensionsBase { }
    

    Remove the existing YoloBoundingBox class definition and add the following code for the YoloBoundingBox class to the YoloBoundingBox.cs file:

    public class YoloBoundingBox
    {
        public BoundingBoxDimensions Dimensions { get; set; }
    
        public string Label { get; set; }
    
        public float Confidence { get; set; }
    
        public RectangleF Rect
        {
            get { return new RectangleF(Dimensions.X, Dimensions.Y, Dimensions.Width, Dimensions.Height); }
        }
    
        public Color BoxColor { get; set; }
    }
    

    YoloBoundingBox has the following properties:

    • Dimensions contains the dimensions of the bounding box.
    • Label contains the class of object detected within the bounding box.
    • Confidence contains the confidence of the class.
    • Rect contains the rectangle representation of the bounding box's dimensions.
    • BoxColor contains the color associated with the respective class, used to draw on the image.

Create the parser

Now that the classes for dimensions and bounding boxes are created, it's time to create the parser.

  1. In Solution Explorer, right-click the YoloParser directory, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to YoloOutputParser.cs. Then, select the Add button.

    The YoloOutputParser.cs file opens in the code editor. Add the following using statements to the top of YoloOutputParser.cs:

    using System;
    using System.Collections.Generic;
    using System.Drawing;
    using System.Linq;
    

    Inside the existing YoloOutputParser class definition, add a nested class that contains the dimensions of each of the cells in the image. Add the following code for the CellDimensions class, which inherits from the DimensionsBase class, at the top of the YoloOutputParser class definition.

    class CellDimensions : DimensionsBase { }
    
  3. Inside the YoloOutputParser class definition, add the following constants and fields.

    public const int ROW_COUNT = 13;
    public const int COL_COUNT = 13;
    public const int CHANNEL_COUNT = 125;
    public const int BOXES_PER_CELL = 5;
    public const int BOX_INFO_FEATURE_COUNT = 5;
    public const int CLASS_COUNT = 20;
    public const float CELL_WIDTH = 32;
    public const float CELL_HEIGHT = 32;
    
    private int channelStride = ROW_COUNT * COL_COUNT;
    
    • ROW_COUNT is the number of rows in the grid the image is divided into.
    • COL_COUNT is the number of columns in the grid the image is divided into.
    • CHANNEL_COUNT is the total number of values contained in one cell of the grid.
    • BOXES_PER_CELL is the number of bounding boxes in a cell.
    • BOX_INFO_FEATURE_COUNT is the number of features contained within a box (x, y, width, height, confidence).
    • CLASS_COUNT is the number of class predictions contained in each bounding box.
    • CELL_WIDTH is the width of one cell in the image grid.
    • CELL_HEIGHT is the height of one cell in the image grid.
    • channelStride is the number of elements in one channel of the flattened output (ROW_COUNT * COL_COUNT = 169). It is the distance between the same cell's values in consecutive channels.

    When the model makes a prediction, also known as scoring, it divides the 416px x 416px input image into a 13 x 13 grid of cells. Each cell is 32px x 32px. Within each cell, there are 5 bounding boxes, each containing 5 features (x, y, width, height, confidence). In addition, each bounding box contains a probability for each of the classes, of which there are 20 in this case. Therefore, each cell contains 125 pieces of information: 5 bounding boxes x (5 features + 20 class probabilities).

Create a list of anchors below channelStride for all 5 bounding boxes:

private float[] anchors = new float[]
{
    1.08F, 1.19F, 3.42F, 4.41F, 6.63F, 11.38F, 9.42F, 5.11F, 16.62F, 10.52F
};

Anchors are pre-defined height and width ratios of bounding boxes. Most objects or classes detected by a model have similar ratios. This is valuable when it comes to creating bounding boxes. Instead of predicting the bounding boxes directly, the model calculates the offset from the pre-defined dimensions, which reduces the computation required to predict the bounding box. Typically, these anchor ratios are calculated based on the dataset used. In this case, because the dataset is known and the values have been pre-computed, the anchors can be hard-coded.
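
To make the role of the anchors concrete, here is a minimal sketch of how a raw width and height prediction might be scaled by an anchor pair. It mirrors the width and height formula used by the MapBoundingBoxToCell helper in this tutorial; the AnchorDemo class and the DecodeSize name are hypothetical, and the raw inputs are made-up values.

```csharp
using System;

public static class AnchorDemo
{
    // Same values as the tutorial's anchors array, read as (width, height) pairs.
    private static readonly float[] Anchors =
    {
        1.08F, 1.19F, 3.42F, 4.41F, 6.63F, 11.38F, 9.42F, 5.11F, 16.62F, 10.52F
    };

    private const float CellSize = 32F; // each grid cell is 32px x 32px

    // Scales the raw network outputs for box number `box` (0-4) to pixel sizes.
    public static (float Width, float Height) DecodeSize(int box, float rawWidth, float rawHeight)
    {
        float width = (float)Math.Exp(rawWidth) * CellSize * Anchors[box * 2];
        float height = (float)Math.Exp(rawHeight) * CellSize * Anchors[box * 2 + 1];
        return (width, height);
    }

    public static void Main()
    {
        // A raw prediction of 0 means "no offset", so the box keeps the anchor's shape:
        // 32 * 1.08 = 34.56 wide and 32 * 1.19 = 38.08 tall for the first anchor.
        var size = DecodeSize(0, 0f, 0f);
        Console.WriteLine($"{size.Width} x {size.Height}");
    }
}
```

Because the exponential of 0 is 1, a zero offset reproduces the anchor exactly, and positive or negative raw values grow or shrink the box relative to it.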

Next, define the labels or classes that the model will predict. This model predicts 20 classes, which is a subset of the total number of classes predicted by the original YOLOv2 model.

Add your list of labels below the anchors.

private string[] labels = new string[]
{
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor"
};

There are colors associated with each of the classes. Assign your class colors below your labels:

private static Color[] classColors = new Color[]
{
    Color.Khaki,
    Color.Fuchsia,
    Color.Silver,
    Color.RoyalBlue,
    Color.Green,
    Color.DarkOrange,
    Color.Purple,
    Color.Gold,
    Color.Red,
    Color.Aquamarine,
    Color.Lime,
    Color.AliceBlue,
    Color.Sienna,
    Color.Orchid,
    Color.Tan,
    Color.LightPink,
    Color.Yellow,
    Color.HotPink,
    Color.OliveDrab,
    Color.SandyBrown,
    Color.DarkTurquoise
};

Create helper functions

There are a series of steps involved in the post-processing phase. To help with that, several helper methods can be employed.

The helper methods used by the parser are:

  • Sigmoid applies the sigmoid function, which outputs a number between 0 and 1.
  • Softmax normalizes an input vector into a probability distribution.
  • GetOffset maps elements in the one-dimensional model output to the corresponding position in a 125 x 13 x 13 tensor.
  • ExtractBoundingBoxDimensions extracts the bounding box dimensions from the model output using the GetOffset method.
  • GetConfidence extracts the confidence value that states how sure the model is that it has detected an object and uses the Sigmoid function to turn it into a value between 0 and 1.
  • MapBoundingBoxToCell uses the bounding box dimensions and maps them onto their respective cell within the image.
  • ExtractClasses extracts the class predictions for the bounding box from the model output using the GetOffset method and turns them into a probability distribution using the Softmax method.
  • GetTopResult selects the class with the highest probability from the list of predicted classes.
  • IntersectionOverUnion computes how much two bounding boxes overlap, which is used to filter overlapping bounding boxes with lower probabilities.

Add the code for all the helper methods below your list of classColors.

private float Sigmoid(float value)
{
    var k = (float)Math.Exp(value);
    return k / (1.0f + k);
}

private float[] Softmax(float[] values)
{
    var maxVal = values.Max();
    var exp = values.Select(v => Math.Exp(v - maxVal));
    var sumExp = exp.Sum();

    return exp.Select(v => (float)(v / sumExp)).ToArray();
}

private int GetOffset(int x, int y, int channel)
{
    // YOLO outputs a tensor that has a shape of 125x13x13, which 
    // WinML flattens into a 1D array.  To access a specific channel 
    // for a given (x,y) cell position, we need to calculate an offset
    // into the array
    return (channel * this.channelStride) + (y * COL_COUNT) + x;
}

private BoundingBoxDimensions ExtractBoundingBoxDimensions(float[] modelOutput, int x, int y, int channel)
{
    return new BoundingBoxDimensions
    {
        X = modelOutput[GetOffset(x, y, channel)],
        Y = modelOutput[GetOffset(x, y, channel + 1)],
        Width = modelOutput[GetOffset(x, y, channel + 2)],
        Height = modelOutput[GetOffset(x, y, channel + 3)]
    };
}

private float GetConfidence(float[] modelOutput, int x, int y, int channel)
{
    return Sigmoid(modelOutput[GetOffset(x, y, channel + 4)]);
}

private CellDimensions MapBoundingBoxToCell(int x, int y, int box, BoundingBoxDimensions boxDimensions)
{
    return new CellDimensions
    {
        X = ((float)x + Sigmoid(boxDimensions.X)) * CELL_WIDTH,
        Y = ((float)y + Sigmoid(boxDimensions.Y)) * CELL_HEIGHT,
        Width = (float)Math.Exp(boxDimensions.Width) * CELL_WIDTH * anchors[box * 2],
        Height = (float)Math.Exp(boxDimensions.Height) * CELL_HEIGHT * anchors[box * 2 + 1],
    };
}

public float[] ExtractClasses(float[] modelOutput, int x, int y, int channel)
{
    float[] predictedClasses = new float[CLASS_COUNT];
    int predictedClassOffset = channel + BOX_INFO_FEATURE_COUNT;
    for (int predictedClass = 0; predictedClass < CLASS_COUNT; predictedClass++)
    {
        predictedClasses[predictedClass] = modelOutput[GetOffset(x, y, predictedClass + predictedClassOffset)];
    }
    return Softmax(predictedClasses);
}

private ValueTuple<int, float> GetTopResult(float[] predictedClasses)
{
    return predictedClasses
        .Select((predictedClass, index) => (Index: index, Value: predictedClass))
        .OrderByDescending(result => result.Value)
        .First();
}

private float IntersectionOverUnion(RectangleF boundingBoxA, RectangleF boundingBoxB)
{
    var areaA = boundingBoxA.Width * boundingBoxA.Height;

    if (areaA <= 0)
        return 0;

    var areaB = boundingBoxB.Width * boundingBoxB.Height;

    if (areaB <= 0)
        return 0;

    var minX = Math.Max(boundingBoxA.Left, boundingBoxB.Left);
    var minY = Math.Max(boundingBoxA.Top, boundingBoxB.Top);
    var maxX = Math.Min(boundingBoxA.Right, boundingBoxB.Right);
    var maxY = Math.Min(boundingBoxA.Bottom, boundingBoxB.Bottom);

    var intersectionArea = Math.Max(maxY - minY, 0) * Math.Max(maxX - minX, 0);

    return intersectionArea / (areaA + areaB - intersectionArea);
}
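
As a worked check of the overlap calculation, the sketch below repeats the same logic as the IntersectionOverUnion helper in a standalone class so that it can be run on its own. The IouDemo class name and the sample rectangles are assumptions made for this example.

```csharp
using System;
using System.Drawing;

public static class IouDemo
{
    // Ratio of the overlapping area to the combined area of two boxes.
    public static float IntersectionOverUnion(RectangleF a, RectangleF b)
    {
        float areaA = a.Width * a.Height;
        float areaB = b.Width * b.Height;
        if (areaA <= 0 || areaB <= 0)
            return 0;

        // Corners of the overlapping region, if there is one.
        float minX = Math.Max(a.Left, b.Left);
        float minY = Math.Max(a.Top, b.Top);
        float maxX = Math.Min(a.Right, b.Right);
        float maxY = Math.Min(a.Bottom, b.Bottom);

        // Clamp to zero so boxes that do not overlap give an empty intersection.
        float intersection = Math.Max(maxX - minX, 0f) * Math.Max(maxY - minY, 0f);
        return intersection / (areaA + areaB - intersection);
    }

    public static void Main()
    {
        var a = new RectangleF(0, 0, 10, 10);
        var b = new RectangleF(5, 5, 10, 10);

        // Overlap is 5 x 5 = 25, union is 100 + 100 - 25 = 175, so the ratio is 25 / 175.
        Console.WriteLine(IntersectionOverUnion(a, b)); // about 0.1429
    }
}
```

Identical boxes give a ratio of 1 and boxes that do not touch give 0, which is what makes the value useful for deciding whether two detections describe the same object.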

Once you have defined all of the helper methods, it's time to use them to process the model output.

Below the IntersectionOverUnion method, create the ParseOutputs method to process the output generated by the model.

public IList<YoloBoundingBox> ParseOutputs(float[] yoloModelOutputs, float threshold = .3F)
{

}

Create a list to store your bounding boxes and define variables inside the ParseOutputs method.

var boxes = new List<YoloBoundingBox>();

Each image is divided into a grid of 13 x 13 cells. Each cell contains five bounding boxes. Below the boxes variable, add code to process all of the boxes in each of the cells.

for (int row = 0; row < ROW_COUNT; row++)
{
    for (int column = 0; column < COL_COUNT; column++)
    {
        for (int box = 0; box < BOXES_PER_CELL; box++)
        {

        }
    }
}

Inside the innermost loop, calculate the starting position of the current box within the one-dimensional model output.

var channel = (box * (CLASS_COUNT + BOX_INFO_FEATURE_COUNT));
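To make the index arithmetic concrete, here is a minimal sketch. It assumes the constant values used by the parser (a 13 x 13 grid, 5 boxes per cell, 20 classes and 5 box features) and a channel-major layout in which each of the 125 channels holds 169 values. The class name OffsetSketch and the body of GetOffset are assumptions made for illustration, not copies of the tutorial's helper.

```csharp
// Hypothetical illustration of how a box index maps into the flat model output.
public static class OffsetSketch
{
    public const int RowCount = 13;
    public const int ColCount = 13;
    public const int BoxesPerCell = 5;
    public const int ClassCount = 20;          // assumed value of CLASS_COUNT
    public const int BoxInfoFeatureCount = 5;  // assumed: x, y, width, height, confidence

    // Number of values stored per channel (one per grid cell).
    public const int ChannelStride = RowCount * ColCount;

    // First channel that belongs to the given box within a cell.
    public static int FirstChannel(int box) => box * (ClassCount + BoxInfoFeatureCount);

    // Assumed channel-major layout: all cells of channel 0, then all cells of channel 1, and so on.
    public static int GetOffset(int x, int y, int channel) =>
        channel * ChannelStride + y * ColCount + x;
}
```

Under these assumptions, box 2 starts at channel 50, and its first value for the cell at x = 3, y = 4 sits at index 50 * 169 + 4 * 13 + 3 = 8505 in the flat array.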

Directly below the channel calculation, use the ExtractBoundingBoxDimensions method to get the dimensions of the current bounding box.

BoundingBoxDimensions boundingBoxDimensions = ExtractBoundingBoxDimensions(yoloModelOutputs, row, column, channel);

Then, use the GetConfidence method to get the confidence for the current bounding box.

float confidence = GetConfidence(yoloModelOutputs, row, column, channel);

After that, use the MapBoundingBoxToCell method to map the current bounding box to the current cell being processed.

CellDimensions mappedBoundingBox = MapBoundingBoxToCell(row, column, box, boundingBoxDimensions);

Before doing any further processing, check whether your confidence value meets the threshold provided. If it is below the threshold, skip to the next bounding box.

if (confidence < threshold)
    continue;

Otherwise, continue processing the output. The next step is to get the probability distribution of the predicted classes for the current bounding box using the ExtractClasses method.

float[] predictedClasses = ExtractClasses(yoloModelOutputs, row, column, channel);

Then, use the GetTopResult method to get the value and index of the class with the highest probability for the current box, and compute its score by multiplying that probability by the box confidence.

var (topResultIndex, topResultScore) = GetTopResult(predictedClasses);
var topScore = topResultScore * confidence;

Use the topScore to once again keep only those bounding boxes that are above the specified threshold.

if (topScore < threshold)
    continue;
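The second threshold check matters because a box can have a high confidence while the class prediction is still uncertain. The sketch below illustrates this under assumptions: ScoreSketch and TopScore are hypothetical names, and this Softmax is a generic, numerically stable implementation rather than the tutorial's own helper.

```csharp
using System;
using System.Linq;

// Hypothetical illustration of combining class probability with box confidence.
public static class ScoreSketch
{
    // Generic softmax; subtracting the maximum avoids overflow in Math.Exp.
    public static float[] Softmax(float[] values)
    {
        var max = values.Max();
        var exp = values.Select(v => Math.Exp(v - max)).ToArray();
        var sum = exp.Sum();
        return exp.Select(v => (float)(v / sum)).ToArray();
    }

    // Returns the index of the most likely class and its probability scaled by the box confidence.
    public static (int Index, float Score) TopScore(float[] classValues, float confidence)
    {
        var probabilities = Softmax(classValues);
        var best = probabilities
            .Select((p, i) => (Index: i, Value: p))
            .OrderByDescending(r => r.Value)
            .First();
        return (best.Index, best.Value * confidence);
    }
}
```

For example, with four equally likely classes each probability is 0.25, so a box with confidence 0.8 gets a score of 0.2 and is dropped by the default threshold of 0.3, even though it passed the first confidence check.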

Finally, if the current bounding box exceeds the threshold, create a new YoloBoundingBox object and add it to the boxes list.

boxes.Add(new YoloBoundingBox()
{
    Dimensions = new BoundingBoxDimensions
    {
        X = (mappedBoundingBox.X - mappedBoundingBox.Width / 2),
        Y = (mappedBoundingBox.Y - mappedBoundingBox.Height / 2),
        Width = mappedBoundingBox.Width,
        Height = mappedBoundingBox.Height,
    },
    Confidence = topScore,
    Label = labels[topResultIndex],
    BoxColor = classColors[topResultIndex]
});

Once all cells in the image have been processed, return the boxes list. Add the following return statement below the outermost for-loop in the ParseOutputs method.

return boxes;

Filter overlapping boxes

Now that all of the highly confident bounding boxes have been extracted from the model output, additional filtering needs to be done to remove overlapping boxes. Add a method called FilterBoundingBoxes below the ParseOutputs method:

public IList<YoloBoundingBox> FilterBoundingBoxes(IList<YoloBoundingBox> boxes, int limit, float threshold)
{

}

Inside the FilterBoundingBoxes method, start off by creating an array with one slot per detected box and marking all slots as active, that is, ready for processing.

var activeCount = boxes.Count;
var isActiveBoxes = new bool[boxes.Count];

for (int i = 0; i < isActiveBoxes.Length; i++)
    isActiveBoxes[i] = true;

Then, sort the list containing your bounding boxes in descending order based on confidence.

var sortedBoxes = boxes.Select((b, i) => new { Box = b, Index = i })
                    .OrderByDescending(b => b.Box.Confidence)
                    .ToList();

After that, create a list to hold the filtered results.

var results = new List<YoloBoundingBox>();

Begin processing by iterating over each of the bounding boxes.

for (int i = 0; i < boxes.Count; i++)
{

}

Inside of this for-loop, check whether the current bounding box can be processed.

if (isActiveBoxes[i])
{

}

If so, add the bounding box to the list of results. If the number of results reaches the specified limit of boxes to be extracted, break out of the loop. Add the following code inside the if-statement.

var boxA = sortedBoxes[i].Box;
results.Add(boxA);

if (results.Count >= limit)
    break;

Otherwise, compare the current box with the remaining, lower-confidence bounding boxes. Add the following code below the box limit check.

for (var j = i + 1; j < boxes.Count; j++)
{

}

As with the first box, if the second box is still active, use the IntersectionOverUnion method to check whether the overlap between the first box and the second box exceeds the specified threshold. If it does, mark the second box as inactive. Add the following code to your innermost for-loop.

if (isActiveBoxes[j])
{
    var boxB = sortedBoxes[j].Box;

    if (IntersectionOverUnion(boxA.Rect, boxB.Rect) > threshold)
    {
        isActiveBoxes[j] = false;
        activeCount--;

        if (activeCount <= 0)
            break;
    }
}

Outside of the innermost for-loop that compares bounding boxes, check whether there are any remaining bounding boxes to be processed. If not, break out of the outer for-loop.

if (activeCount <= 0)
    break;

Finally, outside of the initial for-loop of the FilterBoundingBoxes method, return the results:

return results;
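The steps above amount to a greedy form of non-maximum suppression. The following condensed, self-contained sketch shows the same idea in one place. The SketchBox type and the NmsSketch class are hypothetical stand-ins for YoloBoundingBox and the parser, and the loop is simplified (it stops as soon as the limit is reached and does not track an active count), so treat it as an illustration rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal box type used only for this sketch.
public sealed class SketchBox
{
    public float X, Y, Width, Height, Confidence;
}

public static class NmsSketch
{
    // Intersection over union of two axis-aligned boxes.
    public static float Iou(SketchBox a, SketchBox b)
    {
        var areaA = a.Width * a.Height;
        var areaB = b.Width * b.Height;
        if (areaA <= 0 || areaB <= 0) return 0;

        var w = Math.Min(a.X + a.Width, b.X + b.Width) - Math.Max(a.X, b.X);
        var h = Math.Min(a.Y + a.Height, b.Y + b.Height) - Math.Max(a.Y, b.Y);
        var intersection = Math.Max(w, 0) * Math.Max(h, 0);
        return intersection / (areaA + areaB - intersection);
    }

    // Keeps the most confident boxes and drops any box that overlaps a kept box too much.
    public static List<SketchBox> Filter(IEnumerable<SketchBox> boxes, int limit, float threshold)
    {
        var sorted = boxes.OrderByDescending(b => b.Confidence).ToList();
        var suppressed = new bool[sorted.Count];
        var results = new List<SketchBox>();

        for (var i = 0; i < sorted.Count && results.Count < limit; i++)
        {
            if (suppressed[i]) continue;
            results.Add(sorted[i]);

            // Suppress every later (less confident) box that overlaps the kept box.
            for (var j = i + 1; j < sorted.Count; j++)
            {
                if (!suppressed[j] && Iou(sorted[i], sorted[j]) > threshold)
                    suppressed[j] = true;
            }
        }

        return results;
    }
}
```

With a threshold of 0.5, two 100 x 100 boxes offset by 10 pixels in each direction have an IoU of about 0.68, so only the more confident one is kept, while a distant third box is unaffected.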

Great! Now it's time to use this code along with the model for scoring.

Use the model for scoring

Just like post-processing, scoring involves a few steps. To help with this, add a class that will contain the scoring logic to your project.

  1. In Solution Explorer, right-click the project, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to OnnxModelScorer.cs. Then, select the Add button.

    The OnnxModelScorer.cs file opens in the code editor. Add the following using statements to the top of OnnxModelScorer.cs:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using ObjectDetection.DataStructures;
    using ObjectDetection.YoloParser;
    

    Inside the OnnxModelScorer class definition, add the following variables.

    private readonly string imagesFolder;
    private readonly string modelLocation;
    private readonly MLContext mlContext;
    
    private IList<YoloBoundingBox> _boundingBoxes = new List<YoloBoundingBox>();
    

    Directly below that, create a constructor for the OnnxModelScorer class that will initialize the previously defined variables.

    public OnnxModelScorer(string imagesFolder, string modelLocation, MLContext mlContext)
    {
        this.imagesFolder = imagesFolder;
        this.modelLocation = modelLocation;
        this.mlContext = mlContext;
    }
    

    Once you have created the constructor, define a couple of structs that contain variables related to the image and model settings. Create a struct called ImageNetSettings to contain the height and width expected as input for the model.

    public struct ImageNetSettings
    {
        public const int imageHeight = 416;
        public const int imageWidth = 416;
    }
    

    After that, create another struct called TinyYoloModelSettings that contains the names of the input and output layers of the model. To visualize the names of the input and output layers of the model, you can use a tool like Netron.

    public struct TinyYoloModelSettings
    {
        // To check the Tiny YOLOv2 model's input and output parameter names,
        // you can use a tool like Netron,
        // which is installed by Visual Studio AI Tools.
    
        // input tensor name
        public const string ModelInput = "image";
    
        // output tensor name
        public const string ModelOutput = "grid";
    }
    

    Next, create the first set of methods used for scoring. Create the LoadModel method inside of your OnnxModelScorer class.

    private ITransformer LoadModel(string modelLocation)
    {
    
    }
    

    Inside the LoadModel method, add the following code for logging.

    Console.WriteLine("Read model");
    Console.WriteLine($"Model location: {modelLocation}");
    Console.WriteLine($"Default parameters: image size=({ImageNetSettings.imageWidth},{ImageNetSettings.imageHeight})");
    

    ML.NET pipelines need to know the data schema to operate on when the Fit method is called. In this case, a process similar to training will be used. However, because no actual training is happening, it is acceptable to use an empty IDataView. Create a new IDataView for the pipeline from an empty list.

    var data = mlContext.Data.LoadFromEnumerable(new List<ImageNetData>());
    

    Below that, define the pipeline. The pipeline will consist of four transforms.

    • LoadImages loads the image as a Bitmap.
    • ResizeImages rescales the image to the size specified (in this case, 416 x 416).
    • ExtractPixels changes the pixel representation of the image from a Bitmap to a numerical vector.
    • ApplyOnnxModel loads the ONNX model and uses it to score the data provided.

    Define your pipeline in the LoadModel method below the data variable.

    var pipeline = mlContext.Transforms.LoadImages(outputColumnName: "image", imageFolder: "", inputColumnName: nameof(ImageNetData.ImagePath))
                    .Append(mlContext.Transforms.ResizeImages(outputColumnName: "image", imageWidth: ImageNetSettings.imageWidth, imageHeight: ImageNetSettings.imageHeight, inputColumnName: "image"))
                    .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "image"))
                    .Append(mlContext.Transforms.ApplyOnnxModel(modelFile: modelLocation, outputColumnNames: new[] { TinyYoloModelSettings.ModelOutput }, inputColumnNames: new[] { TinyYoloModelSettings.ModelInput }));
    

    Now it's time to instantiate the model for scoring. Call the Fit method on the pipeline and return the fitted model for further processing.

    var model = pipeline.Fit(data);
    
    return model;
    

Once the model is loaded, it can then be used to make predictions. To facilitate that process, create a method called PredictDataUsingModel below the LoadModel method.

private IEnumerable<float[]> PredictDataUsingModel(IDataView testData, ITransformer model)
{

}

Inside the PredictDataUsingModel method, add the following code for logging.

Console.WriteLine($"Images location: {imagesFolder}");
Console.WriteLine("");
Console.WriteLine("=====Identify the objects in the images=====");
Console.WriteLine("");

Then, use the Transform method to score the data.

IDataView scoredData = model.Transform(testData);

Extract the predicted probabilities and return them for additional processing.

IEnumerable<float[]> probabilities = scoredData.GetColumn<float[]>(TinyYoloModelSettings.ModelOutput);

return probabilities;

Now that both steps are set up, combine them into a single method. Below the PredictDataUsingModel method, add a new method called Score.

public IEnumerable<float[]> Score(IDataView data)
{
    var model = LoadModel(modelLocation);

    return PredictDataUsingModel(data, model);
}

Almost there! Now it's time to put it all to use.

Detect objects

Now that all of the setup is complete, it's time to detect some objects. Start off by adding references to the scorer and parser in your Program.cs class.

using ObjectDetection.YoloParser;
using ObjectDetection.DataStructures;

Score and parse model outputs

Inside the Main method of your Program.cs class, add a try-catch statement.

try
{

}
catch (Exception ex)
{
    Console.WriteLine(ex.ToString());
}

Inside of the try block, start implementing the object detection logic. First, load the data into an IDataView.

IEnumerable<ImageNetData> images = ImageNetData.ReadFromFile(imagesFolder);
IDataView imageDataView = mlContext.Data.LoadFromEnumerable(images);

Then, create an instance of OnnxModelScorer and use it to score the loaded data.

var modelScorer = new OnnxModelScorer(imagesFolder, modelFilePath, mlContext);

// Use model to score data
IEnumerable<float[]> probabilities = modelScorer.Score(imageDataView);

Now it's time for the post-processing step. Create an instance of YoloOutputParser and use it to process the model output.

YoloOutputParser parser = new YoloOutputParser();

var boundingBoxes =
    probabilities
    .Select(probability => parser.ParseOutputs(probability))
    .Select(boxes => parser.FilterBoundingBoxes(boxes, 5, .5F));

Once the model output has been processed, it's time to draw the bounding boxes on the images.

Visualize predictions

After the model has scored the images and the outputs have been processed, the bounding boxes have to be drawn on the image. To do so, add a method called DrawBoundingBox below the GetAbsolutePath method inside of Program.cs.

private static void DrawBoundingBox(string inputImageLocation, string outputImageLocation, string imageName, IList<YoloBoundingBox> filteredBoundingBoxes)
{

}

First, load the image and get the height and width dimensions in the DrawBoundingBox method.

Image image = Image.FromFile(Path.Combine(inputImageLocation, imageName));

var originalImageHeight = image.Height;
var originalImageWidth = image.Width;

Then, create a for-each loop to iterate over each of the bounding boxes detected by the model.

foreach (var box in filteredBoundingBoxes)
{

}

Inside of the for-each loop, get the dimensions of the bounding box.

var x = (uint)Math.Max(box.Dimensions.X, 0);
var y = (uint)Math.Max(box.Dimensions.Y, 0);
var width = (uint)Math.Min(originalImageWidth - x, box.Dimensions.Width);
var height = (uint)Math.Min(originalImageHeight - y, box.Dimensions.Height);

Because the dimensions of the bounding box correspond to the model input of 416 x 416, scale the bounding box dimensions to match the actual size of the image.

x = (uint)originalImageWidth * x / OnnxModelScorer.ImageNetSettings.imageWidth;
y = (uint)originalImageHeight * y / OnnxModelScorer.ImageNetSettings.imageHeight;
width = (uint)originalImageWidth * width / OnnxModelScorer.ImageNetSettings.imageWidth;
height = (uint)originalImageHeight * height / OnnxModelScorer.ImageNetSettings.imageHeight;
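As a worked example of this scaling, the sketch below applies the same integer arithmetic to a single box. The ScaleSketch class and the ScaleToImage method are hypothetical names introduced for illustration. The 416 x 416 model size comes from ImageNetSettings, and the 1280 x 720 image in the example that follows is an assumption.

```csharp
// Hypothetical helper that maps a box from model coordinates (416 x 416) to image coordinates.
public static class ScaleSketch
{
    public const int ModelWidth = 416;
    public const int ModelHeight = 416;

    public static (uint X, uint Y, uint Width, uint Height) ScaleToImage(
        uint x, uint y, uint width, uint height, int imageWidth, int imageHeight)
    {
        // Multiply before dividing so that integer division loses as little precision as possible.
        return (
            (uint)imageWidth * x / ModelWidth,
            (uint)imageHeight * y / ModelHeight,
            (uint)imageWidth * width / ModelWidth,
            (uint)imageHeight * height / ModelHeight);
    }
}
```

For a 1280 x 720 image, a box at (104, 208) with size 208 x 104 in model coordinates maps to (320, 360) with size 640 x 180. Because the arithmetic uses unsigned integers, any fractional part is truncated.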

Then, define a template for text that will appear above each bounding box. The text will contain the class of the object inside of the respective bounding box as well as the confidence.

string text = $"{box.Label} ({(box.Confidence * 100).ToString("0")}%)";

To draw on the image, create a Graphics object from it.

using (Graphics thumbnailGraphic = Graphics.FromImage(image))
{

}

Inside the using code block, tune the settings of the Graphics object.

thumbnailGraphic.CompositingQuality = CompositingQuality.HighQuality;
thumbnailGraphic.SmoothingMode = SmoothingMode.HighQuality;
thumbnailGraphic.InterpolationMode = InterpolationMode.HighQualityBicubic;

Below that, set the font and color options for the text and bounding box.

// Define Text Options
Font drawFont = new Font("Arial", 12, FontStyle.Bold);
SizeF size = thumbnailGraphic.MeasureString(text, drawFont);
SolidBrush fontBrush = new SolidBrush(Color.Black);
Point atPoint = new Point((int)x, (int)y - (int)size.Height - 1);

// Define BoundingBox options
Pen pen = new Pen(box.BoxColor, 3.2f);
SolidBrush colorBrush = new SolidBrush(box.BoxColor);

Create and fill a rectangle above the bounding box to contain the text using the FillRectangle method. This will help contrast the text and improve readability.

thumbnailGraphic.FillRectangle(colorBrush, (int)x, (int)(y - size.Height - 1), (int)size.Width, (int)size.Height);

Then, draw the text and bounding box on the image using the DrawString and DrawRectangle methods.

thumbnailGraphic.DrawString(text, drawFont, fontBrush, atPoint);

// Draw bounding box on image
thumbnailGraphic.DrawRectangle(pen, x, y, width, height);

Outside of the for-each loop, add code to save the image in the output directory (outputImageLocation).

if (!Directory.Exists(outputImageLocation))
{
    Directory.CreateDirectory(outputImageLocation);
}

image.Save(Path.Combine(outputImageLocation, imageName));

For additional feedback that the application is making predictions as expected at runtime, add a method called LogDetectedObjects below the DrawBoundingBox method in the Program.cs file to output the detected objects to the console.

private static void LogDetectedObjects(string imageName, IList<YoloBoundingBox> boundingBoxes)
{
    Console.WriteLine($".....The objects in the image {imageName} are detected as below....");

    foreach (var box in boundingBoxes)
    {
        Console.WriteLine($"{box.Label} and its Confidence score: {box.Confidence}");
    }

    Console.WriteLine("");
}

Now that you have helper methods to create visual feedback from the predictions, add a for-loop to iterate over each of the scored images.

for (var i = 0; i < images.Count(); i++)
{

}

Inside of the for-loop, get the name of the image file and the bounding boxes associated with it.

string imageFileName = images.ElementAt(i).Label;
IList<YoloBoundingBox> detectedObjects = boundingBoxes.ElementAt(i);

Below that, use the DrawBoundingBox method to draw the bounding boxes on the image.

DrawBoundingBox(imagesFolder, outputFolder, imageFileName, detectedObjects);

Lastly, use the LogDetectedObjects method to output predictions to the console.

LogDetectedObjects(imageFileName, detectedObjects);

After the try-catch statement, add logic to indicate that the process is done running.

Console.WriteLine("========= End of Process..Hit any Key ========");
Console.ReadLine();

That's it!

Results

After following the previous steps, run your console app (Ctrl + F5). Your results should be similar to the following output. You may see warnings or processing messages, but these messages have been removed from the following results for clarity.

=====Identify the objects in the images=====

.....The objects in the image image1.jpg are detected as below....
car and its Confidence score: 0.9697262
car and its Confidence score: 0.6674225
person and its Confidence score: 0.5226039
car and its Confidence score: 0.5224892
car and its Confidence score: 0.4675332

.....The objects in the image image2.jpg are detected as below....
cat and its Confidence score: 0.6461141
cat and its Confidence score: 0.6400049

.....The objects in the image image3.jpg are detected as below....
chair and its Confidence score: 0.840578
chair and its Confidence score: 0.796363
diningtable and its Confidence score: 0.6056048
diningtable and its Confidence score: 0.3737402

.....The objects in the image image4.jpg are detected as below....
dog and its Confidence score: 0.7608147
person and its Confidence score: 0.6321323
dog and its Confidence score: 0.5967442
person and its Confidence score: 0.5730394
person and its Confidence score: 0.5551759

========= End of Process..Hit any Key ========

To see the images with bounding boxes, navigate to the assets/images/output/ directory. Below is a sample from one of the processed images.

Sample processed image of a dining room

Congratulations! You've now successfully built a machine learning model for object detection by reusing a pre-trained ONNX model in ML.NET.

You can find the source code for this tutorial at the dotnet/machinelearning-samples repository.

In this tutorial, you learned how to:

  • Understand the problem
  • Learn what ONNX is and how it works with ML.NET
  • Understand the model
  • Reuse the pre-trained model
  • Detect objects with a loaded model

Check out the Machine Learning samples GitHub repository to explore an expanded object detection sample.