Tutorial: Automated visual inspection using transfer learning with the ML.NET Image Classification API

Learn how to train a custom deep learning model using transfer learning, a pretrained TensorFlow model, and the ML.NET Image Classification API to classify images of concrete surfaces as cracked or uncracked.

In this tutorial, you learn how to:

  • Understand the problem
  • Learn about the ML.NET Image Classification API
  • Understand the pretrained model
  • Use transfer learning to train a custom TensorFlow image classification model
  • Classify images with the custom model

Prerequisites

Image classification transfer learning sample overview

This sample is a C# .NET Core console application that classifies images using a pretrained deep learning TensorFlow model. The code for this sample can be found in the dotnet/machinelearning-samples repository on GitHub.

Understand the problem

Image classification is a computer vision problem. It takes an image as input and categorizes it into a prescribed class. Some scenarios where image classification is useful include:

  • Facial recognition
  • Emotion detection
  • Medical diagnosis
  • Landmark detection

This tutorial trains a custom image classification model to perform automated visual inspection of bridge decks to identify structures that are damaged by cracks.

ML.NET Image Classification API

ML.NET provides various ways of performing image classification. This tutorial applies transfer learning using the Image Classification API. The Image Classification API makes use of TensorFlow.NET, a low-level library that provides C# bindings for the TensorFlow C++ API.

What is transfer learning?

Transfer learning applies knowledge gained from solving one problem to another related problem.

Training a deep learning model from scratch requires setting several parameters, a large amount of labeled training data, and a vast amount of compute resources (hundreds of GPU hours). Using a pretrained model along with transfer learning allows you to shortcut the training process.

Training process

The Image Classification API starts the training process by loading a pretrained TensorFlow model. The training process consists of two steps:

  1. Bottleneck phase
  2. Training phase

Figure: Training steps

Bottleneck phase

During the bottleneck phase, the set of training images is loaded and the pixel values are used as input, or features, for the frozen layers of the pretrained model. The frozen layers include all of the layers in the neural network up to the penultimate layer, informally known as the bottleneck layer. These layers are referred to as frozen because no training occurs on them and operations are pass-through. It's at these frozen layers that the lower-level patterns that help a model differentiate between the classes are computed. The larger the number of layers, the more computationally intensive this step is. Fortunately, since this is a one-time calculation, the results can be cached and used in later runs when experimenting with different parameters.

Training phase

Once the output values from the bottleneck phase are computed, they are used as input to retrain the final layer of the model. This process is iterative and runs for the number of times specified by model parameters. During each run, the loss and accuracy are evaluated. Then, the appropriate adjustments are made to improve the model with the goal of minimizing the loss and maximizing the accuracy. Once training is finished, two model formats are output. One of them is the .pb version of the model and the other is the .zip ML.NET serialized version of the model. When working in environments supported by ML.NET, it is recommended to use the .zip version of the model. However, in environments where ML.NET is not supported, you have the option of using the .pb version.

Understand the pretrained model

The pretrained model used in this tutorial is the 101-layer variant of the Residual Network (ResNet) v2 model. The original model is trained to classify images into a thousand categories. The model takes as input an image of size 224 x 224 and outputs the class probabilities for each of the classes it's trained on. Part of this model is used to train a new model using custom images to make predictions between two classes.

Create console application

Now that you have a general understanding of transfer learning and the Image Classification API, it's time to build the application.

  1. Create a C# .NET Core Console Application called "DeepLearning_ImageClassification_Binary".
  2. Install the Microsoft.ML version 1.4.0 NuGet package:
    1. In Solution Explorer, right-click on your project and select Manage NuGet Packages.
    2. Choose "nuget.org" as the Package source.
    3. Select the Browse tab.
    4. Check the Include prerelease checkbox.
    5. Search for Microsoft.ML.
    6. Select the Install button.
    7. Select the OK button on the Preview Changes dialog and then select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed.
    8. Repeat these steps for the Microsoft.ML.Vision version 1.4.0, SciSharp.TensorFlow.Redist version 1.15.0, and Microsoft.ML.ImageAnalytics version 1.4.0 NuGet packages.

Prepare and understand the data

Note

The datasets for this tutorial are from Maguire, Marc; Dorafshan, Sattar; and Thomas, Robert J., "SDNET2018: A concrete crack image dataset for machine learning applications" (2018). Browse all Datasets. Paper 48. https://digitalcommons.usu.edu/all_datasets/48

SDNET2018 is an image dataset that contains annotations for cracked and non-cracked concrete structures (bridge decks, walls, and pavement).

Figure: SDNET2018 dataset bridge deck samples

The data is organized in three subdirectories:

  • D contains bridge deck images
  • P contains pavement images
  • W contains wall images

Each of these subdirectories contains two additional prefixed subdirectories:

  • C is the prefix used for cracked surfaces
  • U is the prefix used for uncracked surfaces

In this tutorial, only bridge deck images are used.

  1. Download the dataset and unzip it.
  2. Create a directory named "assets" in your project to save your dataset files.
  3. Copy the CD and UD subdirectories from the recently unzipped directory to the assets directory.
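
Before moving on, it can help to confirm that the copy worked. The following is a minimal sketch and not part of the original sample: the DatasetCheck class and the CountImagesPerLabel helper are hypothetical names, and the code assumes the assets directory contains one subdirectory per label (CD and UD in this tutorial).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class DatasetCheck
{
    // Hypothetical helper: maps each immediate subdirectory name of assetsPath
    // (for example CD or UD) to the number of files found beneath it.
    public static Dictionary<string, int> CountImagesPerLabel(string assetsPath)
    {
        return Directory.GetDirectories(assetsPath)
            .ToDictionary(
                dir => new DirectoryInfo(dir).Name,
                dir => Directory.GetFiles(dir, "*", SearchOption.AllDirectories).Length);
    }

    static void Main(string[] args)
    {
        // Assumption: the assets path is passed as the first argument,
        // otherwise a directory named "assets" under the current directory is used.
        string assetsPath = args.Length > 0 ? args[0] : "assets";

        if (!Directory.Exists(assetsPath))
        {
            Console.WriteLine($"Directory not found: {assetsPath}");
            return;
        }

        foreach (var pair in CountImagesPerLabel(assetsPath))
            Console.WriteLine($"{pair.Key}: {pair.Value} files");
    }
}
```

If either CD or UD is missing from the output, or one count is zero, the later training step has only one class to learn from.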

Create input and output classes

  1. Open the Program.cs file and replace the existing using statements at the top of the file with the following:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.IO;
    using Microsoft.ML;
    using static Microsoft.ML.DataOperationsCatalog;
    using Microsoft.ML.Vision;
    
  2. Below the Program class in Program.cs, create a class called ImageData. This class is used to represent the initially loaded data.

    class ImageData
    {
        public string ImagePath { get; set; }
    
        public string Label { get; set; }
    }
    

    ImageData contains the following properties:

    • ImagePath is the fully qualified path where the image is stored.
    • Label is the category the image belongs to. This is the value to predict.
  3. Create classes for your input and output data.

    1. Below the ImageData class, define the schema of your input data in a new class called ModelInput.

      class ModelInput
      {
          public byte[] Image { get; set; }
          
          public UInt32 LabelAsKey { get; set; }
      
          public string ImagePath { get; set; }
      
          public string Label { get; set; }
      }
      

      ModelInput contains the following properties:

      • Image is the byte[] representation of the image. The model expects image data to be of this type for training.
      • LabelAsKey is the numerical representation of the Label.
      • ImagePath is the fully qualified path where the image is stored.
      • Label is the category the image belongs to. This is the value to predict.

      Only Image and LabelAsKey are used to train the model and make predictions. The ImagePath and Label properties are kept for convenience to access the original image file name and category.

    2. Then, below the ModelInput class, define the schema of your output data in a new class called ModelOutput.

      class ModelOutput
      {
          public string ImagePath { get; set; }
      
          public string Label { get; set; }
      
          public string PredictedLabel { get; set; }
      }
      

      ModelOutput contains the following properties:

      • ImagePath is the fully qualified path where the image is stored.
      • Label is the original category the image belongs to. This is the value to predict.
      • PredictedLabel is the value predicted by the model.

      Similar to ModelInput, only the PredictedLabel is required to make predictions since it contains the prediction made by the model. The ImagePath and Label properties are retained for convenience to access the original image file name and category.

Create workspace directory

When training and validation data do not change often, it is good practice to cache the computed bottleneck values for further runs.

  1. In your project, create a new directory called workspace to store the computed bottleneck values and .pb version of the model.

Define paths and initialize variables

  1. Inside the Main method, define the location of your assets, computed bottleneck values, and .pb version of the model.

    var projectDirectory = Path.GetFullPath(Path.Combine(AppContext.BaseDirectory, "../../../"));
    var workspaceRelativePath = Path.Combine(projectDirectory, "workspace");
    var assetsRelativePath = Path.Combine(projectDirectory, "assets");
    
  2. Initialize the mlContext variable with a new instance of MLContext.

    MLContext mlContext = new MLContext();
    

    The MLContext class is a starting point for all ML.NET operations, and initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It's similar, conceptually, to DBContext in Entity Framework.

Load the data

Create data loading utility method

The images are stored in two subdirectories. Before loading the data, it needs to be formatted into a list of ImageData objects. To do so, create the LoadImagesFromDirectory method below the Main method.

public static IEnumerable<ImageData> LoadImagesFromDirectory(string folder, bool useFolderNameAsLabel = true)
{

}
  1. Inside the LoadImagesFromDirectory method, add the following code to get all of the file paths from the subdirectories:

    var files = Directory.GetFiles(folder, "*",
        searchOption: SearchOption.AllDirectories);
    
  2. Then, iterate through each of the files using a foreach statement.

    foreach (var file in files)
    {
    
    }
    
  3. Inside the foreach statement, check that the file extensions are supported. The Image Classification API supports JPEG and PNG formats.

    if ((Path.GetExtension(file) != ".jpg") && (Path.GetExtension(file) != ".png"))
        continue;
    
  4. Then, get the label for the file. If the useFolderNameAsLabel parameter is set to true, then the parent directory where the file is saved is used as the label. Otherwise, the label is expected to be a prefix of the file name or the file name itself.

    var label = Path.GetFileName(file);
    
    if (useFolderNameAsLabel)
        label = Directory.GetParent(file).Name;
    else
    {
        for (int index = 0; index < label.Length; index++)
        {
            if (!char.IsLetter(label[index]))
            {
                label = label.Substring(0, index);
                break;
            }
        }
    }
    
  5. Finally, create a new instance of ImageData.

    yield return new ImageData()
    {
        ImagePath = file,
        Label = label
    };
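
Put together, the five steps above might look like the following sketch. It is an assembled illustration rather than the sample's exact code: the ImageLoader class name is hypothetical, and the extension check deliberately differs from the tutorial's by comparing case-insensitively and also accepting ".jpeg" (an assumption, so that files such as IMG.JPG are not silently skipped).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class ImageData
{
    public string ImagePath { get; set; }
    public string Label { get; set; }
}

static class ImageLoader
{
    // Assumption: ".jpeg" is accepted and comparison ignores case;
    // the tutorial's own check matches only lowercase ".jpg" and ".png".
    static readonly string[] SupportedExtensions = { ".jpg", ".jpeg", ".png" };

    public static IEnumerable<ImageData> LoadImagesFromDirectory(string folder, bool useFolderNameAsLabel = true)
    {
        // Step 1: collect every file path below the folder.
        var files = Directory.GetFiles(folder, "*", searchOption: SearchOption.AllDirectories);

        // Step 2: visit each file.
        foreach (var file in files)
        {
            // Step 3: skip files whose extension is not a supported image format.
            string extension = Path.GetExtension(file);
            if (!SupportedExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase))
                continue;

            // Step 4: derive the label from the parent directory,
            // or from the leading letters of the file name.
            var label = Path.GetFileName(file);
            if (useFolderNameAsLabel)
            {
                label = Directory.GetParent(file).Name;
            }
            else
            {
                for (int index = 0; index < label.Length; index++)
                {
                    if (!char.IsLetter(label[index]))
                    {
                        label = label.Substring(0, index);
                        break;
                    }
                }
            }

            // Step 5: emit one ImageData per supported image.
            yield return new ImageData { ImagePath = file, Label = label };
        }
    }
}
```

Because the method uses yield return, no files are read until the result is enumerated, for example by LoadFromEnumerable.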
    

Prepare the data

  1. Back in the Main method, use the LoadImagesFromDirectory utility method to get the list of images used for training.

    IEnumerable<ImageData> images = LoadImagesFromDirectory(folder: assetsRelativePath, useFolderNameAsLabel: true);
    
  2. Then, load the images into an IDataView using the LoadFromEnumerable method.

    IDataView imageData = mlContext.Data.LoadFromEnumerable(images);
    
  3. The data is loaded in the order it was read from the directories, so rows arrive grouped by subdirectory. Shuffle the data using the ShuffleRows method so that the two classes are mixed before the data is split.

    IDataView shuffledData = mlContext.Data.ShuffleRows(imageData);
    
  4. Machine learning models expect input to be in numerical format. Therefore, some preprocessing needs to be done on the data prior to training. Create an EstimatorChain made up of the MapValueToKey and LoadRawImageBytes transforms. The MapValueToKey transform takes the categorical value in the Label column, converts it to a numerical KeyType value, and stores it in a new column called LabelAsKey. The LoadRawImageBytes transform takes the values from the ImagePath column along with the imageFolder parameter to load images for training.

    var preprocessingPipeline = mlContext.Transforms.Conversion.MapValueToKey(
            inputColumnName: "Label",
            outputColumnName: "LabelAsKey")
        .Append(mlContext.Transforms.LoadRawImageBytes(
            outputColumnName: "Image",
            imageFolder: assetsRelativePath,
            inputColumnName: "ImagePath"));
    
  5. Use the Fit method to apply the data to the preprocessingPipeline EstimatorChain followed by the Transform method, which returns an IDataView containing the pre-processed data.

    IDataView preProcessedData = preprocessingPipeline
                        .Fit(shuffledData)
                        .Transform(shuffledData);
    
  6. To train a model, it's important to have a training dataset as well as a validation dataset. The model is trained on the training set. How well it makes predictions on unseen data is measured by the performance against the validation set. Based on the results of that performance, the model makes adjustments to what it has learned in an effort to improve. The validation set can come from either splitting your original dataset or from another source that has already been set aside for this purpose. In this case, the pre-processed dataset is split into training, validation, and test sets.

    TrainTestData trainSplit = mlContext.Data.TrainTestSplit(data: preProcessedData, testFraction: 0.3);
    TrainTestData validationTestSplit = mlContext.Data.TrainTestSplit(trainSplit.TestSet);
    

    The code sample above performs two splits. First, the pre-processed data is split and 70% is used for training while the remaining 30% is used for validation. Then, the 30% validation set is further split into validation and test sets where 90% is used for validation and 10% is used for testing.

    A way to think about the purpose of these data partitions is taking an exam. When studying for an exam, you review your notes, books, or other resources to get a grasp on the concepts that are on the exam. This is what the train set is for. Then, you might take a mock exam to validate your knowledge. This is where the validation set comes in handy. You want to check whether you have a good grasp of the concepts before taking the actual exam. Based on those results, you take note of what you got wrong or didn't understand well and incorporate your changes as you review for the real exam. Finally, you take the exam. This is what the test set is used for. You've never seen the questions that are on the exam and now use what you learned from training and validation to apply your knowledge to the task at hand.

  7. Assign the partitions their respective values for the train, validation, and test data.

    IDataView trainSet = trainSplit.TrainSet;
    IDataView validationSet = validationTestSplit.TrainSet;
    IDataView testSet = validationTestSplit.TestSet;
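
To make the two-stage split concrete, the sketch below works out the approximate size of each partition for a given number of images. It is plain arithmetic and does not call ML.NET; the SplitSketch class and ExpectedSizes method are hypothetical names. The second fraction of 0.1 reflects the 90%/10% split described above, which corresponds to the second TrainTestSplit call being made without an explicit testFraction. Actual row counts from TrainTestSplit vary slightly because rows are assigned at random.

```csharp
using System;

static class SplitSketch
{
    // Returns approximate (train, validation, test) row counts for the two-stage split:
    // first hold out firstTestFraction of all rows, then hold out
    // secondTestFraction of that held-out portion as the final test set.
    public static (int Train, int Validation, int Test) ExpectedSizes(
        int total, double firstTestFraction = 0.3, double secondTestFraction = 0.1)
    {
        int heldOut = (int)Math.Round(total * firstTestFraction);
        int test = (int)Math.Round(heldOut * secondTestFraction);
        return (total - heldOut, heldOut - test, test);
    }

    static void Main()
    {
        var sizes = ExpectedSizes(1000);
        // prints "Train: 700, Validation: 270, Test: 30"
        Console.WriteLine($"Train: {sizes.Train}, Validation: {sizes.Validation}, Test: {sizes.Test}");
    }
}
```

In other words, roughly 70% of the images are used for training, 27% for validation, and 3% for testing.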
    

Define the training pipeline

Model training consists of a couple of steps. First, the Image Classification API is used to train the model. Then, the encoded labels in the PredictedLabel column are converted back to their original categorical value using the MapKeyToValue transform.

  1. Create a new variable to store a set of required and optional parameters for an ImageClassificationTrainer.

    var classifierOptions = new ImageClassificationTrainer.Options()
    {
        FeatureColumnName = "Image",
        LabelColumnName = "LabelAsKey",
        ValidationSet = validationSet,
        Arch = ImageClassificationTrainer.Architecture.ResnetV2101,
        MetricsCallback = (metrics) => Console.WriteLine(metrics),
        TestOnTrainSet = false,
        ReuseTrainSetBottleneckCachedValues = true,
        ReuseValidationSetBottleneckCachedValues = true,
        WorkspacePath=workspaceRelativePath
    };
    

    An ImageClassificationTrainer takes several optional parameters:

    • FeatureColumnName is the column that is used as input for the model.
    • LabelColumnName is the column for the value to predict.
    • ValidationSet is the IDataView containing the validation data.
    • Arch defines which of the pretrained model architectures to use. This tutorial uses the 101-layer variant of the ResNet v2 model.
    • MetricsCallback binds a function to track the progress during training.
    • TestOnTrainSet tells the model to measure performance against the training set when no validation set is present.
    • ReuseTrainSetBottleneckCachedValues tells the model whether to use the cached values from the bottleneck phase in subsequent runs. The bottleneck phase is a one-time pass-through computation that is computationally intensive the first time it is performed. If the training data does not change and you want to experiment using a different number of epochs or batch size, using the cached values significantly reduces the amount of time required to train a model.
    • ReuseValidationSetBottleneckCachedValues is similar to ReuseTrainSetBottleneckCachedValues, except that it applies to the validation dataset.
    • WorkspacePath defines the directory in which to store the computed bottleneck values and .pb version of the model.
  2. Define the EstimatorChain training pipeline that consists of the ImageClassificationTrainer followed by the MapKeyToValue transform.

    var trainingPipeline = mlContext.MulticlassClassification.Trainers.ImageClassification(classifierOptions)
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
    
  3. Use the Fit method to train your model.

    ITransformer trainedModel = trainingPipeline.Fit(trainSet);
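
The steps in this tutorial keep the trained model in memory and do not show a step that writes the .zip version to disk. One way to do so is sketched below using the Save and Load methods of mlContext.Model. The ModelPersistence class, its method names, and the modelPath parameter are hypothetical and not part of the original sample; the sketch assumes the same NuGet packages installed earlier.

```csharp
using Microsoft.ML;

static class ModelPersistence
{
    // Writes the trained pipeline to a .zip file.
    // The schema of the data the model was trained on is stored alongside it.
    public static void SaveModel(MLContext mlContext, ITransformer trainedModel, IDataView trainSet, string modelPath)
    {
        mlContext.Model.Save(trainedModel, trainSet.Schema, modelPath);
    }

    // Reads a previously saved .zip model back into an ITransformer.
    // The input schema is returned through the out parameter and ignored here.
    public static ITransformer LoadModel(MLContext mlContext, string modelPath)
    {
        return mlContext.Model.Load(modelPath, out var inputSchema);
    }
}
```

For example, calling ModelPersistence.SaveModel(mlContext, trainedModel, trainSet, "model.zip") after Fit would let a later run load the model and classify images without retraining.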
    

Use the model

Now that you have trained your model, it's time to use it to classify images.

Below the Main method, create a new utility method called OutputPrediction to display prediction information in the console.

private static void OutputPrediction(ModelOutput prediction)
{
    string imageName = Path.GetFileName(prediction.ImagePath);
    Console.WriteLine($"Image: {imageName} | Actual Value: {prediction.Label} | Predicted Value: {prediction.PredictedLabel}");
}

Classify a single image

  1. Add a new method called ClassifySingleImage below the Main method to make and output a single image prediction.

    public static void ClassifySingleImage(MLContext mlContext, IDataView data, ITransformer trainedModel)
    {
    
    }
    
  2. Create a PredictionEngine inside the ClassifySingleImage method. The PredictionEngine is a convenience API, which allows you to pass in and then perform a prediction on a single instance of data.

    PredictionEngine<ModelInput, ModelOutput> predictionEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(trainedModel);
    
  3. To access a single ModelInput instance, convert the data IDataView into an IEnumerable using the CreateEnumerable method and then get the first observation.

    ModelInput image = mlContext.Data.CreateEnumerable<ModelInput>(data,reuseRowObject:true).First();
    
  4. Use the Predict method to classify the image.

    ModelOutput prediction = predictionEngine.Predict(image);
    
  5. Output the prediction to the console with the OutputPrediction method.

    Console.WriteLine("Classifying single image");
    OutputPrediction(prediction);
    
  6. Inside the Main method, call ClassifySingleImage using the test set of images.

    ClassifySingleImage(mlContext, testSet, trainedModel);
    

Classify multiple images

  1. Add a new method called ClassifyImages below the ClassifySingleImage method to make and output multiple image predictions.

    public static void ClassifyImages(MLContext mlContext, IDataView data, ITransformer trainedModel)
    {
    
    }
    
  2. Create an IDataView containing the predictions by using the Transform method. Add the following code inside the ClassifyImages method.

    IDataView predictionData = trainedModel.Transform(data);
    
  3. To iterate over the predictions, convert the predictionData IDataView into an IEnumerable using the CreateEnumerable method and then get the first 10 observations.

    IEnumerable<ModelOutput> predictions = mlContext.Data.CreateEnumerable<ModelOutput>(predictionData, reuseRowObject: true).Take(10);
    
  4. Iterate and output the original and predicted labels for the predictions.

    Console.WriteLine("Classifying multiple images");
    foreach (var prediction in predictions)
    {
        OutputPrediction(prediction);
    }
    
  5. Finally, inside the Main method, call ClassifyImages using the test set of images.

    ClassifyImages(mlContext, testSet, trainedModel);
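
Printing individual predictions shows what the model does on a few images, but a single summary number is often easier to compare between runs. The following sketch computes the fraction of predictions whose PredictedLabel matches the original Label. It is an illustration and not part of the original sample: the AccuracySketch class and Accuracy method are hypothetical names, and ModelOutput is repeated here only so that the block stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ModelOutput
{
    public string ImagePath { get; set; }
    public string Label { get; set; }
    public string PredictedLabel { get; set; }
}

static class AccuracySketch
{
    // Fraction of predictions where the predicted label equals the original label.
    // Returns 0 for an empty sequence to avoid dividing by zero.
    public static double Accuracy(IEnumerable<ModelOutput> predictions)
    {
        int total = 0;
        int correct = 0;

        // Count while streaming so each row is inspected as soon as it is produced.
        foreach (var prediction in predictions)
        {
            total++;
            if (prediction.Label == prediction.PredictedLabel)
                correct++;
        }

        return total == 0 ? 0.0 : (double)correct / total;
    }

    static void Main()
    {
        // Made-up sample data for illustration only: three of four predictions match.
        var sample = new List<ModelOutput>
        {
            new ModelOutput { ImagePath = "a.jpg", Label = "UD", PredictedLabel = "UD" },
            new ModelOutput { ImagePath = "b.jpg", Label = "CD", PredictedLabel = "CD" },
            new ModelOutput { ImagePath = "c.jpg", Label = "CD", PredictedLabel = "UD" },
            new ModelOutput { ImagePath = "d.jpg", Label = "UD", PredictedLabel = "UD" },
        };

        Console.WriteLine($"Correct fraction: {Accuracy(sample)}");
    }
}
```

The method reads each row inside a single foreach loop rather than collecting rows into a list first. That matters when the sequence comes from CreateEnumerable with reuseRowObject set to true, because the same object instance is then reused for every row.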
    

Run the application

Run your console app. The output should be similar to the following. You may see warnings or processing messages, but these have been removed from the following results for clarity. For brevity, the output has been condensed.

Bottleneck phase

No value is printed for the image name because the images are loaded as a byte[], so there is no image name to display.

Phase: Bottleneck Computation, Dataset used:      Train, Image Index: 279
Phase: Bottleneck Computation, Dataset used:      Train, Image Index: 280
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:   1
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:   2

Training phase

Phase: Training, Dataset used: Validation, Batch Processed Count:   6, Epoch:  21, Accuracy:  0.6797619
Phase: Training, Dataset used: Validation, Batch Processed Count:   6, Epoch:  22, Accuracy:  0.7642857
Phase: Training, Dataset used: Validation, Batch Processed Count:   6, Epoch:  23, Accuracy:  0.7916667

Classify images output

Classifying single image
Image: 7001-220.jpg | Actual Value: UD | Predicted Value: UD

Classifying multiple images
Image: 7001-220.jpg | Actual Value: UD | Predicted Value: UD
Image: 7001-163.jpg | Actual Value: UD | Predicted Value: UD
Image: 7001-210.jpg | Actual Value: UD | Predicted Value: UD

Upon inspection of the 7001-220.jpg image, you can see that it is in fact not cracked.

Figure: SDNET2018 dataset image used for prediction

Congratulations! You've now successfully built a deep learning model for classifying images.

Improve the model

If you're not satisfied with the results of your model, you can try to improve its performance with some of the following approaches:

  • More data: The more examples a model learns from, the better it tends to perform. Download the full SDNET2018 dataset and use it to train.
  • Augment the data: A common technique to add variety to the data is to augment the data by taking an image and applying different transforms (rotate, flip, shift, crop). This adds more varied examples for the model to learn from.
  • Train for a longer time: The longer you train, the more tuned the model will be. Increasing the number of epochs may improve the performance of your model.
  • Experiment with the hyperparameters: In addition to the parameters used in this tutorial, other parameters can be tuned to potentially improve performance. Changing the learning rate, which determines the magnitude of updates made to the model after each epoch, may improve performance.
  • Use a different model architecture: Depending on what your data looks like, the model that can best learn its features may differ. If you're not satisfied with the performance of your model, try changing the architecture.
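
Several of the suggestions above map onto properties of ImageClassificationTrainer.Options. The sketch below shows where the number of epochs, the batch size, the learning rate, and the architecture could be set. The numeric values are arbitrary illustrations rather than recommendations, and the TrainerOptionsSketch class and CreateOptions method are hypothetical names that do not appear in the original sample.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Vision;

static class TrainerOptionsSketch
{
    // Builds trainer options with explicit hyperparameters.
    // All numeric values below are placeholders to experiment with.
    public static ImageClassificationTrainer.Options CreateOptions(IDataView validationSet, string workspacePath)
    {
        return new ImageClassificationTrainer.Options()
        {
            FeatureColumnName = "Image",
            LabelColumnName = "LabelAsKey",
            ValidationSet = validationSet,
            // A different pretrained architecture than the ResNet v2 101 used in this tutorial.
            Arch = ImageClassificationTrainer.Architecture.InceptionV3,
            // Number of passes over the training data.
            Epoch = 100,
            // Number of images processed per training step.
            BatchSize = 20,
            // Magnitude of the updates applied to the retrained layer.
            LearningRate = 0.01f,
            MetricsCallback = metrics => Console.WriteLine(metrics),
            WorkspacePath = workspacePath
        };
    }
}
```

Changing one setting at a time and comparing the validation accuracy reported by MetricsCallback makes it easier to tell which change helped.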

Additional Resources

Next steps

In this tutorial, you learned how to build a custom deep learning model using transfer learning, a pretrained image classification TensorFlow model, and the ML.NET Image Classification API to classify images of concrete surfaces as cracked or uncracked.

Advance to the next tutorial to learn more.