Tutorial: Generate an ML.NET image classification model from a pre-trained TensorFlow model

Learn how to transfer the knowledge from an existing TensorFlow model into a new ML.NET image classification model.

The TensorFlow model was trained to classify images into a thousand categories. The ML.NET model makes use of part of the TensorFlow model in its pipeline to train a model to classify images into 3 categories.

Training an image classification model from scratch requires setting millions of parameters, a large amount of labeled training data, and a vast amount of compute resources (hundreds of GPU hours). While not as effective as training a custom model from scratch, transfer learning lets you shortcut this process by working with thousands of images rather than millions of labeled images, and build a customized model fairly quickly (within an hour on a machine without a GPU). This tutorial scales that process down even further, using only a dozen training images.

In this tutorial, you learn how to:

• Understand the problem
• Incorporate the pre-trained TensorFlow model into the ML.NET pipeline
• Train and evaluate the ML.NET model
• Classify a test image

You can find the source code for this tutorial at the dotnet/samples repository. Note that by default, the .NET project configuration for this tutorial targets .NET Core 2.2.

What is transfer learning?

Transfer learning is the process of using knowledge gained while solving one problem and applying it to a different but related problem.

For this tutorial, you use part of a TensorFlow model - trained to classify images into a thousand categories - in an ML.NET model that classifies images into 3 categories.

Select the right machine learning task

Deep learning

Deep learning is a subset of Machine Learning, which is revolutionizing areas like Computer Vision and Speech Recognition.

Deep learning models are trained by using large sets of labeled data and neural networks that contain multiple learning layers. Deep learning:

• Performs better on some tasks like Computer Vision.
• Requires huge amounts of training data.

Image Classification is a common Machine Learning task that allows us to automatically classify images into categories such as:

• Detecting whether an image contains a human face.
• Detecting cats vs. dogs.

Or, as in the following images, determining whether an image shows a food, a toy, or an appliance:

Note

The preceding images belong to Wikimedia Commons and are attributed as follows:

The Inception model is trained to classify images into a thousand categories, but for this tutorial, you need to classify images in a smaller category set, and only those categories. Enter the transfer part of transfer learning. You can transfer the Inception model's ability to recognize and classify images to the new limited categories of your custom image classifier.

• Food
• Toy
• Appliance

This tutorial uses the TensorFlow Inception deep learning model, a popular image recognition model trained on the ImageNet dataset. The TensorFlow model classifies entire images into a thousand classes, such as “Umbrella”, “Jersey”, and “Dishwasher”.

Because the Inception model has already been pre-trained on thousands of different images, internally it contains the image features needed for image identification. We can make use of these internal image features in the model to train a new model with far fewer classes.

As shown in the following diagram, you add a reference to the ML.NET NuGet packages in your .NET Core or .NET Framework applications. Under the covers, ML.NET includes and references the native TensorFlow library that allows you to write code that loads an existing trained TensorFlow model file.

Multiclass classification

After using the TensorFlow Inception model to extract features suitable as input for a classical machine learning algorithm, we add an ML.NET multiclass classifier.

The specific trainer used in this case is the multinomial logistic regression algorithm.

The algorithm implemented by this trainer performs well on problems with a large number of features, which is the case for a deep learning model operating on image data.
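
To make the idea of a multiclass classifier more concrete, here is a minimal sketch in plain C# of how a multinomial logistic regression model can turn one raw score per class into probabilities (the softmax function) and then pick the most likely class. This is not the ML.NET trainer itself; the SoftmaxSketch class, the label order, and the raw score values are made up for illustration.

```csharp
using System;
using System.Linq;

public static class SoftmaxSketch
{
    // Converts raw per-class scores into probabilities that sum to 1.
    public static double[] Softmax(double[] rawScores)
    {
        // Subtract the maximum first so Math.Exp cannot overflow.
        double max = rawScores.Max();
        double[] exps = rawScores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }

    // Returns the index of the largest value (the predicted class).
    public static int ArgMax(double[] values)
    {
        int best = 0;
        for (int i = 1; i < values.Length; i++)
        {
            if (values[i] > values[best])
            {
                best = i;
            }
        }
        return best;
    }

    public static void Main()
    {
        // Hypothetical labels and raw scores, for illustration only.
        string[] labels = { "food", "toy", "appliance" };
        double[] rawScores = { 0.5, 0.2, 3.1 };

        double[] probabilities = Softmax(rawScores);
        int predicted = ArgMax(probabilities);

        Console.WriteLine($"Predicted: {labels[predicted]} (p = {probabilities[predicted]:F3})");
    }
}
```

The largest probability plays the same role as the maximum of the Score array that this tutorial prints next to each predicted label.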

Data

There are two data sources: the .tsv file, and the image files. The tags.tsv file contains two columns: the first one is defined as ImagePath and the second one is the Label corresponding to the image. The following example file doesn't have a header row, and looks like this:

broccoli.jpg	food
pizza.jpg	food
pizza2.jpg	food
teddy2.jpg	toy
teddy3.jpg	toy
teddy4.jpg	toy
toaster.jpg	appliance
toaster2.png	appliance
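
As a quick way to check a tags file in this layout, the following sketch in plain C# splits each tab-separated line into a file name and a label, then counts the images per label. It is only an illustration of the file format, not part of the tutorial's pipeline; the TagsFileSketch class and its Parse method are hypothetical names, and the rows are an inline copy of a few lines from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagsFileSketch
{
    // Splits each "fileName<TAB>label" line into a (File, Label) pair,
    // skipping blank lines.
    public static List<(string File, string Label)> Parse(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split('\t'))
            .Select(parts => (File: parts[0], Label: parts[1]))
            .ToList();
    }

    public static void Main()
    {
        // A few rows in the tags.tsv layout: no header, two columns.
        string[] lines =
        {
            "broccoli.jpg\tfood",
            "pizza.jpg\tfood",
            "teddy2.jpg\ttoy",
            "toaster.jpg\tappliance"
        };

        foreach (var group in Parse(lines).GroupBy(row => row.Label))
        {
            Console.WriteLine($"{group.Key}: {group.Count()} image(s)");
        }
    }
}
```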


The training and testing images are located in the assets folders that you'll download in a zip file. These images belong to Wikimedia Commons.

Wikimedia Commons, the free media repository. Retrieved 10:48, October 17, 2018 from: https://commons.wikimedia.org/wiki/Pizza https://commons.wikimedia.org/wiki/Toaster https://commons.wikimedia.org/wiki/Teddy_bear

Setup

Create a project

1. Create a .NET Core Console Application called "TransferLearningTF".

2. Install the Microsoft.ML NuGet Package:

• In Solution Explorer, right-click on your project and select Manage NuGet Packages.
• Choose "nuget.org" as the Package source, select the Browse tab, search for Microsoft.ML.
• Click on the Version drop-down, select the 1.4.0 package in the list, and select the Install button.
• Select the OK button on the Preview Changes dialog.
• Select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed.
• Repeat these steps for Microsoft.ML.ImageAnalytics v1.4.0, SciSharp.TensorFlow.Redist v1.15.0 and Microsoft.ML.TensorFlow v1.4.0.

3. Copy the assets directory into your TransferLearningTF project directory. This directory and its subdirectories contain the data and support files (except for the Inception model, which you'll download and add in the next step) needed for this tutorial.

4. Copy the contents of the inception5h directory just unzipped into your TransferLearningTF project assets/inception directory. This directory contains the model and additional support files needed for this tutorial, as shown in the following image:

5. In Solution Explorer, right-click each of the files in the assets directory and subdirectories and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.

Create classes and define paths

1. Add the following additional using statements to the top of the Program.cs file:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

2. Add the following code to the line right above the Main method to specify the asset paths:

static readonly string _assetsPath = Path.Combine(Environment.CurrentDirectory, "assets");
static readonly string _imagesFolder = Path.Combine(_assetsPath, "images");
static readonly string _trainTagsTsv = Path.Combine(_imagesFolder, "tags.tsv");
static readonly string _testTagsTsv = Path.Combine(_imagesFolder, "test-tags.tsv");
static readonly string _predictSingleImage = Path.Combine(_imagesFolder, "toaster3.jpg");
static readonly string _inceptionTensorFlowModel = Path.Combine(_assetsPath, "inception", "tensorflow_inception_graph.pb");

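3. Create classes for your input data and predictions.

public class ImageData
{
    // Column 0 of the .tsv file: the image file name.
    [LoadColumn(0)]
    public string ImagePath;

    // Column 1 of the .tsv file: the image label.
    [LoadColumn(1)]
    public string Label;
}


ImageData is the input image data class and has the following String fields:

• ImagePath contains the image file name.
• Label contains a value for the image label.

The LoadColumn attributes map each field to its column index in the .tsv file.

4. Add a new class to your project for ImagePrediction: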

public class ImagePrediction : ImageData
{
public float[] Score;

public string PredictedLabelValue;
}


ImagePrediction is the image prediction class and has the following fields:

• Score contains the confidence values, one per category, for a given image classification.
• PredictedLabelValue contains a value for the predicted image classification label.

ImagePrediction is the class used for prediction after the model has been trained. It has a string (ImagePath) for the image path. The Label is used to reuse and train the model. The PredictedLabelValue is used during prediction and evaluation. For evaluation, an input with training data, the predicted values, and the model are used.

Initialize variables in Main

1. Initialize the mlContext variable with a new instance of MLContext. Replace the Console.WriteLine("Hello World!") line with the following code in the Main method:

MLContext mlContext = new MLContext();


The MLContext class is a starting point for all ML.NET operations, and initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It's similar, conceptually, to DbContext in Entity Framework.

Create a struct for Inception model parameters

1. The Inception model has several parameters you need to pass in. Create a struct to map the parameter values to friendly names with the following code, just after the Main() method:

private struct InceptionSettings
{
public const int ImageHeight = 224;
public const int ImageWidth = 224;
public const float Mean = 117;
public const float Scale = 1;
public const bool ChannelsLast = true;
}


Create a display utility method

Since you'll display the image data and the related predictions more than once, create a display utility method to handle displaying the image and prediction results.

1. Create the DisplayResults() method, just after the InceptionSettings struct, using the following code:

private static void DisplayResults(IEnumerable<ImagePrediction> imagePredictionData)
{

}

2. Fill in the body of the DisplayResults method:

foreach (ImagePrediction prediction in imagePredictionData)
{
    Console.WriteLine($"Image: {Path.GetFileName(prediction.ImagePath)} predicted as: {prediction.PredictedLabelValue} with score: {prediction.Score.Max()} ");
}

Create a .tsv file utility method

1. Create the ReadFromTsv() method, just after the DisplayResults() method, using the following code:

public static IEnumerable<ImageData> ReadFromTsv(string file, string folder)
{

}

2. Fill in the body of the ReadFromTsv method:

return File.ReadAllLines(file)
    .Select(line => line.Split('\t'))
    .Select(line => new ImageData()
    {
        ImagePath = Path.Combine(folder, line[0]),
        Label = line[1]
    });

The code parses through the tags.tsv file to add the file path to the image file name for the ImagePath property and load it and the Label into an ImageData object.

Create a method to make a prediction

1. Create the ClassifySingleImage() method, just before the DisplayResults() method, using the following code:

public static void ClassifySingleImage(MLContext mlContext, ITransformer model)
{

}

2. Create an ImageData object that contains the fully qualified path and image file name for the single ImagePath. Add the following code as the next lines in the ClassifySingleImage() method:

var imageData = new ImageData()
{
    ImagePath = _predictSingleImage
};

3. Make a single prediction, by adding the following code as the next line in the ClassifySingleImage method:

// Make prediction function (input = ImageData, output = ImagePrediction)
var predictor = mlContext.Model.CreatePredictionEngine<ImageData, ImagePrediction>(model);
var prediction = predictor.Predict(imageData);

To get the prediction, use the Predict() method. The PredictionEngine is a convenience API, which allows you to perform a prediction on a single instance of data. PredictionEngine is not thread-safe. It's acceptable to use in single-threaded or prototype environments. For improved performance and thread safety in production environments, use the PredictionEnginePool service, which creates an ObjectPool of PredictionEngine objects for use throughout your application. See this guide on how to use PredictionEnginePool in an ASP.NET Core Web API.

Note

PredictionEnginePool service extension is currently in preview.

4. Display the prediction result as the next line of code in the ClassifySingleImage() method:

Console.WriteLine($"Image: {Path.GetFileName(imageData.ImagePath)} predicted as: {prediction.PredictedLabelValue} with score: {prediction.Score.Max()} ");


Construct the ML.NET model pipeline

An ML.NET model pipeline is a chain of estimators. Note that no execution happens during pipeline construction. The estimator objects are created but not executed.

1. Add a method to generate the model

This method is the heart of the tutorial. It creates a pipeline for the model, and trains the pipeline to produce the ML.NET model. It also evaluates the model against some previously unseen test data.

Create the GenerateModel() method, just after the InceptionSettings struct and just before the DisplayResults() method, using the following code:

public static ITransformer GenerateModel(MLContext mlContext)
{

}

2. Add the estimators to load, resize and extract the pixels from the image data:

IEstimator<ITransformer> pipeline = mlContext.Transforms.LoadImages(outputColumnName: "input", imageFolder: _imagesFolder, inputColumnName: nameof(ImageData.ImagePath))
// The image transforms transform the images into the model's expected format.
.Append(mlContext.Transforms.ResizeImages(outputColumnName: "input", imageWidth: InceptionSettings.ImageWidth, imageHeight: InceptionSettings.ImageHeight, inputColumnName: "input"))
.Append(mlContext.Transforms.ExtractPixels(outputColumnName: "input", interleavePixelColors: InceptionSettings.ChannelsLast, offsetImage: InceptionSettings.Mean))


The image data needs to be processed into the format that the TensorFlow model expects. In this case, the images are loaded into memory, resized to a consistent size, and the pixels are extracted into a numeric vector.
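
As a rough illustration of the last of those steps, the following plain C# sketch converts interleaved red, green, and blue byte values into a float vector and subtracts the mean offset from each value. It does not use the ML.NET image transforms. The formula (value - Mean) * Scale is an assumption about how the offset and scale are applied, and the PixelSketch class, its method, and the sample pixel values are hypothetical.

```csharp
using System;

public static class PixelSketch
{
    // Same values as InceptionSettings.Mean and InceptionSettings.Scale.
    private const float Mean = 117;
    private const float Scale = 1;

    // Converts interleaved RGB bytes (R, G, B, R, G, B, ...) to floats.
    // Assumption: each value is normalized as (value - Mean) * Scale.
    public static float[] ToFeatureVector(byte[] interleavedRgb)
    {
        var result = new float[interleavedRgb.Length];
        for (int i = 0; i < interleavedRgb.Length; i++)
        {
            result[i] = (interleavedRgb[i] - Mean) * Scale;
        }
        return result;
    }

    public static void Main()
    {
        // Two made-up pixels: pure red, then mid gray.
        byte[] pixels = { 255, 0, 0, 128, 128, 128 };
        float[] features = ToFeatureVector(pixels);
        Console.WriteLine(string.Join(", ", features));
    }
}
```

For a 224 x 224 image with three color channels, a vector built this way holds 224 x 224 x 3 = 150,528 values.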

3. Add the estimator to load the TensorFlow model, and score it:

.Append(mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel).
ScoreTensorFlowModel(outputColumnNames: new[] { "softmax2_pre_activation" }, inputColumnNames: new[] { "input" }, addBatchDimensionInput: true))


This stage in the pipeline loads the TensorFlow model into memory, then processes the vector of pixel values through the TensorFlow model network. Applying inputs to a deep learning model, and generating an output using the model, is referred to as Scoring. When using the model in its entirety, scoring makes an inference, or prediction.

In this case, you use all of the TensorFlow model except the last layer, which is the layer that makes the inference. The output of the penultimate layer is labeled softmax2_pre_activation. The output of this layer is effectively a vector of features that characterize the original input images.

This feature vector generated by the TensorFlow model will be used as input to an ML.NET training algorithm.

4. Add the estimator to map the string labels in the training data to integer key values:

.Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "LabelKey", inputColumnName: "Label"))


The ML.NET trainer that is appended next requires its labels to be in key format rather than arbitrary strings. A key is a number that has a one-to-one mapping to a string value.
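
The idea of a key can be sketched in plain C# with a dictionary that gives each distinct label a number. This is only a conceptual illustration, not the MapValueToKey implementation. The KeyMappingSketch class is a hypothetical name, and numbering the keys from 1 in order of first appearance is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class KeyMappingSketch
{
    // Assigns each distinct label a 1-based key in order of first appearance.
    public static Dictionary<string, uint> BuildKeyMap(IEnumerable<string> labels)
    {
        var map = new Dictionary<string, uint>();
        foreach (string label in labels)
        {
            if (!map.ContainsKey(label))
            {
                map[label] = (uint)(map.Count + 1);
            }
        }
        return map;
    }

    public static void Main()
    {
        // Labels as they might appear in the Label column of the training data.
        string[] labels = { "food", "food", "toy", "appliance", "toy" };
        Dictionary<string, uint> keys = BuildKeyMap(labels);

        foreach (KeyValuePair<string, uint> pair in keys)
        {
            Console.WriteLine($"{pair.Key} -> {pair.Value}");
        }
    }
}
```

Because the mapping is one-to-one, it can be reversed after prediction to turn a predicted key back into its string label.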

5. Add the ML.NET training algorithm:

.Append(mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: "LabelKey", featureColumnName: "softmax2_pre_activation"))

6. Add the estimator to map the predicted key value back into a string:

.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabelValue", "PredictedLabel"))
.AppendCacheCheckpoint(mlContext);


Train the model

1. Load the training data using the LoadFromTextFile wrapper. Add the following code as the next line in the GenerateModel() method:

IDataView trainingData = mlContext.Data.LoadFromTextFile<ImageData>(path: _trainTagsTsv, hasHeader: false);


Data in ML.NET is represented by the IDataView interface. IDataView is a flexible, efficient way of describing tabular data (numeric and text). Data can be loaded from a text file or in real time (for example, SQL database or log files) to an IDataView object.

2. Train the model with the data loaded above:

ITransformer model = pipeline.Fit(trainingData);


The Fit() method trains your model by applying the training dataset to the pipeline.

Evaluate the accuracy of the model

1. Load and transform the test data, by adding the following code to the next line of the GenerateModel method:

IDataView testData = mlContext.Data.LoadFromTextFile<ImageData>(path: _testTagsTsv, hasHeader: false);
IDataView predictions = model.Transform(testData);

// Create an IEnumerable for the predictions for displaying results
IEnumerable<ImagePrediction> imagePredictionData = mlContext.Data.CreateEnumerable<ImagePrediction>(predictions, true);
DisplayResults(imagePredictionData);


There are a few sample images that you can use to evaluate the model. Like the training data, these need to be loaded into an IDataView, so that they can be transformed by the model.

2. Add the following code to the GenerateModel() method to evaluate the model:

MulticlassClassificationMetrics metrics =
mlContext.MulticlassClassification.Evaluate(predictions,
labelColumnName: "LabelKey",
predictedLabelColumnName: "PredictedLabel");


Once you have the prediction set, the Evaluate() method:

• Assesses the model (compares the predicted values with the test dataset labels).
• Returns the model performance metrics.
3. Display the model accuracy metrics

Use the following code to display the metrics, share the results, and then act on them:

Console.WriteLine($"LogLoss is: {metrics.LogLoss}");
Console.WriteLine($"PerClassLogLoss is: {String.Join(" , ", metrics.PerClassLogLoss.Select(c => c.ToString()))}");


The following metrics are evaluated for image classification:

• Log-loss - see Log Loss. You want Log-loss to be as close to zero as possible.
• Per class Log-loss. You want per class Log-loss to be as close to zero as possible.
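
To show why values close to zero are better, here is a minimal sketch in plain C# that computes log-loss as the average negative natural logarithm of the probability the model assigned to the correct label. It is a simplified illustration and not the code behind the Evaluate() method; the LogLossSketch class and the probability values are made up.

```csharp
using System;
using System.Linq;

public static class LogLossSketch
{
    // Average of -ln(p), where p is the probability assigned to the true class.
    public static double LogLoss(double[] trueClassProbabilities)
    {
        return trueClassProbabilities.Average(p => -Math.Log(p));
    }

    public static void Main()
    {
        // Made-up probabilities that a model assigned to the correct label
        // of three test images.
        double[] confident = { 0.9, 0.8, 0.99 };
        double[] unsure = { 0.4, 0.5, 0.35 };

        Console.WriteLine($"Confident model log-loss: {LogLoss(confident)}");
        Console.WriteLine($"Unsure model log-loss: {LogLoss(unsure)}");
    }
}
```

A probability of 1 for the correct label contributes a loss of zero, and the loss grows as that probability shrinks. Computing the same average over only the images of one category gives a per class value in the same spirit.
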
4. Add the following code to return the trained model as the next line:

return model;


Run the application!

1. Add the call to GenerateModel in the Main method after the creation of the MLContext instance:

ITransformer model = GenerateModel(mlContext);

2. Add the call to the ClassifySingleImage() method as the next line of code in the Main method:

ClassifySingleImage(mlContext, model);

3. Run your console app (Ctrl + F5). Your results should be similar to the following output. You may see warnings or processing messages, but these messages have been removed from the following results for clarity.

=============== Training classification model ===============
Image: broccoli2.jpg predicted as: food with score: 0.8955513
Image: pizza3.jpg predicted as: food with score: 0.9667718
Image: teddy6.jpg predicted as: toy with score: 0.9797683
=============== Classification metrics ===============
LogLoss is: 0.0653774699265059
PerClassLogLoss is: 0.110315812569315 , 0.0204391272836966 , 0
=============== Making single image classification ===============
Image: toaster3.jpg predicted as: appliance with score: 0.9646884


Congratulations! You've now successfully built a machine learning model for image classification by applying transfer learning to a TensorFlow model in ML.NET.

You can find the source code for this tutorial at the dotnet/samples repository.

In this tutorial, you learned how to:

• Understand the problem
• Incorporate the pre-trained TensorFlow model into the ML.NET pipeline
• Train and evaluate the ML.NET model
• Classify a test image

Check out the Machine Learning samples GitHub repository to explore an expanded image classification sample.