Quickstart: Create an object detection project with the Custom Vision SDK

This article shows you how to get started using the Custom Vision SDK with C# to build an object detection model. After it's created, you can add tagged regions, upload images, train the project, obtain the project's default prediction endpoint URL, and use the endpoint to programmatically test an image. Use this example as a template for building your own .NET application.

Prerequisites

  • Any edition of Visual Studio 2015 or 2017
  • To use the Custom Vision Service, you will need to create Custom Vision Training and Prediction resources in Azure. To do so in the Azure portal, fill out the dialog window on the Create Custom Vision page to create both a Training and Prediction resource.

Get the Custom Vision SDK and sample code

To write a .NET app that uses Custom Vision, you'll need the Custom Vision NuGet packages. These packages are included in the sample project you'll download, but you can also install them individually from NuGet, as shown below.
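
If you want to reference the packages in your own project, you can install them from the Package Manager Console. These are the published package names for the .NET Custom Vision SDK; check NuGet for the latest versions:

Install-Package Microsoft.Azure.CognitiveServices.Vision.CustomVision.Training
Install-Package Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction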

Clone or download the Cognitive Services .NET Samples project. Navigate to the CustomVision/ObjectDetection folder and open ObjectDetection.csproj in Visual Studio.

This Visual Studio project creates a new Custom Vision project named My New Project, which can be accessed through the Custom Vision website. It then uploads images to train and test an object detection model. In this project, the model is trained to detect forks and scissors in images.

Get the training and prediction keys

The project needs a valid set of subscription keys to interact with the service. You can find these values at the Custom Vision website. Sign in with the account associated with the Azure account used to create your Custom Vision resources. On the home page (the page with the option to add a new project), select the gear icon in the upper right. Find your training and prediction resources in the list and expand them. Here you can find your training key, prediction key, and prediction resource ID values. Save these values to a temporary location.

Image of the keys UI

Or, you can obtain these keys and ID from the Azure portal by viewing your Custom Vision Training and Prediction resources and navigating to the Keys tab. There you'll find your training key and prediction key. Navigate to the Properties tab of your Prediction resource to get your prediction resource ID.

Understand the code

Open the Program.cs file and inspect the code. Create environment variables for your training and prediction keys named CUSTOM_VISION_TRAINING_KEY and CUSTOM_VISION_PREDICTION_KEY, respectively. The script will look for these variables.

// Add your training & prediction key from the settings page of the portal
string trainingKey = Environment.GetEnvironmentVariable("CUSTOM_VISION_TRAINING_KEY");
string predictionKey = Environment.GetEnvironmentVariable("CUSTOM_VISION_PREDICTION_KEY");

Also, get your Endpoint URL from the Settings page of the Custom Vision website. Save it to an environment variable called CUSTOM_VISION_ENDPOINT. The script saves a reference to it at the root of your class.

string ENDPOINT = Environment.GetEnvironmentVariable("CUSTOM_VISION_ENDPOINT");
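
For example, in PowerShell you can set all three variables for the current session like this (replace the placeholders with your own values; other shells use their own syntax):

$env:CUSTOM_VISION_TRAINING_KEY = "<your training key>"
$env:CUSTOM_VISION_PREDICTION_KEY = "<your prediction key>"
$env:CUSTOM_VISION_ENDPOINT = "<your endpoint URL>"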

Create a new Custom Vision Service project

This next bit of code creates an object detection project. The created project will show up on the Custom Vision website that you visited earlier. See the CreateProject method to specify other options when you create your project (explained in the Build a detector web portal guide).

// Create the API client, passing in the training key
CustomVisionTrainingClient trainingApi = new CustomVisionTrainingClient()
{
    ApiKey = trainingKey,
    Endpoint = ENDPOINT
};

// Find the object detection domain
var domains = trainingApi.GetDomains();
var objDetectionDomain = domains.FirstOrDefault(d => d.Type == "ObjectDetection");

// Create a new project
Console.WriteLine("Creating new project:");
var project = trainingApi.CreateProject("My New Project", null, objDetectionDomain.Id);

Add tags to the project

// Make two tags in the new project
var forkTag = trainingApi.CreateTag(project.Id, "fork");
var scissorsTag = trainingApi.CreateTag(project.Id, "scissors");

Upload and tag images

When you tag images in object detection projects, you need to specify the region of each tagged object using normalized coordinates. The code after the note below associates each of the sample images with its tagged region.
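
Normalized here means each value is the pixel measurement divided by the corresponding image dimension, so it falls between 0 and 1. A minimal sketch of the conversion (this helper is hypothetical and not part of the sample):

// Hypothetical helper: convert a pixel-space bounding box to the normalized
// [left, top, width, height] form that the Custom Vision service expects.
static double[] NormalizeRegion(
    double leftPx, double topPx, double widthPx, double heightPx,
    double imageWidthPx, double imageHeightPx)
{
    return new double[]
    {
        leftPx / imageWidthPx,     // normalized left
        topPx / imageHeightPx,     // normalized top
        widthPx / imageWidthPx,    // normalized width
        heightPx / imageHeightPx   // normalized height
    };
}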

Note

If you don't have a click-and-drag utility to mark the coordinates of regions, you can use the web UI at Customvision.ai. In this example, the coordinates are already provided.

Dictionary<string, double[]> fileToRegionMap = new Dictionary<string, double[]>()
{
    // FileName, Left, Top, Width, Height
    {"scissors_1", new double[] { 0.4007353, 0.194068655, 0.259803921, 0.6617647 } },
    {"scissors_2", new double[] { 0.426470578, 0.185898721, 0.172794119, 0.5539216 } },
    {"scissors_3", new double[] { 0.289215684, 0.259428144, 0.403186262, 0.421568632 } },
    {"scissors_4", new double[] { 0.343137264, 0.105833367, 0.332107842, 0.8055556 } },
    {"scissors_5", new double[] { 0.3125, 0.09766343, 0.435049027, 0.71405226 } },
    {"scissors_6", new double[] { 0.379901975, 0.24308826, 0.32107842, 0.5718954 } },
    {"scissors_7", new double[] { 0.341911763, 0.20714055, 0.3137255, 0.6356209 } },
    {"scissors_8", new double[] { 0.231617644, 0.08459154, 0.504901946, 0.8480392 } },
    {"scissors_9", new double[] { 0.170343131, 0.332957536, 0.767156839, 0.403594762 } },
    {"scissors_10", new double[] { 0.204656869, 0.120539248, 0.5245098, 0.743464053 } },
    {"scissors_11", new double[] { 0.05514706, 0.159754932, 0.799019635, 0.730392158 } },
    {"scissors_12", new double[] { 0.265931368, 0.169558853, 0.5061275, 0.606209159 } },
    {"scissors_13", new double[] { 0.241421565, 0.184264734, 0.448529422, 0.6830065 } },
    {"scissors_14", new double[] { 0.05759804, 0.05027781, 0.75, 0.882352948 } },
    {"scissors_15", new double[] { 0.191176474, 0.169558853, 0.6936275, 0.6748366 } },
    {"scissors_16", new double[] { 0.1004902, 0.279036, 0.6911765, 0.477124184 } },
    {"scissors_17", new double[] { 0.2720588, 0.131977156, 0.4987745, 0.6911765 } },
    {"scissors_18", new double[] { 0.180147052, 0.112369314, 0.6262255, 0.6666667 } },
    {"scissors_19", new double[] { 0.333333343, 0.0274019931, 0.443627447, 0.852941155 } },
    {"scissors_20", new double[] { 0.158088237, 0.04047389, 0.6691176, 0.843137264 } },
    {"fork_1", new double[] { 0.145833328, 0.3509314, 0.5894608, 0.238562092 } },
    {"fork_2", new double[] { 0.294117659, 0.216944471, 0.534313738, 0.5980392 } },
    {"fork_3", new double[] { 0.09191177, 0.0682516545, 0.757352948, 0.6143791 } },
    {"fork_4", new double[] { 0.254901975, 0.185898721, 0.5232843, 0.594771266 } },
    {"fork_5", new double[] { 0.2365196, 0.128709182, 0.5845588, 0.71405226 } },
    {"fork_6", new double[] { 0.115196079, 0.133611143, 0.676470637, 0.6993464 } },
    {"fork_7", new double[] { 0.164215669, 0.31008172, 0.767156839, 0.410130739 } },
    {"fork_8", new double[] { 0.118872553, 0.318251669, 0.817401946, 0.225490168 } },
    {"fork_9", new double[] { 0.18259804, 0.2136765, 0.6335784, 0.643790841 } },
    {"fork_10", new double[] { 0.05269608, 0.282303959, 0.8088235, 0.452614367 } },
    {"fork_11", new double[] { 0.05759804, 0.0894935, 0.9007353, 0.3251634 } },
    {"fork_12", new double[] { 0.3345588, 0.07315363, 0.375, 0.9150327 } },
    {"fork_13", new double[] { 0.269607842, 0.194068655, 0.4093137, 0.6732026 } },
    {"fork_14", new double[] { 0.143382356, 0.218578458, 0.7977941, 0.295751631 } },
    {"fork_15", new double[] { 0.19240196, 0.0633497, 0.5710784, 0.8398692 } },
    {"fork_16", new double[] { 0.140931368, 0.480016381, 0.6838235, 0.240196079 } },
    {"fork_17", new double[] { 0.305147052, 0.2512582, 0.4791667, 0.5408496 } },
    {"fork_18", new double[] { 0.234068632, 0.445702642, 0.6127451, 0.344771236 } },
    {"fork_19", new double[] { 0.219362751, 0.141781077, 0.5919118, 0.6683006 } },
    {"fork_20", new double[] { 0.180147052, 0.239820287, 0.6887255, 0.235294119 } }
};

Then, this map of associations is used to upload each sample image with its region coordinates. You can upload up to 64 images in a single batch.

// Add all images for fork
var imagePath = Path.Combine("Images", "fork");
var imageFileEntries = new List<ImageFileCreateEntry>();
foreach (var fileName in Directory.EnumerateFiles(imagePath))
{
    var region = fileToRegionMap[Path.GetFileNameWithoutExtension(fileName)];
    imageFileEntries.Add(new ImageFileCreateEntry(
        fileName,
        File.ReadAllBytes(fileName),
        null,
        new List<Region>(new Region[] { new Region(forkTag.Id, region[0], region[1], region[2], region[3]) })));
}
trainingApi.CreateImagesFromFiles(project.Id, new ImageFileCreateBatch(imageFileEntries));

// Add all images for scissors
imagePath = Path.Combine("Images", "scissors");
imageFileEntries = new List<ImageFileCreateEntry>();
foreach (var fileName in Directory.EnumerateFiles(imagePath))
{
    var region = fileToRegionMap[Path.GetFileNameWithoutExtension(fileName)];
    imageFileEntries.Add(new ImageFileCreateEntry(
        fileName,
        File.ReadAllBytes(fileName),
        null,
        new List<Region>(new Region[] { new Region(scissorsTag.Id, region[0], region[1], region[2], region[3]) })));
}
trainingApi.CreateImagesFromFiles(project.Id, new ImageFileCreateBatch(imageFileEntries));

At this point, you've uploaded all the sample images and tagged each one (fork or scissors) with an associated bounding region in normalized coordinates.

Train the project

This code creates the first training iteration in the project.

// Now that there are images with tags, start training the project
Console.WriteLine("\tTraining");
var iteration = trainingApi.TrainProject(project.Id);

// The returned iteration will be in progress, and can be queried periodically to see when it has completed
while (iteration.Status == "Training")
{
    Thread.Sleep(1000);

    // Re-query the iteration to get its updated status
    iteration = trainingApi.GetIteration(project.Id, iteration.Id);
}

Publish the current iteration

The name given to the published iteration can be used to send prediction requests. An iteration is not available in the prediction endpoint until it's published.

// The iteration is now trained. Publish it to the prediction endpoint.
var publishedModelName = "toolModel";
var predictionResourceId = "<target prediction resource ID>";
trainingApi.PublishIteration(project.Id, iteration.Id, publishedModelName, predictionResourceId);
Console.WriteLine("Done!\n");

Create a prediction endpoint

// Create a prediction endpoint, passing in the obtained prediction key
CustomVisionPredictionClient endpoint = new CustomVisionPredictionClient()
{
    ApiKey = predictionKey,
    Endpoint = ENDPOINT
};

Use the prediction endpoint

This part of the script loads the test image, queries the model endpoint, and outputs prediction data to the console.

// Make a prediction against the new project
Console.WriteLine("Making a prediction:");
var imageFile = Path.Combine("Images", "test", "test_image.jpg");
using (var stream = File.OpenRead(imageFile))
{
    var result = endpoint.DetectImage(project.Id, publishedModelName, stream);

    // Loop over each prediction and write out the results
    foreach (var c in result.Predictions)
    {
        Console.WriteLine($"\t{c.TagName}: {c.Probability:P1} [ {c.BoundingBox.Left}, {c.BoundingBox.Top}, {c.BoundingBox.Width}, {c.BoundingBox.Height} ]");
    }
}
Console.ReadKey();

Run the application

As the application runs, it should open a console window and write the following output:

Creating new project:
        Training
Done!

Making a prediction:
        fork: 98.2% [ 0.111609578, 0.184719115, 0.6607002, 0.6637112 ]
        scissors: 1.2% [ 0.112389535, 0.119195729, 0.658031344, 0.7023591 ]

You can then verify that the test image (found in Images/test/) is tagged appropriately and that the region of detection is correct. At this point, you can press any key to exit the application.

Clean up resources

If you wish to implement your own object detection project (or try an image classification project instead), you may want to delete the fork/scissors detection project from this example. A free trial allows for two Custom Vision projects.

On the Custom Vision website, navigate to Projects and select the trash can under My New Project.

Screenshot of a panel labeled My New Project with a trash can icon
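
Alternatively, you can delete the project programmatically with the training client. A minimal sketch, assuming the trainingApi client and project object from earlier in this article:

// Permanently delete the sample project from the Custom Vision resource.
trainingApi.DeleteProject(project.Id);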

Next steps

Now you've seen how to do every step of the object detection process in code. This sample executes a single training iteration, but often you'll need to train and test your model multiple times in order to make it more accurate. The following guide deals with image classification, but its principles are similar to object detection.

This article provides information and sample code to help you get started using the Custom Vision SDK with Go to build an object detection model. After it's created, you can add tagged regions, upload images, train the project, obtain the project's published prediction endpoint URL, and use the endpoint to programmatically test an image. Use this example as a template for building your own Go application.

Prerequisites

  • Go 1.8+
  • To use the Custom Vision Service, you will need to create Custom Vision Training and Prediction resources in Azure. To do so in the Azure portal, fill out the dialog window on the Create Custom Vision page to create both a Training and Prediction resource.

Install the Custom Vision SDK

To install the Custom Vision service SDK for Go, run the following command in PowerShell:

go get -u github.com/Azure/azure-sdk-for-go/...

Or, if you use dep, run the following within your repo:

dep ensure -add github.com/Azure/azure-sdk-for-go

Get the training and prediction keys

The project needs a valid set of subscription keys to interact with the service. You can find these values at the Custom Vision website. Sign in with the account associated with the Azure account used to create your Custom Vision resources. On the home page (the page with the option to add a new project), select the gear icon in the upper right. Find your training and prediction resources in the list and expand them. Here you can find your training key, prediction key, and prediction resource ID values. Save these values to a temporary location.

Image of the keys UI

Or, you can obtain these keys and ID from the Azure portal by viewing your Custom Vision Training and Prediction resources and navigating to the Keys tab. There you'll find your training key and prediction key. Navigate to the Properties tab of your Prediction resource to get your prediction resource ID.

Get the sample images

This example uses the images from the Cognitive Services Python SDK Samples repository on GitHub. Clone or download this repository to your development environment. Remember its folder location for a later step.

Add the code

Create a new file called sample.go in your preferred project directory.

Create the Custom Vision service project

Add the following code to your script to create a new Custom Vision service project. Insert your subscription keys in the appropriate definitions. Also, get your Endpoint URL from the Settings page of the Custom Vision website.

See the CreateProject method to specify other options when you create your project (explained in the Build a detector web portal guide).

import (
    "bytes"
    "context"
    "fmt"
    "io/ioutil"
    "path"
    "time"

    "github.com/Azure/azure-sdk-for-go/services/cognitiveservices/v3.0/customvision/prediction"
    "github.com/Azure/azure-sdk-for-go/services/cognitiveservices/v3.0/customvision/training"
)

var (
    training_key           string = "<your training key>"
    prediction_key         string = "<your prediction key>"
    prediction_resource_id string = "<your prediction resource id>"
    endpoint               string = "<your endpoint URL>"
    project_name           string = "Go Sample OD Project"
    iteration_publish_name string = "detectModel"
    sampleDataDirectory    string = "<path to sample images>"
)

func main() {
    fmt.Println("Creating project...")

    ctx = context.Background()

    trainer := training.New(training_key, endpoint)

    var objectDetectDomain training.Domain
    domains, _ := trainer.GetDomains(ctx)

    for _, domain := range *domains.Value {
        fmt.Println(domain, domain.Type)
        if domain.Type == "ObjectDetection" && *domain.Name == "General" {
            objectDetectDomain = domain
            break
        }
    }
    fmt.Println("Creating project...")
    project, _ := trainer.CreateProject(ctx, project_name, "", objectDetectDomain.ID, "")

Create tags in the project

To create object tags in your project, add the following code to the end of sample.go:

// Make two tags in the new project
forkTag, _ := trainer.CreateTag(ctx, *project.ID, "fork", "A fork", string(training.Regular))
scissorsTag, _ := trainer.CreateTag(ctx, *project.ID, "scissors", "Pair of scissors", string(training.Regular))

Upload and tag images

When you tag images in object detection projects, you need to specify the region of each tagged object using normalized coordinates.

Note

If you don't have a click-and-drag utility to mark the coordinates of regions, you can use the web UI at Customvision.ai. In this example, the coordinates are already provided.

To add the images, tags, and regions to the project, insert the following code after the tag creation. Note that in this tutorial the regions are hard-coded inline. The regions specify the bounding box in normalized coordinates, and the coordinates are given in the order: left, top, width, height.

forkImageRegions := map[string][4]float64{
    "fork_1.jpg": [4]float64{ 0.145833328, 0.3509314, 0.5894608, 0.238562092 },
    "fork_2.jpg": [4]float64{ 0.294117659, 0.216944471, 0.534313738, 0.5980392 },
    "fork_3.jpg": [4]float64{ 0.09191177, 0.0682516545, 0.757352948, 0.6143791 },
    "fork_4.jpg": [4]float64{ 0.254901975, 0.185898721, 0.5232843, 0.594771266 },
    "fork_5.jpg": [4]float64{ 0.2365196, 0.128709182, 0.5845588, 0.71405226 },
    "fork_6.jpg": [4]float64{ 0.115196079, 0.133611143, 0.676470637, 0.6993464 },
    "fork_7.jpg": [4]float64{ 0.164215669, 0.31008172, 0.767156839, 0.410130739 },
    "fork_8.jpg": [4]float64{ 0.118872553, 0.318251669, 0.817401946, 0.225490168 },
    "fork_9.jpg": [4]float64{ 0.18259804, 0.2136765, 0.6335784, 0.643790841 },
    "fork_10.jpg": [4]float64{ 0.05269608, 0.282303959, 0.8088235, 0.452614367 },
    "fork_11.jpg": [4]float64{ 0.05759804, 0.0894935, 0.9007353, 0.3251634 },
    "fork_12.jpg": [4]float64{ 0.3345588, 0.07315363, 0.375, 0.9150327 },
    "fork_13.jpg": [4]float64{ 0.269607842, 0.194068655, 0.4093137, 0.6732026 },
    "fork_14.jpg": [4]float64{ 0.143382356, 0.218578458, 0.7977941, 0.295751631 },
    "fork_15.jpg": [4]float64{ 0.19240196, 0.0633497, 0.5710784, 0.8398692 },
    "fork_16.jpg": [4]float64{ 0.140931368, 0.480016381, 0.6838235, 0.240196079 },
    "fork_17.jpg": [4]float64{ 0.305147052, 0.2512582, 0.4791667, 0.5408496 },
    "fork_18.jpg": [4]float64{ 0.234068632, 0.445702642, 0.6127451, 0.344771236 },
    "fork_19.jpg": [4]float64{ 0.219362751, 0.141781077, 0.5919118, 0.6683006 },
    "fork_20.jpg": [4]float64{ 0.180147052, 0.239820287, 0.6887255, 0.235294119 },
}

scissorsImageRegions := map[string][4]float64{
    "scissors_1.jpg": [4]float64{ 0.4007353, 0.194068655, 0.259803921, 0.6617647 },
    "scissors_2.jpg": [4]float64{ 0.426470578, 0.185898721, 0.172794119, 0.5539216 },
    "scissors_3.jpg": [4]float64{ 0.289215684, 0.259428144, 0.403186262, 0.421568632 },
    "scissors_4.jpg": [4]float64{ 0.343137264, 0.105833367, 0.332107842, 0.8055556 },
    "scissors_5.jpg": [4]float64{ 0.3125, 0.09766343, 0.435049027, 0.71405226 },
    "scissors_6.jpg": [4]float64{ 0.379901975, 0.24308826, 0.32107842, 0.5718954 },
    "scissors_7.jpg": [4]float64{ 0.341911763, 0.20714055, 0.3137255, 0.6356209 },
    "scissors_8.jpg": [4]float64{ 0.231617644, 0.08459154, 0.504901946, 0.8480392 },
    "scissors_9.jpg": [4]float64{ 0.170343131, 0.332957536, 0.767156839, 0.403594762 },
    "scissors_10.jpg": [4]float64{ 0.204656869, 0.120539248, 0.5245098, 0.743464053 },
    "scissors_11.jpg": [4]float64{ 0.05514706, 0.159754932, 0.799019635, 0.730392158 },
    "scissors_12.jpg": [4]float64{ 0.265931368, 0.169558853, 0.5061275, 0.606209159 },
    "scissors_13.jpg": [4]float64{ 0.241421565, 0.184264734, 0.448529422, 0.6830065 },
    "scissors_14.jpg": [4]float64{ 0.05759804, 0.05027781, 0.75, 0.882352948 },
    "scissors_15.jpg": [4]float64{ 0.191176474, 0.169558853, 0.6936275, 0.6748366 },
    "scissors_16.jpg": [4]float64{ 0.1004902, 0.279036, 0.6911765, 0.477124184 },
    "scissors_17.jpg": [4]float64{ 0.2720588, 0.131977156, 0.4987745, 0.6911765 },
    "scissors_18.jpg": [4]float64{ 0.180147052, 0.112369314, 0.6262255, 0.6666667 },
    "scissors_19.jpg": [4]float64{ 0.333333343, 0.0274019931, 0.443627447, 0.852941155 },
    "scissors_20.jpg": [4]float64{ 0.158088237, 0.04047389, 0.6691176, 0.843137264 },
}

Then, use this map of associations to upload each sample image with its region coordinates (you can upload up to 64 images in a single batch). Add the following code.

Note

You'll need to change the path to the images based on where you downloaded the Cognitive Services Python SDK Samples repository earlier.

// Go through the data table above and create the images
fmt.Println("Adding images...")
var fork_images []training.ImageFileCreateEntry
for file, region := range forkImageRegions {
    // Copy the loop variables so that the pointers taken below stay stable
    // across iterations (required before Go 1.22).
    file, region := file, region

    imageFile, _ := ioutil.ReadFile(path.Join(sampleDataDirectory, "fork", file))

    imageRegion := training.Region{
        TagID:  forkTag.ID,
        Left:   &region[0],
        Top:    &region[1],
        Width:  &region[2],
        Height: &region[3],
    }

    fork_images = append(fork_images, training.ImageFileCreateEntry{
        Name:     &file,
        Contents: &imageFile,
        Regions:  &[]training.Region{imageRegion},
    })
}
    
fork_batch, _ := trainer.CreateImagesFromFiles(ctx, *project.ID, training.ImageFileCreateBatch{ 
    Images: &fork_images,
})

if !*fork_batch.IsBatchSuccessful {
    fmt.Println("Batch upload failed.")
}

var scissor_images []training.ImageFileCreateEntry
for file, region := range scissorsImageRegions {
    // Copy the loop variables so that the pointers taken below stay stable
    // across iterations (required before Go 1.22).
    file, region := file, region

    imageFile, _ := ioutil.ReadFile(path.Join(sampleDataDirectory, "scissors", file))

    imageRegion := training.Region{
        TagID:  scissorsTag.ID,
        Left:   &region[0],
        Top:    &region[1],
        Width:  &region[2],
        Height: &region[3],
    }

    scissor_images = append(scissor_images, training.ImageFileCreateEntry{
        Name:     &file,
        Contents: &imageFile,
        Regions:  &[]training.Region{imageRegion},
    })
}
    
scissor_batch, _ := trainer.CreateImagesFromFiles(ctx, *project.ID, training.ImageFileCreateBatch{ 
    Images: &scissor_images,
})
    
if !*scissor_batch.IsBatchSuccessful {
    fmt.Println("Batch upload failed.")
}

Train the project and publish

This code creates the first iteration of the prediction model and then publishes that iteration to the prediction endpoint. The name given to the published iteration can be used to send prediction requests. An iteration is not available in the prediction endpoint until it's published.

iteration, _ := trainer.TrainProject(ctx, *project.ID)
fmt.Println("Training status:", *iteration.Status)
for {
    if *iteration.Status != "Training" {
        break
    }
    time.Sleep(5 * time.Second)
    iteration, _ = trainer.GetIteration(ctx, *project.ID, *iteration.ID)
    fmt.Println("Training status:", *iteration.Status)
}

trainer.PublishIteration(ctx, *project.ID, *iteration.ID, iteration_publish_name, prediction_resource_id)

Get and use the published iteration on the prediction endpoint

To send an image to the prediction endpoint and retrieve the prediction, add the following code to the end of the file:

    fmt.Println("Predicting...")
    predictor := prediction.New(prediction_key, endpoint)

    testImageData, _ := ioutil.ReadFile(path.Join(sampleDataDirectory, "Test", "test_od_image.jpg"))
    results, _ := predictor.DetectImage(ctx, *project.ID, iteration_publish_name, ioutil.NopCloser(bytes.NewReader(testImageData)), "")

    for _, prediction := range *results.Predictions {
        boundingBox := *prediction.BoundingBox

        fmt.Printf("\t%s: %.2f%% (%.2f, %.2f, %.2f, %.2f)", 
            *prediction.TagName,
            *prediction.Probability * 100,
            *boundingBox.Left,
            *boundingBox.Top,
            *boundingBox.Width,
            *boundingBox.Height)
        fmt.Println("")
    }
}

Run the application

Run sample.go.

go run sample.go

The output of the application should appear in the console. You can then verify that the test image (found in samples/vision/images/Test) is tagged appropriately and that the region of detection is correct.

Clean up resources

If you wish to implement your own object detection project (or try an image classification project instead), you may want to delete the fork/scissors detection project from this example. A free trial allows for two Custom Vision projects.

On the Custom Vision website, navigate to Projects and select the trash can under My New Project.

Screenshot of a panel labeled My New Project with a trash can icon

Next steps

Now you've seen how every step of the object detection process can be done in code. This sample executes a single training iteration, but often you'll need to train and test your model multiple times in order to make it more accurate. The following training guide deals with image classification, but its principles are similar to object detection.

This article shows you how to get started using the Custom Vision SDK with Java to build an object detection model. After it's created, you can add tagged regions, upload images, train the project, obtain the project's default prediction endpoint URL, and use the endpoint to programmatically test an image. Use this example as a template for building your own Java application.

Prerequisites

  • A Java IDE of your choice
  • JDK 7 or 8 installed.
  • Maven installed
  • To use the Custom Vision Service, you will need to create Custom Vision Training and Prediction resources in Azure. To do so in the Azure portal, fill out the dialog window on the Create Custom Vision page to create both a Training and Prediction resource.

Get the Custom Vision SDK and sample code

To write a Java app that uses Custom Vision, you'll need the Custom Vision Maven packages. These packages are included in the sample project you will download, and the Custom Vision SDK is also available in the Maven central repository.

Clone or download the Cognitive Services Java SDK Samples project. Navigate to the Vision/CustomVision/ folder.

This Java project creates a new Custom Vision object detection project named Sample Java OD Project, which can be accessed through the Custom Vision website. It then uploads images to train and test an object detection model. In this project, the model is trained to detect forks and scissors in images.

Get the training and prediction keys

The project needs a valid set of subscription keys to interact with the service. You can find these values at the Custom Vision website. Sign in with the account associated with the Azure account used to create your Custom Vision resources. On the home page (the page with the option to add a new project), select the gear icon in the upper right. Find your training and prediction resources in the list and expand them. Here you can find your training key, prediction key, and prediction resource ID values. Save these values to a temporary location.

Image of the keys UI

Or, you can obtain these keys and ID from the Azure portal by viewing your Custom Vision Training and Prediction resources and navigating to the Keys tab. There you'll find your training key and prediction key. Navigate to the Properties tab of your Prediction resource to get your prediction resource ID.

The program is configured to reference your key data as environment variables. Navigate to the Vision/CustomVision folder and enter the following PowerShell commands to set the environment variables.

Note

If you're using a non-Windows operating system, see Configure environment variables for instructions.

$env:AZURE_CUSTOMVISION_TRAINING_API_KEY = "<your training api key>"
$env:AZURE_CUSTOMVISION_PREDICTION_API_KEY = "<your prediction api key>"
$env:AZURE_CUSTOMVISION_PREDICTION_ID = "<your prediction resource id>"

Understand the code

Load the Vision/CustomVision project in your Java IDE and open the CustomVisionSamples.java file. Find the runSample method and comment out the ImageClassification_Sample method call; that method executes the image classification scenario, which is not covered in this guide. The ObjectDetection_Sample method implements the primary functionality of this quickstart; navigate to its definition and inspect the code.
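
For reference, near the top of the file the sample constructs its training client from the environment variable you set earlier. A sketch of what that construction looks like (verify the exact endpoint string and variable names against the sample file):

// Sketch: authenticate the training client with the key from the environment.
final String trainingApiKey = System.getenv("AZURE_CUSTOMVISION_TRAINING_API_KEY");
CustomVisionTrainingClient trainer = CustomVisionTrainingManager
    .authenticate("https://{endpoint}/customvision/v3.0/training/", trainingApiKey)
    .withEndpoint("<your endpoint URL>");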

Create a new Custom Vision Service project

Go to the code block that creates a training client and an object detection project. The created project will show up on the Custom Vision website that you visited earlier. See the CreateProject method overloads to specify other options when you create your project (explained in the Build a detector web portal guide).

System.out.println("Creating project...");
// create an object detection project
Project project = trainer.createProject()
    .withName("Sample Java OD Project")
    .withDescription("Sample OD Project")
    .withDomainId(objectDetectionDomain.id())
    .withClassificationType(Classifier.MULTILABEL.toString())
    .execute();

Add tags to your project

// create fork tag
Tag forkTag = trainer.createTag()
    .withProjectId(project.id())
    .withName("fork")
    .execute();

// create scissors tag
Tag scissorsTag = trainer.createTag()
    .withProjectId(project.id())
    .withName("scissor")
    .execute();

Upload and tag images

When you tag images in object detection projects, you need to specify the region of each tagged object using normalized coordinates. Go to the definition of the regionMap Map. This code associates each of the sample images with its tagged region.

Note

If you don't have a click-and-drag utility to mark the coordinates of regions, you can use the web UI at Customvision.ai. In this example, the coordinates are already provided.

// This is a hardcoded mapping of the files we'll upload, along with the bounding box of the
// object in each image. The bounding box is specified as left, top, width, height in
// normalized coordinates:
//  Normalized Left = Left / Width (in Pixels)
//  Normalized Top = Top / Height (in Pixels)
//  Normalized Bounding Box Width = (Right - Left) / Width (in Pixels)
//  Normalized Bounding Box Height = (Bottom - Top) / Height (in Pixels)
HashMap<String, double[]> regionMap = new HashMap<String, double[]>();
regionMap.put("scissors_1.jpg", new double[] { 0.4007353, 0.194068655, 0.259803921, 0.6617647 });
regionMap.put("scissors_2.jpg", new double[] { 0.426470578, 0.185898721, 0.172794119, 0.5539216 });
regionMap.put("scissors_3.jpg", new double[] { 0.289215684, 0.259428144, 0.403186262, 0.421568632 });
regionMap.put("scissors_4.jpg", new double[] { 0.343137264, 0.105833367, 0.332107842, 0.8055556 });
regionMap.put("scissors_5.jpg", new double[] { 0.3125, 0.09766343, 0.435049027, 0.71405226 });
regionMap.put("scissors_6.jpg", new double[] { 0.379901975, 0.24308826, 0.32107842, 0.5718954 });
regionMap.put("scissors_7.jpg", new double[] { 0.341911763, 0.20714055, 0.3137255, 0.6356209 });
regionMap.put("scissors_8.jpg", new double[] { 0.231617644, 0.08459154, 0.504901946, 0.8480392 });
regionMap.put("scissors_9.jpg", new double[] { 0.170343131, 0.332957536, 0.767156839, 0.403594762 });
regionMap.put("scissors_10.jpg", new double[] { 0.204656869, 0.120539248, 0.5245098, 0.743464053 });
regionMap.put("scissors_11.jpg", new double[] { 0.05514706, 0.159754932, 0.799019635, 0.730392158 });
regionMap.put("scissors_12.jpg", new double[] { 0.265931368, 0.169558853, 0.5061275, 0.606209159 });
regionMap.put("scissors_13.jpg", new double[] { 0.241421565, 0.184264734, 0.448529422, 0.6830065 });
regionMap.put("scissors_14.jpg", new double[] { 0.05759804, 0.05027781, 0.75, 0.882352948 });
regionMap.put("scissors_15.jpg", new double[] { 0.191176474, 0.169558853, 0.6936275, 0.6748366 });
regionMap.put("scissors_16.jpg", new double[] { 0.1004902, 0.279036, 0.6911765, 0.477124184 });
regionMap.put("scissors_17.jpg", new double[] { 0.2720588, 0.131977156, 0.4987745, 0.6911765 });
regionMap.put("scissors_18.jpg", new double[] { 0.180147052, 0.112369314, 0.6262255, 0.6666667 });
regionMap.put("scissors_19.jpg", new double[] { 0.333333343, 0.0274019931, 0.443627447, 0.852941155 });
regionMap.put("scissors_20.jpg", new double[] { 0.158088237, 0.04047389, 0.6691176, 0.843137264 });
regionMap.put("fork_1.jpg", new double[] { 0.145833328, 0.3509314, 0.5894608, 0.238562092 });
regionMap.put("fork_2.jpg", new double[] { 0.294117659, 0.216944471, 0.534313738, 0.5980392 });
regionMap.put("fork_3.jpg", new double[] { 0.09191177, 0.0682516545, 0.757352948, 0.6143791 });
regionMap.put("fork_4.jpg", new double[] { 0.254901975, 0.185898721, 0.5232843, 0.594771266 });
regionMap.put("fork_5.jpg", new double[] { 0.2365196, 0.128709182, 0.5845588, 0.71405226 });
regionMap.put("fork_6.jpg", new double[] { 0.115196079, 0.133611143, 0.676470637, 0.6993464 });
regionMap.put("fork_7.jpg", new double[] { 0.164215669, 0.31008172, 0.767156839, 0.410130739 });
regionMap.put("fork_8.jpg", new double[] { 0.118872553, 0.318251669, 0.817401946, 0.225490168 });
regionMap.put("fork_9.jpg", new double[] { 0.18259804, 0.2136765, 0.6335784, 0.643790841 });
regionMap.put("fork_10.jpg", new double[] { 0.05269608, 0.282303959, 0.8088235, 0.452614367 });
regionMap.put("fork_11.jpg", new double[] { 0.05759804, 0.0894935, 0.9007353, 0.3251634 });
regionMap.put("fork_12.jpg", new double[] { 0.3345588, 0.07315363, 0.375, 0.9150327 });
regionMap.put("fork_13.jpg", new double[] { 0.269607842, 0.194068655, 0.4093137, 0.6732026 });
regionMap.put("fork_14.jpg", new double[] { 0.143382356, 0.218578458, 0.7977941, 0.295751631 });
regionMap.put("fork_15.jpg", new double[] { 0.19240196, 0.0633497, 0.5710784, 0.8398692 });
regionMap.put("fork_16.jpg", new double[] { 0.140931368, 0.480016381, 0.6838235, 0.240196079 });
regionMap.put("fork_17.jpg", new double[] { 0.305147052, 0.2512582, 0.4791667, 0.5408496 });
regionMap.put("fork_18.jpg", new double[] { 0.234068632, 0.445702642, 0.6127451, 0.344771236 });
regionMap.put("fork_19.jpg", new double[] { 0.219362751, 0.141781077, 0.5919118, 0.6683006 });
regionMap.put("fork_20.jpg", new double[] { 0.180147052, 0.239820287, 0.6887255, 0.235294119 });

Then, skip to the code block that adds the images to the project. The images are read from the src/main/resources folder of the project and are uploaded to the service with their appropriate tags and region coordinates.

System.out.println("Adding images...");
for (int i = 1; i <= 20; i++) {
    String fileName = "fork_" + i + ".jpg";
    byte[] contents = GetImage("/fork", fileName);
    AddImageToProject(trainer, project, fileName, contents, forkTag.id(), regionMap.get(fileName));
}

for (int i = 1; i <= 20; i++) {
    String fileName = "scissors_" + i + ".jpg";
    byte[] contents = GetImage("/scissors", fileName);
    AddImageToProject(trainer, project, fileName, contents, scissorsTag.id(), regionMap.get(fileName));
}

The previous code snippet makes use of two helper functions that retrieve the images as resource streams and upload them to the service (you can upload up to 64 images in a single batch).

private static void AddImageToProject(Trainings trainer, Project project, String fileName, byte[] contents, UUID tag, double[] regionValues)
{
    System.out.println("Adding image: " + fileName);
    ImageFileCreateEntry file = new ImageFileCreateEntry()
        .withName(fileName)
        .withContents(contents);

    ImageFileCreateBatch batch = new ImageFileCreateBatch()
        .withImages(Collections.singletonList(file));

    // If a region was specified, attach it to the image entry along with its tag;
    // otherwise, apply the tag to the whole batch.
    if (regionValues != null)
    {
        Region region = new Region()
            .withTagId(tag)
            .withLeft(regionValues[0])
            .withTop(regionValues[1])
            .withWidth(regionValues[2])
            .withHeight(regionValues[3]);
        file = file.withRegions(Collections.singletonList(region));
    } else {
        batch = batch.withTagIds(Collections.singletonList(tag));
    }

    trainer.createImagesFromFiles(project.id(), batch);
}

private static byte[] GetImage(String folder, String fileName)
{
    try {
        return ByteStreams.toByteArray(CustomVisionSamples.class.getResourceAsStream(folder + "/" + fileName));
    } catch (Exception e) {
        System.out.println(e.getMessage());
        e.printStackTrace();
    }
    return null;
}

Train the project and publish

This code creates the first iteration of the prediction model and then publishes that iteration to the prediction endpoint. The name given to the published iteration can be used to send prediction requests. An iteration is not available in the prediction endpoint until it is published.

System.out.println("Training...");
Iteration iteration = trainer.trainProject(project.id(), new TrainProjectOptionalParameter());

while (iteration.status().equals("Training"))
{
    System.out.println("Training Status: "+ iteration.status());
    Thread.sleep(5000);
    iteration = trainer.getIteration(project.id(), iteration.id());
}
System.out.println("Training Status: "+ iteration.status());

// The iteration is now trained. Publish it to the prediction endpoint.
String publishedModelName = "myModel";
String predictionResourceId = System.getenv("AZURE_CUSTOMVISION_PREDICTION_ID");
trainer.publishIteration(project.id(), iteration.id(), publishedModelName, predictionResourceId);

Use the prediction endpoint

The prediction endpoint, represented by the predictor object here, is the reference that you use to submit an image to the current model and get a prediction. In this sample, predictor is defined elsewhere using the prediction key environment variable.
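
Its construction mirrors that of the training client. A sketch (again, verify the exact call against the sample file):

// Sketch: authenticate the prediction client with the key from the environment.
final String predictionApiKey = System.getenv("AZURE_CUSTOMVISION_PREDICTION_API_KEY");
CustomVisionPredictionClient predictor = CustomVisionPredictionManager
    .authenticate("https://{endpoint}/customvision/v3.0/prediction/", predictionApiKey)
    .withEndpoint("<your endpoint URL>");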

// load test image
byte[] testImage = GetImage("/ObjectTest", "test_image.jpg");

// predict
ImagePrediction results = predictor.predictions().detectImage()
    .withProjectId(project.id())
    .withPublishedName(publishedModelName)
    .withImageData(testImage)
    .execute();

for (Prediction prediction: results.predictions())
{
    System.out.println(String.format("\t%s: %.2f%% at: %.2f, %.2f, %.2f, %.2f",
        prediction.tagName(),
        prediction.probability() * 100.0f,
        prediction.boundingBox().left(),
        prediction.boundingBox().top(),
        prediction.boundingBox().width(),
        prediction.boundingBox().height()
    ));
}

Run the application

To compile and run the solution using Maven, navigate to the project directory (Vision/CustomVision) in a command prompt and execute the run command:

mvn compile exec:java

View the console output for logging and prediction results. You can then verify that the test image is tagged appropriately and that the region of detection is correct.

Clean up resources

If you wish to implement your own object detection project (or try an image classification project instead), you may want to delete the fork/scissors detection project from this example. A free trial allows for two Custom Vision projects.

On the Custom Vision website, navigate to Projects and select the trash can under My New Project.

Screenshot of a panel labeled My New Project with a trash can icon

Next steps

Now you've seen how every step of the object detection process can be done in code. This sample executes a single training iteration, but often you'll need to train and test your model multiple times in order to make it more accurate. The following training guide deals with image classification, but its principles are similar to object detection.

This article shows you how to get started using the Custom Vision SDK with Node.js to build an object detection model. After it's created, you can add tagged regions, upload images, train the project, obtain the project's published prediction endpoint URL, and use the endpoint to programmatically test an image. Use this example as a template for building your own Node.js application.

Prerequisites

  • Node.js 8 or later installed.
  • npm installed.
  • To use the Custom Vision Service, you will need to create Custom Vision Training and Prediction resources in Azure. To do so in the Azure portal, fill out the dialog window on the Create Custom Vision page to create both a Training and Prediction resource.

Get the training and prediction keys

The project needs a valid set of subscription keys to interact with the service. You can find these values at the Custom Vision website. Sign in with the account associated with the Azure account used to create your Custom Vision resources. On the home page (the page with the option to add a new project), select the gear icon in the upper right. Find your training and prediction resources in the list and expand them. Here you can find your training key, prediction key, and prediction resource ID values. Save these values to a temporary location.

Image of the keys UI

Or, you can obtain these keys and ID from the Azure portal by viewing your Custom Vision Training and Prediction resources and navigating to the Keys tab. There you'll find your training key and prediction key. Navigate to the Properties tab of your Prediction resource to get your prediction resource ID.

Get the sample images

This example uses the images from the Samples/customvision/images directory of the Cognitive Services Node.js SDK Samples repo on GitHub. Clone or download this repository to your development environment.

Install the Custom Vision SDK

To install the Custom Vision service SDKs for Node.js in your project, run the following commands:

npm install @azure/cognitiveservices-customvision-training
npm install @azure/cognitiveservices-customvision-prediction

Add the code

Create a new file called sample.js in your preferred project directory.

Create the Custom Vision service project

Add the following code to your script to create a new Custom Vision service project. Insert your subscription keys in the appropriate definitions and set the sampleDataRoot path value to your image folder path. Make sure the endPoint value matches the training and prediction endpoints you have created at Customvision.ai. Note that the difference between creating an object detection and image classification project is the domain specified in the createProject call.

const fs = require('fs');
const util = require('util');
const TrainingApi = require("@azure/cognitiveservices-customvision-training");
const PredictionApi = require("@azure/cognitiveservices-customvision-prediction");
const msRest = require("@azure/ms-rest-js");

const setTimeoutPromise = util.promisify(setTimeout);

const trainingKey = "<your training key>";
const predictionKey = "<your prediction key>";
const predictionResourceId = "<your prediction resource id>";
const sampleDataRoot = "<path to image files>";

const endPoint = "https://<my-resource-name>.cognitiveservices.azure.com/"

const publishIterationName = "detectModel";

const credentials = new msRest.ApiKeyCredentials({ inHeader: { "Training-key": trainingKey } });
const trainer = new TrainingApi.TrainingAPIClient(credentials, endPoint);

/* Helper function to let us use await inside a forEach loop.
 * This lets us insert delays between image uploads to accommodate the rate limit.
 */
async function asyncForEach (array, callback) {
    for (let i = 0; i < array.length; i++) {
        await callback(array[i], i, array);
    }
}

(async () => {
    console.log("Creating project...");
    const domains = await trainer.getDomains();
    const objDetectDomain = domains.find(domain => domain.type === "ObjectDetection");
    const sampleProject = await trainer.createProject("Sample Obj Detection Project", { domainId: objDetectDomain.id });

Create tags in the project

To create object tags in your project, add the following code to the end of sample.js:

    const forkTag = await trainer.createTag(sampleProject.id, "Fork");
    const scissorsTag = await trainer.createTag(sampleProject.id, "Scissors");

Upload and tag images

When you tag images in object detection projects, you need to specify the region of each tagged object using normalized coordinates.

Note

If you don't have a click-and-drag utility to mark the coordinates of regions, you can use the web UI at Customvision.ai. In this example, the coordinates are already provided.

To add the images, tags, and regions to the project, insert the following code after the tag creation. Note that for this tutorial the regions are hardcoded inline with the code. The regions specify the bounding box in normalized coordinates, and the coordinates are given in the order: left, top, width, height. You can upload up to 64 images in a single batch.

const forkImageRegions = {
    "fork_1.jpg": [0.145833328, 0.3509314, 0.5894608, 0.238562092],
    "fork_2.jpg": [0.294117659, 0.216944471, 0.534313738, 0.5980392],
    "fork_3.jpg": [0.09191177, 0.0682516545, 0.757352948, 0.6143791],
    "fork_4.jpg": [0.254901975, 0.185898721, 0.5232843, 0.594771266],
    "fork_5.jpg": [0.2365196, 0.128709182, 0.5845588, 0.71405226],
    "fork_6.jpg": [0.115196079, 0.133611143, 0.676470637, 0.6993464],
    "fork_7.jpg": [0.164215669, 0.31008172, 0.767156839, 0.410130739],
    "fork_8.jpg": [0.118872553, 0.318251669, 0.817401946, 0.225490168],
    "fork_9.jpg": [0.18259804, 0.2136765, 0.6335784, 0.643790841],
    "fork_10.jpg": [0.05269608, 0.282303959, 0.8088235, 0.452614367],
    "fork_11.jpg": [0.05759804, 0.0894935, 0.9007353, 0.3251634],
    "fork_12.jpg": [0.3345588, 0.07315363, 0.375, 0.9150327],
    "fork_13.jpg": [0.269607842, 0.194068655, 0.4093137, 0.6732026],
    "fork_14.jpg": [0.143382356, 0.218578458, 0.7977941, 0.295751631],
    "fork_15.jpg": [0.19240196, 0.0633497, 0.5710784, 0.8398692],
    "fork_16.jpg": [0.140931368, 0.480016381, 0.6838235, 0.240196079],
    "fork_17.jpg": [0.305147052, 0.2512582, 0.4791667, 0.5408496],
    "fork_18.jpg": [0.234068632, 0.445702642, 0.6127451, 0.344771236],
    "fork_19.jpg": [0.219362751, 0.141781077, 0.5919118, 0.6683006],
    "fork_20.jpg": [0.180147052, 0.239820287, 0.6887255, 0.235294119]
};

const scissorsImageRegions = {
    "scissors_1.jpg": [0.4007353, 0.194068655, 0.259803921, 0.6617647],
    "scissors_2.jpg": [0.426470578, 0.185898721, 0.172794119, 0.5539216],
    "scissors_3.jpg": [0.289215684, 0.259428144, 0.403186262, 0.421568632],
    "scissors_4.jpg": [0.343137264, 0.105833367, 0.332107842, 0.8055556],
    "scissors_5.jpg": [0.3125, 0.09766343, 0.435049027, 0.71405226],
    "scissors_6.jpg": [0.379901975, 0.24308826, 0.32107842, 0.5718954],
    "scissors_7.jpg": [0.341911763, 0.20714055, 0.3137255, 0.6356209],
    "scissors_8.jpg": [0.231617644, 0.08459154, 0.504901946, 0.8480392],
    "scissors_9.jpg": [0.170343131, 0.332957536, 0.767156839, 0.403594762],
    "scissors_10.jpg": [0.204656869, 0.120539248, 0.5245098, 0.743464053],
    "scissors_11.jpg": [0.05514706, 0.159754932, 0.799019635, 0.730392158],
    "scissors_12.jpg": [0.265931368, 0.169558853, 0.5061275, 0.606209159],
    "scissors_13.jpg": [0.241421565, 0.184264734, 0.448529422, 0.6830065],
    "scissors_14.jpg": [0.05759804, 0.05027781, 0.75, 0.882352948],
    "scissors_15.jpg": [0.191176474, 0.169558853, 0.6936275, 0.6748366],
    "scissors_16.jpg": [0.1004902, 0.279036, 0.6911765, 0.477124184],
    "scissors_17.jpg": [0.2720588, 0.131977156, 0.4987745, 0.6911765],
    "scissors_18.jpg": [0.180147052, 0.112369314, 0.6262255, 0.6666667],
    "scissors_19.jpg": [0.333333343, 0.0274019931, 0.443627447, 0.852941155],
    "scissors_20.jpg": [0.158088237, 0.04047389, 0.6691176, 0.843137264]
};

console.log("Adding images...");
let fileUploadPromises = [];

const forkDir = `${sampleDataRoot}/Fork`;
const forkFiles = fs.readdirSync(forkDir);

await asyncForEach(forkFiles, async (file) => {
    const region = { tagId : forkTag.id, left : forkImageRegions[file][0], top : forkImageRegions[file][1], width : forkImageRegions[file][2], height : forkImageRegions[file][3] };
    const entry = { name : file, contents : fs.readFileSync(`${forkDir}/${file}`), regions : [region] };
    const batch = { images : [entry] };
    // Wait one second to accommodate rate limit.
    await setTimeoutPromise(1000, null);
    fileUploadPromises.push(trainer.createImagesFromFiles(sampleProject.id, batch));
});

const scissorsDir = `${sampleDataRoot}/Scissors`;
const scissorsFiles = fs.readdirSync(scissorsDir);

await asyncForEach(scissorsFiles, async (file) => {
    const region = { tagId : scissorsTag.id, left : scissorsImageRegions[file][0], top : scissorsImageRegions[file][1], width : scissorsImageRegions[file][2], height : scissorsImageRegions[file][3] };
    const entry = { name : file, contents : fs.readFileSync(`${scissorsDir}/${file}`), regions : [region] };
    const batch = { images : [entry] };
    // Wait one second to accommodate rate limit.
    await setTimeoutPromise(1000, null);
    fileUploadPromises.push(trainer.createImagesFromFiles(sampleProject.id, batch));
});

await Promise.all(fileUploadPromises);

Train the project and publish

This code creates the first iteration of the prediction model and then publishes that iteration to the prediction endpoint. The name given to the published iteration can be used to send prediction requests. An iteration is not available in the prediction endpoint until it is published.

console.log("Training...");
let trainingIteration = await trainer.trainProject(sampleProject.id);

// Wait for training to complete
console.log("Training started...");
while (trainingIteration.status == "Training") {
    console.log("Training status: " + trainingIteration.status);
    // wait for one second
    await setTimeoutPromise(1000, null);
    trainingIteration = await trainer.getIteration(sampleProject.id, trainingIteration.id);
}
console.log("Training status: " + trainingIteration.status);

// Publish the iteration to the endpoint
await trainer.publishIteration(sampleProject.id, trainingIteration.id, publishIterationName, predictionResourceId);

Get and use the published iteration on the prediction endpoint

To send an image to the prediction endpoint and retrieve the prediction, add the following code to the end of the file:

    const predictor_credentials = new msRest.ApiKeyCredentials({ inHeader: { "Prediction-key": predictionKey } });
    const predictor = new PredictionApi.PredictionAPIClient(predictor_credentials, endPoint);

    const testFile = fs.readFileSync(`${sampleDataRoot}/Test/test_od_image.jpg`);

    const results = await predictor.detectImage(sampleProject.id, publishIterationName, testFile);

    // Show results
    console.log("Results:");
    results.predictions.forEach(predictedResult => {
        console.log(`\t ${predictedResult.tagName}: ${(predictedResult.probability * 100.0).toFixed(2)}% ${predictedResult.boundingBox.left},${predictedResult.boundingBox.top},${predictedResult.boundingBox.width},${predictedResult.boundingBox.height}`);
    });
})()

Run the application

Run sample.js.

node sample.js

The output of the application should appear in the console. You can then verify that the test image (found in the Test folder of the sample images directory) is tagged appropriately and that the region of detection is correct.

Clean up resources

If you wish to implement your own object detection project (or try an image classification project instead), you may want to delete the fork/scissors detection project from this example. A free trial allows for two Custom Vision projects.

On the Custom Vision website, navigate to Projects and select the trash can under My New Project.

Screenshot of a panel labeled My New Project with a trash can icon

Next steps

Now you've seen how every step of the object detection process can be done in code. This sample executes a single training iteration, but often you'll need to train and test your model multiple times in order to make it more accurate. The following training guide deals with image classification, but its principles are similar to object detection.

This article shows you how to get started using the Custom Vision SDK with Python to build an object detection model. After it's created, you can add tagged regions, upload images, train the project, obtain the project's published prediction endpoint URL, and use the endpoint to programmatically test an image. Use this example as a template for building your own Python application.

Prerequisites

  • Python 2.7+ or 3.5+
  • pip tool
  • To use the Custom Vision Service, you will need to create Custom Vision Training and Prediction resources in Azure. To do so in the Azure portal, fill out the dialog window on the Create Custom Vision page to create both a Training and Prediction resource.

Install the Custom Vision SDK

To install the Custom Vision service SDK for Python, run the following command in PowerShell:

pip install azure-cognitiveservices-vision-customvision

You can download the sample images with the Cognitive Services Python SDK Samples repository, described in the "Get the sample images" section below.

Get the training and prediction keys

The project needs a valid set of subscription keys to interact with the service. You can find these values at the Custom Vision website. Sign in with the account associated with the Azure account used to create your Custom Vision resources. On the home page (the page with the option to add a new project), select the gear icon in the upper right. Find your training and prediction resources in the list and expand them. Here you can find your training key, prediction key, and prediction resource ID values. Save these values to a temporary location.

Image of the keys UI

Or, you can obtain these keys and ID from the Azure portal by viewing your Custom Vision Training and Prediction resources and navigating to the Keys tab. There you'll find your training key and prediction key. Navigate to the Properties tab of your Prediction resource to get your prediction resource ID.

Get the sample images

This example uses the images from the Cognitive Services Python SDK Samples repository on GitHub. Clone or download this repository to your development environment. Remember its folder location for a later step.

Add the code

Create a new file called sample.py in your preferred project directory.

Create the Custom Vision service project

Add the following code to your script to create a new Custom Vision service project. Insert your subscription keys in the appropriate definitions. Also, get your Endpoint URL from the Settings page of the Custom Vision website.

See the create_project method to specify other options when you create your project (explained in the Build a detector web portal guide).

from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.training.models import ImageFileCreateEntry, Region
from msrest.authentication import ApiKeyCredentials

ENDPOINT = "<your API endpoint>"

# Replace with a valid key
training_key = "<your training key>"
prediction_key = "<your prediction key>"
prediction_resource_id = "<your prediction resource id>"

publish_iteration_name = "detectModel"

credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
trainer = CustomVisionTrainingClient(ENDPOINT, credentials)

# Find the object detection domain
obj_detection_domain = next(domain for domain in trainer.get_domains() if domain.type == "ObjectDetection" and domain.name == "General")

# Create a new project
print ("Creating project...")
project = trainer.create_project("My Detection Project", domain_id=obj_detection_domain.id)

Create tags in the project

To create object tags in your project, add the following code to the end of sample.py:

# Make two tags in the new project
fork_tag = trainer.create_tag(project.id, "fork")
scissors_tag = trainer.create_tag(project.id, "scissors")

Upload and tag images

When you tag images in object detection projects, you need to specify the region of each tagged object using normalized coordinates.

Note

If you don't have a click-and-drag utility to mark the coordinates of regions, you can use the web UI at Customvision.ai. In this example, the coordinates are already provided.

To add the images, tags, and regions to the project, insert the following code after the tag creation. For this tutorial, the regions are hard-coded inline. Each region specifies the bounding box of a tagged object in normalized coordinates, given in the order: left, top, width, height.

fork_image_regions = {
    "fork_1": [ 0.145833328, 0.3509314, 0.5894608, 0.238562092 ],
    "fork_2": [ 0.294117659, 0.216944471, 0.534313738, 0.5980392 ],
    "fork_3": [ 0.09191177, 0.0682516545, 0.757352948, 0.6143791 ],
    "fork_4": [ 0.254901975, 0.185898721, 0.5232843, 0.594771266 ],
    "fork_5": [ 0.2365196, 0.128709182, 0.5845588, 0.71405226 ],
    "fork_6": [ 0.115196079, 0.133611143, 0.676470637, 0.6993464 ],
    "fork_7": [ 0.164215669, 0.31008172, 0.767156839, 0.410130739 ],
    "fork_8": [ 0.118872553, 0.318251669, 0.817401946, 0.225490168 ],
    "fork_9": [ 0.18259804, 0.2136765, 0.6335784, 0.643790841 ],
    "fork_10": [ 0.05269608, 0.282303959, 0.8088235, 0.452614367 ],
    "fork_11": [ 0.05759804, 0.0894935, 0.9007353, 0.3251634 ],
    "fork_12": [ 0.3345588, 0.07315363, 0.375, 0.9150327 ],
    "fork_13": [ 0.269607842, 0.194068655, 0.4093137, 0.6732026 ],
    "fork_14": [ 0.143382356, 0.218578458, 0.7977941, 0.295751631 ],
    "fork_15": [ 0.19240196, 0.0633497, 0.5710784, 0.8398692 ],
    "fork_16": [ 0.140931368, 0.480016381, 0.6838235, 0.240196079 ],
    "fork_17": [ 0.305147052, 0.2512582, 0.4791667, 0.5408496 ],
    "fork_18": [ 0.234068632, 0.445702642, 0.6127451, 0.344771236 ],
    "fork_19": [ 0.219362751, 0.141781077, 0.5919118, 0.6683006 ],
    "fork_20": [ 0.180147052, 0.239820287, 0.6887255, 0.235294119 ]
}

scissors_image_regions = {
    "scissors_1": [ 0.4007353, 0.194068655, 0.259803921, 0.6617647 ],
    "scissors_2": [ 0.426470578, 0.185898721, 0.172794119, 0.5539216 ],
    "scissors_3": [ 0.289215684, 0.259428144, 0.403186262, 0.421568632 ],
    "scissors_4": [ 0.343137264, 0.105833367, 0.332107842, 0.8055556 ],
    "scissors_5": [ 0.3125, 0.09766343, 0.435049027, 0.71405226 ],
    "scissors_6": [ 0.379901975, 0.24308826, 0.32107842, 0.5718954 ],
    "scissors_7": [ 0.341911763, 0.20714055, 0.3137255, 0.6356209 ],
    "scissors_8": [ 0.231617644, 0.08459154, 0.504901946, 0.8480392 ],
    "scissors_9": [ 0.170343131, 0.332957536, 0.767156839, 0.403594762 ],
    "scissors_10": [ 0.204656869, 0.120539248, 0.5245098, 0.743464053 ],
    "scissors_11": [ 0.05514706, 0.159754932, 0.799019635, 0.730392158 ],
    "scissors_12": [ 0.265931368, 0.169558853, 0.5061275, 0.606209159 ],
    "scissors_13": [ 0.241421565, 0.184264734, 0.448529422, 0.6830065 ],
    "scissors_14": [ 0.05759804, 0.05027781, 0.75, 0.882352948 ],
    "scissors_15": [ 0.191176474, 0.169558853, 0.6936275, 0.6748366 ],
    "scissors_16": [ 0.1004902, 0.279036, 0.6911765, 0.477124184 ],
    "scissors_17": [ 0.2720588, 0.131977156, 0.4987745, 0.6911765 ],
    "scissors_18": [ 0.180147052, 0.112369314, 0.6262255, 0.6666667 ],
    "scissors_19": [ 0.333333343, 0.0274019931, 0.443627447, 0.852941155 ],
    "scissors_20": [ 0.158088237, 0.04047389, 0.6691176, 0.843137264 ]
}

Then, use this map of associations to upload each sample image with its region coordinates (you can upload up to 64 images in a single batch). Add the following code.

Note

You'll need to change the path to the images based on where you downloaded the Cognitive Services Python SDK Samples repo earlier.

# Update this with the path to where you downloaded the images.
base_image_url = "<path to repo directory>/cognitive-services-python-sdk-samples/samples/vision/"

# Go through the data table above and create the images
print ("Adding images...")
tagged_images_with_regions = []

for file_name in fork_image_regions.keys():
    x,y,w,h = fork_image_regions[file_name]
    regions = [ Region(tag_id=fork_tag.id, left=x,top=y,width=w,height=h) ]

    with open(base_image_url + "images/fork/" + file_name + ".jpg", mode="rb") as image_contents:
        tagged_images_with_regions.append(ImageFileCreateEntry(name=file_name, contents=image_contents.read(), regions=regions))

for file_name in scissors_image_regions.keys():
    x,y,w,h = scissors_image_regions[file_name]
    regions = [ Region(tag_id=scissors_tag.id, left=x,top=y,width=w,height=h) ]

    with open(base_image_url + "images/scissors/" + file_name + ".jpg", mode="rb") as image_contents:
        tagged_images_with_regions.append(ImageFileCreateEntry(name=file_name, contents=image_contents.read(), regions=regions))

upload_result = trainer.create_images_from_files(project.id, images=tagged_images_with_regions)
if not upload_result.is_batch_successful:
    print("Image batch upload failed.")
    for image in upload_result.images:
        print("Image status: ", image.status)
    exit(-1)
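
The service accepts at most 64 images per call, so this sample's 40 images fit in a single batch. If you adapt the code to a larger dataset, you can split the upload into chunks; a minimal sketch:

# Upload in chunks of up to 64 images (the service's per-batch limit)
batch_size = 64
for i in range(0, len(tagged_images_with_regions), batch_size):
    batch = tagged_images_with_regions[i:i + batch_size]
    result = trainer.create_images_from_files(project.id, images=batch)
    if not result.is_batch_successful:
        print("Batch starting at image {} failed.".format(i))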

Train the project and publish

This code creates the first iteration of the prediction model and then publishes that iteration to the prediction endpoint. The name given to the published iteration can be used to send prediction requests. An iteration is not available in the prediction endpoint until it is published.

import time

print("Training...")
iteration = trainer.train_project(project.id)
while iteration.status != "Completed":
    iteration = trainer.get_iteration(project.id, iteration.id)
    print("Training status: " + iteration.status)
    time.sleep(1)

# The iteration is now trained. Publish it to the project endpoint
trainer.publish_iteration(project.id, iteration.id, publish_iteration_name, prediction_resource_id)
print("Done!")

Get and use the published iteration on the prediction endpoint

To send an image to the prediction endpoint and retrieve the prediction, add the following code to the end of the file:

from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

# Now there is a trained endpoint that can be used to make a prediction
prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(ENDPOINT, prediction_credentials)

# Open the sample image and get back the prediction results.
with open(base_image_url + "images/Test/test_od_image.jpg", mode="rb") as test_data:
    results = predictor.detect_image(project.id, publish_iteration_name, test_data)

# Display the results.
for prediction in results.predictions:
    print("\t" + prediction.tag_name +
          ": {0:.2f}% bbox.left = {1:.2f}, bbox.top = {2:.2f}, bbox.width = {3:.2f}, bbox.height = {4:.2f}".format(
              prediction.probability * 100, prediction.bounding_box.left, prediction.bounding_box.top,
              prediction.bounding_box.width, prediction.bounding_box.height))

Run the application

Run sample.py.

python sample.py

The output of the application should appear in the console. You can then verify that the test image (found in samples/vision/images/Test) is tagged appropriately and that the region of detection is correct.
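
To verify the regions visually in code, you can draw the predicted bounding boxes on the test image. A minimal sketch appended to the end of sample.py, assuming you have the Pillow library installed (pip install Pillow); this is an optional extra, not part of the sample:

from PIL import Image, ImageDraw

# Convert each normalized bounding box back to pixel coordinates and draw it
image = Image.open(base_image_url + "images/Test/test_od_image.jpg")
draw = ImageDraw.Draw(image)
for prediction in results.predictions:
    if prediction.probability < 0.5:  # skip low-confidence detections
        continue
    box = prediction.bounding_box
    left, top = box.left * image.width, box.top * image.height
    right = left + box.width * image.width
    bottom = top + box.height * image.height
    draw.rectangle([left, top, right, bottom], outline="red", width=3)
image.save("test_od_image_result.jpg")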

Clean up resources

If you wish to implement your own object detection project (or try an image classification project instead), you may want to delete the fork/scissors detection project from this example. A free trial allows for two Custom Vision projects.

On the Custom Vision website, navigate to Projects and select the trash can under My Detection Project.

Screenshot of a panel labeled My Detection Project with a trash can icon
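
Alternatively, you can clean up programmatically with the training client. A published iteration typically must be unpublished before its project can be deleted; a minimal sketch:

# Unpublish the iteration, then delete the project
trainer.unpublish_iteration(project.id, iteration.id)
trainer.delete_project(project.id)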

Next steps

Now you've seen how every step of the object detection process can be done in code. This sample executes a single training iteration, but often you'll need to train and test your model multiple times to make it more accurate. The following training guide deals with image classification, but its principles are similar to object detection.