Overview of building a classifier with Custom Vision

To use Custom Vision Service, you must first build a classifier. This guide walks you through building a classifier on the customvision.ai website. All of these operations can also be performed through the Custom Vision Service APIs, which are described in the "Reference" section of this documentation.


To build a classifier, you must first have:

  • A valid Microsoft account or an Azure Active Directory OrgID ("work or school account"), so you can sign in to customvision.ai and get started.


    The OrgID login for Azure Active Directory (Azure AD) users from national clouds is not currently supported.

  • A series of images to train your classifier (with a minimum of 30 images per tag).

  • A few images to test your classifier after the classifier is trained.

  • Recommended: an Azure subscription associated with your Microsoft Account or OrgID. Without an Azure subscription, you can create only "limited trial" projects. If you have an Azure subscription, you are prompted to create Custom Vision Service Training and Prediction resources in the Microsoft Azure Portal during the project creation flow.
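
Before uploading, it can help to confirm that every tag meets the 30-image minimum listed above. The following is a minimal sketch, not part of Custom Vision Service: the function name and the (image name, tag) pair layout are assumptions made for illustration.

```python
from collections import Counter

MIN_IMAGES_PER_TAG = 30  # minimum stated in the prerequisites above


def tags_needing_more_images(tagged_images, minimum=MIN_IMAGES_PER_TAG):
    """Return {tag: shortfall} for every tag with fewer than `minimum` images.

    `tagged_images` is an iterable of (image_name, tag) pairs. This layout is
    an assumption made for the example, not a Custom Vision format.
    """
    counts = Counter(tag for _, tag in tagged_images)
    return {tag: minimum - n for tag, n in counts.items() if n < minimum}


# Hypothetical data set: 30 dog images but only 12 pony images.
dataset = [(f"dog_{i}.jpg", "dog") for i in range(30)]
dataset += [(f"pony_{i}.jpg", "pony") for i in range(12)]

print(tags_needing_more_images(dataset))  # {'pony': 18}
```

An empty result means every tag in the collection has enough images to start training.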

Getting started: Build a classifier

Custom Vision Service is available on the Custom Vision page at customvision.ai.

After you sign in to Custom Vision Service, you are presented with a list of projects. Outside of two "limited trial" projects for testing, projects are associated with an Azure Resource. If you are an Azure user, you will see all the projects associated with Azure Resources to which you have access.

  1. To create your first project, select New Project.

  2. If this is your first project, you are asked to agree to the Terms of Service. Select the check box, and then select the I agree button. The New project dialog box appears.

    The new project dialog box has fields for name, description, and domains, which consist of general, food, landmarks, retail, and adult.

  3. Enter a name and a description for this project. Then select one domain. There are several domains available. Each one optimizes the classifier for specific types of images, as described in the following table:

    Domain | Purpose
    Generic | Optimized for a broad range of image classification tasks. If none of the other domains are appropriate, or you are unsure of which domain to choose, select the Generic domain.
    Food | Optimized for photographs of dishes as you would see them on a restaurant menu. If you want to classify photographs of individual fruits or vegetables, use the Food domain.
    Landmarks | Optimized for recognizable landmarks, both natural and artificial. This domain works best when the landmark is clearly visible in the photograph, even if the landmark is slightly obstructed by people in front of it.
    Retail | Optimized for images that are found in a shopping catalog or shopping website. If you want high precision classifying between dresses, pants, and shirts, use this domain.
    Adult | Optimized to better define adult content and non-adult content. For example, if you want to block images of people in bathing suits, this domain allows you to build a custom classifier to do that.
    Compact domains | Optimized for the constraints of real-time classification on mobile devices. The models generated by compact domains can be exported to run locally.

    You can change the domain later. You also need to select a Resource Group. The Resource Group dropdown lists all of your Azure Resource Groups that include a Custom Vision Service Resource. You can also create two "limited trial" projects, which are tied to your account, for experimenting with the service. Limited trial is the only option a non-Azure user can choose.

  4. Add images to train your classifier.

    Let's say you want a classifier to distinguish between dogs and ponies. You would upload and tag at least 30 images of dogs and 30 images of ponies.

    Try to upload images with different camera angles, lighting, background, types, styles, groups, sizes, and so on. Use a variety of photo types to ensure that your classifier is not biased and can generalize well.


    Custom Vision Service accepts training images in .jpg, .png, and .bmp format, up to 6 MB per image. (Prediction images can be up to 4 MB per image.) We recommend that images be 256 pixels on the shortest edge. Any image whose shortest edge is less than 256 pixels is scaled up by Custom Vision Service.
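
The format and size limits above can be checked locally before an upload. This is a minimal sketch under stated assumptions: the helper names are hypothetical, 1 MB is taken to be 1024 × 1024 bytes, and the scaling estimate assumes the service preserves the aspect ratio, which the text above does not state.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".png", ".bmp"}  # formats listed above
MAX_TRAINING_BYTES = 6 * 1024 * 1024           # 6 MB (assumes 1 MB = 1024 * 1024 bytes)
MAX_PREDICTION_BYTES = 4 * 1024 * 1024         # 4 MB
MIN_SHORTEST_EDGE = 256                        # pixels


def check_image(filename, size_bytes, for_training=True):
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    limit = MAX_TRAINING_BYTES if for_training else MAX_PREDICTION_BYTES
    if size_bytes > limit:
        problems.append(f"file is larger than {limit} bytes")
    return problems


def upscaled_size(width, height, min_edge=MIN_SHORTEST_EDGE):
    """Estimate dimensions after the shortest edge is scaled up to min_edge.

    Aspect-ratio-preserving scaling and rounding are assumptions.
    """
    shortest = min(width, height)
    if shortest >= min_edge:
        return width, height
    scale = min_edge / shortest
    return round(width * scale), round(height * scale)


print(check_image("dog.gif", 1000))  # ['unsupported format: .gif']
print(upscaled_size(128, 200))       # (256, 400)
```

For a file on disk, `os.path.getsize(path)` supplies the `size_bytes` argument.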

    a. Select Add images.

    The add images control is shown in the upper left, and as a button at bottom center.

    b. Browse to the location of your training images.


    You can use the REST API to load training images from URLs. The web app can only upload training images from your local computer.

    The browse local files button is shown near bottom center.

    c. Select the images for your first tag.

    d. Select Open to open the selected images.

    e. Assign tags: Type in the tag you want to assign, and then press the + button to assign the tag. You can add more than one tag at a time to the images.

    The "add some tags" text control is below the images of dogs. The plus sign is to the right of the text control. The "upload files" button is on the lower right.

    f. When you are done adding tags, select Upload [number] files. If you have a large number of images or a slow Internet connection, the upload might take some time.

    g. After the files have uploaded, select Done.

    The progress bar shows all tasks completed. The upload report shows 38 images uploaded successfully. The Done button is on the lower right.

    h. To load more images with a different set of tags, return to step a.
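
As step b notes, the REST API can load training images from URLs, which the web app cannot do. The sketch below only builds such a request and does not send it. The path layout, the "Training-Key" header, and the body field names are assumptions for illustration and should be confirmed in the "Reference" section; the endpoint, API version, project ID, key, and tag ID are all hypothetical placeholders.

```python
import json
import urllib.request


def build_url_upload_request(endpoint, api_version, project_id,
                             training_key, image_urls, tag_ids):
    """Build (but do not send) a request that adds training images by URL.

    ASSUMPTION: the path layout, the "Training-Key" header, and the body
    field names are illustrative; confirm them in the API Reference.
    """
    url = (f"{endpoint.rstrip('/')}/customvision/{api_version}"
           f"/training/projects/{project_id}/images/urls")
    body = {"images": [{"url": u, "tagIds": list(tag_ids)} for u in image_urls]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Training-Key": training_key,
                 "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values throughout; nothing is sent over the network.
request = build_url_upload_request(
    endpoint="https://example.invalid",
    api_version="v0.0",
    project_id="my-project-id",
    training_key="my-training-key",
    image_urls=["https://example.invalid/images/dog1.jpg"],
    tag_ids=["dog-tag-id"],
)
print(request.get_method(), request.full_url)
```

Once the values are replaced with real ones, passing the request to `urllib.request.urlopen` would submit it.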

  5. Train your classifier

    After your images are uploaded, you are ready to train your classifier. All you have to do is select the Train button.

    The Train button is near the top right of the browser window.

    It should only take a few minutes to train your classifier.


  6. Evaluate your classifier

    The precision and recall indicators tell you how good your classifier is, based on automatic testing. Custom Vision Service uses the images that you submitted for training to calculate these numbers, by using a process called k-fold cross validation.

    The training results show the overall precision and recall, and the precision and recall for each tag in the classifier.
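
To illustrate the k-fold cross validation mentioned above: the images are divided into k groups (folds), and each fold in turn is held out for testing while the remaining folds are used for training. The sketch below shows only the general technique. The value of k and the way Custom Vision Service partitions images are not stated in this guide, so the function and its parameters are illustrative assumptions.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for each of k folds.

    Folds are contiguous and differ in size by at most one item. Real
    pipelines usually shuffle the items first; that step is omitted here.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 10 images split into 3 folds: every image is tested exactly once.
for train, test in k_fold_splits(10, 3):
    print(len(train), "training images,", len(test), "test images")
```

Because every image lands in exactly one test fold, each image is evaluated by a model that did not see it during training.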


    Each time you select the Train button, you create a new iteration of your classifier. You can view all your old iterations in the Performance tab, and you can delete any that are obsolete. When you delete an iteration, you also delete any images that are uniquely associated with it.

    The classifier uses all the images to create a model that identifies each tag. To test the quality of the model, the classifier then tries each image on its model to see what the model finds.

    The qualities of the classifier results are displayed.

    Term | Definition
    Precision | When your classifier assigns a tag to an image, how likely is that tag to be correct? Out of all the images that the classifier tagged as dogs, what percent actually are dogs? If the classifier assigns 100 tags and 99 of them are correct, the precision is 99%.
    Recall | Out of all images that should have been given a tag, what percent did your classifier tag correctly? A recall of 100% means that if there are 38 dog images in the images that were used to train the classifier, the classifier found all 38 dogs.
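
The two measures in the table above can be computed from a list of actual tags and a list of predicted tags. This is a minimal sketch that assumes one tag per image for simplicity; the function name and the sample data are hypothetical, and the service's own calculation may differ in detail.

```python
def precision_recall(actual, predicted, tag):
    """Return (precision, recall) for one tag.

    `actual` and `predicted` are equal-length lists of tag names,
    one entry per image.
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == tag and p == tag for a, p in pairs)
    false_pos = sum(a != tag and p == tag for a, p in pairs)
    false_neg = sum(a == tag and p != tag for a, p in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical results for 4 dog images and 4 pony images.
actual = ["dog"] * 4 + ["pony"] * 4
predicted = ["dog", "dog", "dog", "dog", "dog", "pony", "pony", "pony"]

print(precision_recall(actual, predicted, "dog"))   # (0.8, 1.0)
print(precision_recall(actual, predicted, "pony"))  # (1.0, 0.75)
```

In this sample, every dog image was found (recall of 100%), but one of the five "dog" predictions was actually a pony (precision of 80%).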

Next steps

Custom Vision API C# tutorial