Install and run Computer Vision containers

Containers enable you to run the Computer Vision APIs in your own environment. Containers are great for specific security and data governance requirements. In this article you'll learn how to download, install, and run a Computer Vision container.

There are two Docker containers available for Computer Vision: Recognize Text and Read. The Recognize Text container allows you to detect and extract printed text from images of various objects with different surfaces and backgrounds, such as receipts, posters, and business cards. The Read container, however, also detects handwritten text in images and provides PDF/TIFF/multi-page support. For more information, see the Read API documentation.

Important

The Recognize Text container is being deprecated in favor of the Read container. The Read container is a superset of its predecessor, the Recognize Text container, and consumers should migrate to the Read container. Both containers work only with English.

If you don't have an Azure subscription, create a free account before you begin.

Prerequisites

You must meet the following prerequisites before using the containers:

Required Purpose
Docker Engine You need the Docker Engine installed on a host computer. Docker provides packages that configure the Docker environment on macOS, Windows, and Linux. For a primer on Docker and container basics, see the Docker overview.

Docker must be configured to allow the containers to connect with and send billing data to Azure.

On Windows, Docker must also be configured to support Linux containers.

Familiarity with Docker You should have a basic understanding of Docker concepts, like registries, repositories, containers, and container images, as well as knowledge of basic docker commands.
Computer Vision resource In order to use the container, you must have:

An Azure Computer Vision resource and the associated API key and endpoint URI. Both values are available on the Overview and Keys pages for the resource and are required to start the container.

{API_KEY}: One of the two available resource keys on the Keys page

{ENDPOINT_URI}: The endpoint as provided on the Overview page

Gathering required parameters

Three primary parameters are required for all Cognitive Services containers. The end-user license agreement (EULA) must be present with a value of accept. Additionally, both an Endpoint URI and an API key are needed.

Note

The only exception to these three required parameters is when containers are considered "Offline" containers. Offline containers do not report usage, are not metered, and follow a different billing methodology.

Endpoint URI {ENDPOINT_URI}

The Endpoint URI value is available on the Azure portal Overview page of the corresponding Cognitive Service resource. Navigate to the Overview page, hover over the Endpoint, and a Copy to clipboard icon will appear. Copy and use where needed.

Keys {API_KEY}

This key is used to start the container, and is available on the Azure portal's Keys page of the corresponding Cognitive Service resource. Navigate to the Keys page, and click on the Copy to clipboard icon.

Important

These subscription keys are used to access your Cognitive Service API. Do not share your keys. Store them securely, for example, using Azure Key Vault. We also recommend regenerating these keys regularly. Only one key is necessary to make an API call. When regenerating the first key, you can use the second key for continued access to the service.
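One way to keep the key out of source files and shell history is to read both values from environment variables at startup. The following is a minimal sketch, not an official helper: the function name loadContainerSettings and the variable names ENDPOINT_URI and API_KEY are assumptions chosen to mirror the placeholders used in this article.

```typescript
// Hypothetical settings holder for the two values gathered above.
interface ContainerSettings {
  endpointUri: string;
  apiKey: string;
}

// Reads the settings from an environment-style map.
// ENDPOINT_URI and API_KEY are assumed variable names, not names
// required by the container itself.
function loadContainerSettings(
  env: Record<string, string | undefined>
): ContainerSettings {
  const endpointUri = env["ENDPOINT_URI"]?.trim();
  const apiKey = env["API_KEY"]?.trim();
  if (!endpointUri) {
    throw new Error("ENDPOINT_URI is not set");
  }
  if (!apiKey) {
    throw new Error("API_KEY is not set");
  }
  // new URL() throws a TypeError when the endpoint is not a well-formed URL.
  new URL(endpointUri);
  return { endpointUri, apiKey };
}
```

In a Node.js script, this could be called as loadContainerSettings(process.env), with the variables populated from a secret store such as Azure Key Vault.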

Request access to the private container registry

Fill out and submit the Cognitive Services Vision Containers Request form to request access to the container. The form requests information about you, your company, and the user scenario for which you'll use the container. After you submit the form, the Azure Cognitive Services team reviews it to make sure that you meet the criteria for access to the private container registry.

Important

You must use an email address associated with either a Microsoft Account (MSA) or an Azure Active Directory (Azure AD) account in the form.

If your request is approved, you receive an email with instructions that describe how to obtain your credentials and access the private container registry.

Log in to the private container registry

There are several ways to authenticate with the private container registry for Cognitive Services containers. We recommend that you use the command-line method by using the Docker CLI.

Use the docker login command, as shown in the following example, to log in to containerpreview.azurecr.io, which is the private container registry for Cognitive Services containers. Replace <username> with the user name and <password> with the password provided in the credentials you received from the Azure Cognitive Services team.

docker login containerpreview.azurecr.io -u <username> -p <password>

If you secured your credentials in a text file, you can pipe the contents of that text file to the docker login command. Use the cat command, as shown in the following example. Replace <passwordFile> with the path and name of the text file that contains the password. Replace <username> with the user name provided in your credentials.

cat <passwordFile> | docker login containerpreview.azurecr.io -u <username> --password-stdin

The host computer

The host is an x64-based computer that runs the Docker container. It can be a computer on your premises or a Docker hosting service in Azure, such as Azure Kubernetes Service (AKS) or Azure Container Instances.

Container requirements and recommendations

Note

The requirements and recommendations are based on benchmarks with a single request per second, using an 8-MB image of a scanned business letter that contains 29 lines and a total of 803 characters.

The following table describes the minimum and recommended allocation of resources for each Read container.

Container | Minimum | Recommended | TPS (Minimum, Maximum)
Read | 1 core, 8-GB memory, 0.24 TPS | 8 cores, 16-GB memory, 1.17 TPS | 0.24, 1.17

  • Each core must run at 2.6 gigahertz (GHz) or faster.
  • TPS - transactions per second.

Core and memory correspond to the --cpus and --memory settings, which are used as part of the docker run command.
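The TPS figures in the table can be used for rough capacity planning. The sketch below estimates how many Read containers a target load might need. It assumes that throughput scales linearly with the number of containers, which this article does not state, so treat the result as a starting point for your own benchmarks. The function name estimateContainerCount is hypothetical.

```typescript
// Benchmark figures quoted in the table above (transactions per second).
const MIN_CONFIG_TPS = 0.24; // 1 core, 8-GB memory
const RECOMMENDED_CONFIG_TPS = 1.17; // 8 cores, 16-GB memory

// Rough estimate of the number of containers needed for a target load.
// Assumption: throughput scales linearly with the container count.
function estimateContainerCount(
  targetTps: number,
  tpsPerContainer: number
): number {
  if (targetTps <= 0 || tpsPerContainer <= 0) {
    throw new RangeError("TPS values must be positive");
  }
  return Math.ceil(targetTps / tpsPerContainer);
}

// 3 / 1.17 is about 2.56, which rounds up to 3 containers.
console.log(estimateContainerCount(3, RECOMMENDED_CONFIG_TPS));
// 1 / 0.24 is about 4.17, which rounds up to 5 containers.
console.log(estimateContainerCount(1, MIN_CONFIG_TPS));
```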

Get the container image with docker pull

Container images for Read are available.

Container Container Registry / Repository / Image Name
Read containerpreview.azurecr.io/microsoft/cognitive-services-read:latest

Use the docker pull command to download a container image.

Docker pull for the Read container

docker pull containerpreview.azurecr.io/microsoft/cognitive-services-read:latest

Tip

You can use the docker images command to list your downloaded container images. For example, the following command lists the ID, repository, and tag of each downloaded container image, formatted as a table:

docker images --format "table {{.ID}}\t{{.Repository}}\t{{.Tag}}"

IMAGE ID         REPOSITORY                TAG
<image-id>       <repository-path/name>    <tag-name>

How to use the container

Once the container is on the host computer, use the following process to work with the container.

  1. Run the container, with the required billing settings. More examples of the docker run command are available.
  2. Query the container's prediction endpoint.

Run the container with docker run

Use the docker run command to run the container. Refer to gathering required parameters for details on how to get the {ENDPOINT_URI} and {API_KEY} values.

docker run --rm -it -p 5000:5000 --memory 16g --cpus 8 \
containerpreview.azurecr.io/microsoft/cognitive-services-read \
Eula=accept \
Billing={ENDPOINT_URI} \
ApiKey={API_KEY}

This command:

  • Runs the Read container from the container image.
  • Allocates 8 CPU cores and 16 gigabytes (GB) of memory.
  • Exposes TCP port 5000 and allocates a pseudo-TTY for the container.
  • Automatically removes the container after it exits. The container image is still available on the host computer.

More examples of the docker run command are available.

Important

The Eula, Billing, and ApiKey options must be specified to run the container; otherwise, the container won't start. For more information, see Billing.
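If you start the container from a script, it can help to assemble the argument list in one place so that none of the three required options is forgotten. The following sketch builds the same arguments as the docker run example above. The interface and function names are hypothetical, and the defaults simply repeat the recommended values from this article.

```typescript
// Hypothetical options object; only the two billing values are mandatory.
interface ReadContainerOptions {
  endpointUri: string; // value for Billing
  apiKey: string; // value for ApiKey
  hostPort?: number; // host side of the -p mapping, defaults to 5000
  cpus?: number; // --cpus, defaults to the recommended 8
  memoryGb?: number; // --memory, defaults to the recommended 16
}

const READ_IMAGE =
  "containerpreview.azurecr.io/microsoft/cognitive-services-read";

// Returns the arguments that follow "docker" on the command line.
// Eula=accept is hard-coded, so only call this after accepting the license.
function buildDockerRunArgs(options: ReadContainerOptions): string[] {
  const { endpointUri, apiKey, hostPort = 5000, cpus = 8, memoryGb = 16 } =
    options;
  if (!endpointUri || !apiKey) {
    throw new Error("Billing and ApiKey values are required");
  }
  return [
    "run", "--rm", "-it",
    "-p", `${hostPort}:5000`,
    "--memory", `${memoryGb}g`,
    "--cpus", String(cpus),
    READ_IMAGE,
    "Eula=accept",
    `Billing=${endpointUri}`,
    `ApiKey=${apiKey}`,
  ];
}
```

In Node.js, the resulting array could be passed to spawn("docker", args, { stdio: "inherit" }) from the child_process module. Passing the values as separate array elements avoids shell quoting problems with the key.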

Run multiple containers on the same host

If you intend to run multiple containers with exposed ports, make sure to run each container with a different exposed port. For example, run the first container on port 5000 and the second container on port 5001.

You can have this container and a different Azure Cognitive Services container running on the host together. You can also have multiple instances of the same Cognitive Services container running.
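The port rule above can be sketched as a small helper that hands out one host port per container. This is an illustration only: the function name assignPortMappings is hypothetical, and the sketch assumes each container keeps listening on port 5000 internally, as in the docker run example in this article.

```typescript
// Assigns consecutive host ports starting at basePort, one per container.
// Assumption: every container listens on 5000 internally, so only the
// host side of each -p mapping changes.
function assignPortMappings(containerCount: number, basePort = 5000): string[] {
  if (!Number.isInteger(containerCount) || containerCount < 1) {
    throw new RangeError("containerCount must be a positive integer");
  }
  return Array.from(
    { length: containerCount },
    (_, index) => `${basePort + index}:5000`
  );
}

// Two containers: the first on host port 5000, the second on 5001.
console.log(assignPortMappings(2));
```

Each returned string is a value for the -p option of docker run.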

Validate that a container is running

There are several ways to validate that the container is running. Locate the external IP address and exposed port of the container in question, and open a web browser. Use the request URLs below to validate that the container is running. The example request URLs use http://localhost:5000, but your container's address may differ. Substitute your container's external IP address and exposed port.

Request URL Purpose
http://localhost:5000/ The container provides a home page.
http://localhost:5000/status Requested with an HTTP GET, to validate that the container is running without causing an endpoint query. This request can be used for Kubernetes liveness and readiness probes.
http://localhost:5000/swagger The container provides a full set of documentation for the endpoints and a Try it out feature. With this feature, you can enter your settings into a web-based HTML form and make the query without having to write any code. After the query returns, an example cURL command is provided to demonstrate the HTTP headers and body format that are required.
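The /status check can also be scripted. The sketch below sends an HTTP GET to /status and treats any successful (2xx) response as "running". That interpretation is an assumption, as is the idea that a network error means the container is not reachable. The names statusUrl, isContainerRunning, and HttpGet are hypothetical. The HTTP call is passed in as a parameter so that the sketch does not depend on a particular runtime.

```typescript
// Minimal shape of the response that this sketch relies on.
interface StatusResponse {
  ok: boolean;
  status: number;
}

// A fetch-like function. In Node 18+ or a browser, the global fetch fits.
type HttpGet = (url: string) => Promise<StatusResponse>;

// Builds the /status URL from the container's base address.
function statusUrl(baseUrl: string): string {
  return new URL("/status", baseUrl).toString();
}

// Resolves to true when GET /status returns a 2xx response.
async function isContainerRunning(
  baseUrl: string,
  httpGet: HttpGet
): Promise<boolean> {
  try {
    const response = await httpGet(statusUrl(baseUrl));
    return response.ok;
  } catch {
    // Connection refused, DNS failure, and similar errors.
    return false;
  }
}
```

A typical call might look like isContainerRunning("http://localhost:5000", (url) => fetch(url)).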

Query the container's prediction endpoint

The container provides REST-based query prediction endpoint APIs.

Use the host, http://localhost:5000, for container APIs.

Asynchronous read

You can use the POST /vision/v2.0/read/core/asyncBatchAnalyze and GET /vision/v2.0/read/operations/{operationId} operations in concert to asynchronously read an image, similar to how the Computer Vision service uses those corresponding REST operations. The asynchronous POST method returns an operationId that is used as the identifier in the HTTP GET request.

From the Swagger UI, select asyncBatchAnalyze to expand it in the browser. Then select Try it out > Choose file. In this example, we'll use the following image:

[Image: "Tabs vs Spaces"]

When the asynchronous POST has run successfully, it returns an HTTP 202 status code. As part of the response, there is an operation-location header that holds the result endpoint for the request.

 content-length: 0
 date: Fri, 13 Sep 2019 16:23:01 GMT
 operation-location: http://localhost:5000/vision/v2.0/read/operations/a527d445-8a74-4482-8cb3-c98a65ec7ef9
 server: Kestrel

The operation-location value is a fully qualified URL and is accessed via an HTTP GET. Here is the JSON response returned by the operation-location URL for the preceding image:

{
  "status": "Succeeded",
  "recognitionResults": [
    {
      "page": 1,
      "clockwiseOrientation": 2.42,
      "width": 502,
      "height": 252,
      "unit": "pixel",
      "lines": [
        {
          "boundingBox": [
            56,
            39,
            317,
            50,
            313,
            134,
            53,
            123
          ],
          "text": "Tabs VS",
          "words": [
            {
              "boundingBox": [
                90,
                43,
                243,
                53,
                243,
                123,
                94,
                125
              ],
              "text": "Tabs",
              "confidence": "Low"
            },
            {
              "boundingBox": [
                259,
                55,
                313,
                62,
                313,
                122,
                259,
                123
              ],
              "text": "VS"
            }
          ]
        },
        {
          "boundingBox": [
            221,
            148,
            417,
            146,
            417,
            206,
            227,
            218
          ],
          "text": "Spaces",
          "words": [
            {
              "boundingBox": [
                230,
                148,
                416,
                141,
                419,
                211,
                232,
                218
              ],
              "text": "Spaces"
            }
          ]
        }
      ]
    }
  ]
}
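The submit-then-poll flow described in this section can be sketched as follows. This is an illustration under several assumptions, not the service's client library: the names readImage and HttpClient are hypothetical, the image is sent as raw bytes with an application/octet-stream content type (the Swagger UI may encode the upload differently), and polling stops when the status is "Succeeded" or "Failed".

```typescript
// Minimal fetch-like types so the sketch stays runtime independent.
interface HttpResponse {
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}
type HttpClient = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: Uint8Array }
) => Promise<HttpResponse>;

interface ReadOperation {
  status: string;
  recognitionResults?: unknown[];
}

async function readImage(
  http: HttpClient,
  host: string,
  image: Uint8Array,
  pollIntervalMs = 1000,
  maxPolls = 30
): Promise<ReadOperation> {
  // Step 1: submit the image. A 202 response carries the result URL.
  const submit = await http(`${host}/vision/v2.0/read/core/asyncBatchAnalyze`, {
    method: "POST",
    headers: { "Content-Type": "application/octet-stream" }, // assumption
    body: image,
  });
  if (submit.status !== 202) {
    throw new Error(`Unexpected status ${submit.status} from asyncBatchAnalyze`);
  }
  const operationLocation = submit.headers.get("operation-location");
  if (!operationLocation) {
    throw new Error("Response has no operation-location header");
  }
  // Step 2: poll the operation-location URL until the operation finishes.
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const poll = await http(operationLocation);
    const operation = (await poll.json()) as ReadOperation;
    if (operation.status === "Succeeded" || operation.status === "Failed") {
      return operation;
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  throw new Error("Operation did not finish within the polling budget");
}
```

The polling interval and the maximum number of polls are arbitrary defaults. Tune them to the size of your documents.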

Synchronous read

You can use the POST /vision/v2.0/read/core/Analyze operation to synchronously read an image. The API returns a JSON response only after the image has been read in its entirety. The exception is when an error occurs, in which case the following JSON is returned:

{
    "status": "Failed"
}

The JSON response object has the same object graph as the asynchronous version. If you're a JavaScript user and want type safety, the following TypeScript types could be used to cast the JSON response as an AnalyzeResult object.

export interface AnalyzeResult {
    status: Status;
    recognitionResults?: RecognitionResult[] | null;
}

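// String-valued enums, so that members match the JSON values
// returned by the container (for example, "Succeeded" and "pixel").
export enum Status {
    NotStarted = "NotStarted",
    Running = "Running",
    Failed = "Failed",
    Succeeded = "Succeeded"
}

export enum Unit {
    Pixel = "pixel",
    Inch = "inch"
}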

export interface RecognitionResult {
    page?: number | null;
    clockwiseOrientation?: number | null;
    width?: number | null;
    height?: number | null;
    unit?: Unit | null;
    lines?: Line[] | null;
}

export interface Line {
    boundingBox?: number[] | null;
    text: string;
    words?: Word[] | null;
}

export interface Word {
    boundingBox?: number[] | null;
    text: string;
    confidence?: string | null;
}

For an example use-case, see the TypeScript sandbox here and select "Run" to visualize its ease-of-use.
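As a further illustration of working with the typed response, the sketch below parses a response body and collects the recognized text, one entry per line. It redeclares a minimal subset of the types above so that it is self-contained. The function name extractLines is hypothetical, and treating any status other than "Succeeded" as an error is a design choice of this sketch.

```typescript
// Minimal subset of the response types shown above.
interface Line {
  text: string;
}
interface RecognitionResult {
  page?: number | null;
  lines?: Line[] | null;
}
interface AnalyzeResult {
  status: string;
  recognitionResults?: RecognitionResult[] | null;
}

// Parses a response body and returns the text of every recognized line,
// in page order. Throws when the read did not succeed.
function extractLines(json: string): string[] {
  const result = JSON.parse(json) as AnalyzeResult;
  if (result.status !== "Succeeded") {
    throw new Error(`Read did not succeed (status: ${result.status})`);
  }
  const texts: string[] = [];
  for (const page of result.recognitionResults ?? []) {
    for (const line of page.lines ?? []) {
      texts.push(line.text);
    }
  }
  return texts;
}
```

For the sample response shown earlier in this article, the function returns the two strings "Tabs VS" and "Spaces".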

Stop the container

To shut down the container, in the command-line environment where the container is running, press Ctrl+C.

Troubleshooting

If you run the container with an output mount and logging enabled, the container generates log files that are helpful to troubleshoot issues that happen while starting or running the container.

Tip

For more troubleshooting information and guidance, see Cognitive Services containers frequently asked questions (FAQ).

Billing

The Cognitive Services containers send billing information to Azure, using the corresponding resource on your Azure account.

Queries to the container are billed at the pricing tier of the Azure resource that's used for the <ApiKey>.

Azure Cognitive Services containers aren't licensed to run without being connected to the billing endpoint for metering. You must enable the containers to communicate billing information with the billing endpoint at all times. Cognitive Services containers don't send customer data, such as the image or text that's being analyzed, to Microsoft.

Connect to Azure

The container needs the billing argument values to run. These values allow the container to connect to the billing endpoint. The container reports usage about every 10 to 15 minutes. If the container doesn't connect to Azure within the allowed time window, the container continues to run but doesn't serve queries until the billing endpoint is restored. The connection is attempted 10 times at the same time interval of 10 to 15 minutes. If it can't connect to the billing endpoint within the 10 tries, the container stops running.
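As a worked example of the figures above, the following sketch computes roughly how long a container keeps retrying before it stops. It assumes one connection attempt per reporting interval and ignores whether the first attempt happens immediately, so the result is an approximation. The function name billingOutageBudget is hypothetical.

```typescript
// Figures from the paragraph above.
const MAX_ATTEMPTS = 10;
const MIN_INTERVAL_MINUTES = 10;
const MAX_INTERVAL_MINUTES = 15;

// Approximate time, in minutes, that the container keeps retrying the
// billing endpoint before it stops running.
// Assumption: one attempt per reporting interval.
function billingOutageBudget(): { minMinutes: number; maxMinutes: number } {
  return {
    minMinutes: MAX_ATTEMPTS * MIN_INTERVAL_MINUTES,
    maxMinutes: MAX_ATTEMPTS * MAX_INTERVAL_MINUTES,
  };
}

// 10 tries at 10 to 15 minutes each: between 100 and 150 minutes.
console.log(billingOutageBudget());
```

Under these assumptions, a network outage to Azure that lasts longer than about two and a half hours would stop the container.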

Billing arguments

For the docker run command to start the container, all three of the following options must be specified with valid values:

Option Description
ApiKey The API key of the Cognitive Services resource that's used to track billing information.
The value of this option must be set to an API key for the provisioned resource that's specified in Billing.
Billing The endpoint of the Cognitive Services resource that's used to track billing information.
The value of this option must be set to the endpoint URI of a provisioned Azure resource.
Eula Indicates that you accepted the license for the container.
The value of this option must be set to accept.

For more information about these options, see Configure containers.

Developer samples

Developer samples are available at our GitHub repository.

View webinar

Join the webinar to learn about:

  • How to deploy Cognitive Services to any machine using Docker
  • How to deploy Cognitive Services to AKS

Summary

In this article, you learned concepts and workflow for downloading, installing, and running Computer Vision containers. In summary:

  • Computer Vision provides a Linux container for Docker, encapsulating both Recognize Text and Read.
  • Container images are downloaded from the "Container Preview" container registry in Azure.
  • Container images run in Docker.
  • You can use either the REST API or SDK to call operations in Recognize Text or Read containers by specifying the host URI of the container.
  • You must specify billing information when instantiating a container.

Important

Cognitive Services containers are not licensed to run without being connected to Azure for metering. Customers need to enable the containers to communicate billing information with the metering service at all times. Cognitive Services containers do not send customer data (for example, the image or text that is being analyzed) to Microsoft.

Next steps