What is optical character recognition?

Optical character recognition (OCR) allows you to extract printed or handwritten text from images, such as photos of street signs and products, as well as from documents—invoices, bills, financial reports, articles, and more. Microsoft's OCR technologies support extracting printed text in several languages. Follow a quickstart to get started.

This documentation contains the following types of articles:

  • The quickstarts are step-by-step instructions that let you make calls to the service and get results in a short period of time.
  • The how-to guides contain instructions for using the service in more specific or customized ways.

Read API

The Computer Vision Read API is Azure's latest OCR technology (learn what's new) that extracts printed text (in several languages), handwritten text (English only), digits, and currency symbols from images and multi-page PDF documents. It's optimized to extract text from text-heavy images and multi-page PDF documents with mixed languages. It supports detecting both printed and handwritten text in the same image or document.

Figure: How OCR converts images and documents into structured output with extracted text.

Input requirements

The Read call takes images and documents as its input. They have the following requirements:

  • Supported file formats: JPEG, PNG, BMP, PDF, and TIFF
  • For PDF and TIFF files, up to 2000 pages are processed (only the first two pages on the free tier).
  • The file size must be less than 50 MB (4 MB on the free tier), and the dimensions must be at least 50 x 50 pixels and at most 10000 x 10000 pixels.
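
The requirements above can be checked on the client before a file is submitted. The following is a minimal sketch, not part of any SDK: the function name, the extension-to-format mapping, and the choice of 1 MB = 1024 x 1024 bytes are all assumptions made for illustration. The page-count limit is not checked here because counting pages requires parsing the file.

```python
from pathlib import PurePath

# Limits taken from the requirements list above. The extension mapping and the
# definition of a megabyte are assumptions, not values stated by the service.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".pdf", ".tif", ".tiff"}
MB = 1024 * 1024
MAX_BYTES_PAID = 50 * MB
MAX_BYTES_FREE = 4 * MB
MIN_SIDE = 50
MAX_SIDE = 10000


def check_read_input(filename, size_bytes, width=None, height=None, free_tier=False):
    """Return a list of problems; an empty list means no limit was violated.

    width and height are in pixels; pass None to skip the dimension check.
    """
    problems = []

    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {ext or '(no extension)'}")

    limit = MAX_BYTES_FREE if free_tier else MAX_BYTES_PAID
    if size_bytes >= limit:
        problems.append(f"file is {size_bytes} bytes; must be less than {limit}")

    if width is not None and height is not None:
        if min(width, height) < MIN_SIDE:
            problems.append(f"dimensions {width} x {height} are below {MIN_SIDE} x {MIN_SIDE}")
        if max(width, height) > MAX_SIDE:
            problems.append(f"dimensions {width} x {height} exceed {MAX_SIDE} x {MAX_SIDE}")

    return problems
```

For example, `check_read_input("invoice.pdf", 5 * MB, free_tier=True)` reports a size problem, while the same call without `free_tier=True` returns an empty list.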

Supported languages

The Read API supports a total of 73 languages for print-style text. Refer to the full list of OCR-supported languages. Handwritten-style OCR is supported for English only.

Key features

The Read API includes the following features.

  • Print text extraction in 73 languages
  • Handwritten text extraction in English
  • Text lines and words with location and confidence scores
  • No language identification required
  • Support for mixed languages and mixed mode (print and handwritten)
  • Select pages and page ranges from large, multi-page documents
  • Natural reading order for text lines
  • Handwriting classification for text lines
  • Available as a Distroless Docker container for on-premises deployment
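
To illustrate how the per-word confidence scores listed above might be used, here is a minimal sketch that flags low-confidence words for manual review. The result shape (`analyzeResult`, `readResults`, `lines`, `words`, `confidence`) is an assumption modeled on Read 3.x JSON output, and the sample data is made up; check the response of the API version you call before relying on these field names.

```python
# Made-up sample data. The field names are an assumption modeled on Read 3.x
# JSON output and may differ in the API version you use.
sample_result = {
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {
                        "text": "Total due 42.00",
                        "boundingBox": [10, 10, 200, 10, 200, 30, 10, 30],
                        "words": [
                            {"text": "Total", "confidence": 0.98},
                            {"text": "due", "confidence": 0.95},
                            {"text": "42.00", "confidence": 0.61},
                        ],
                    }
                ],
            }
        ]
    }
}


def low_confidence_words(result, threshold=0.8):
    """Return (page, word text, confidence) for each word scored below threshold."""
    flagged = []
    pages = result.get("analyzeResult", {}).get("readResults", [])
    for page in pages:
        for line in page.get("lines", []):
            for word in line.get("words", []):
                confidence = word.get("confidence")
                # Words without a score are skipped rather than guessed at.
                if confidence is not None and confidence < threshold:
                    flagged.append((page.get("page"), word.get("text"), confidence))
    return flagged


flagged = low_confidence_words(sample_result)
```

The threshold of 0.8 is arbitrary; a suitable value depends on how costly a misread word is in your application.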

Learn how to use the OCR features.

Use the cloud API or deploy on-premises

The Read 3.x cloud APIs are the preferred option for most customers because they are easy to integrate and productive out of the box. Azure and the Computer Vision service handle scale, performance, data security, and compliance needs while you focus on meeting your customers' needs.

For on-premises deployment, the Read Docker container (preview) enables you to deploy the new OCR capabilities in your own local environment. Containers suit deployments with specific security and data governance requirements.

OCR API

The legacy OCR API uses an older recognition model, supports only images, and executes synchronously, returning immediately with the detected text. See the OCR column of supported languages for a list of supported languages.
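
Unlike the synchronous legacy call, a Read operation is submitted first and its result is fetched later, so callers usually poll until the operation finishes. The sketch below shows that polling pattern only. `fetch_status` is a hypothetical callable you would supply (for example, a function that issues an HTTP GET against the operation URL and returns the parsed JSON), and the status strings are an assumption based on Read 3.x responses.

```python
import time

# Status values assumed from Read 3.x responses; confirm them for your API version.
TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_read_result(fetch_status, interval_seconds=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status() until it reports a terminal status, then return that body.

    fetch_status is a hypothetical caller-supplied function returning a dict
    with a "status" key. Raises TimeoutError after max_attempts polls.
    """
    for attempt in range(max_attempts):
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        # Do not sleep after the final attempt; fall through to the error.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"operation did not finish after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real delays.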

Warning

The Computer Vision 2.0 RecognizeText operations are in the process of being deprecated in favor of the new Read API covered in this article. Existing customers should transition to using Read operations.

Data privacy and security

As with all of the Cognitive Services, developers using the Computer Vision service should be aware of Microsoft's policies on customer data. See the Cognitive Services page on the Microsoft Trust Center to learn more.

Next steps