What is Optical character recognition?
Optical character recognition (OCR) allows you to extract printed or handwritten text from images, such as photos of street signs and products, as well as from documents such as invoices, bills, financial reports, and articles. Microsoft's OCR technologies support extracting printed text in several languages. Follow a quickstart to get started.
This documentation contains the following types of articles:
- The quickstarts are step-by-step instructions that let you make calls to the service and get results in a short period of time.
- The how-to guides contain instructions for using the service in more specific or customized ways.
The Computer Vision Read API is Azure's latest OCR technology (learn what's new) that extracts printed text (in several languages), handwritten text (in several languages), digits, and currency symbols from images and multi-page PDF documents. It's optimized to extract text from text-heavy images and multi-page PDF documents with mixed languages. It supports detecting both printed and handwritten text in the same image or document.
The Read call takes images and documents as its input. They have the following requirements:
- Supported file formats: JPEG, PNG, BMP, PDF, and TIFF
- For PDF and TIFF files, up to 2000 pages are processed (only the first two pages for the free tier).
- The file size must be less than 50 MB (6 MB for the free tier), and the image dimensions must be at least 50 x 50 pixels and at most 10000 x 10000 pixels.
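The requirements above can be checked locally before a file is sent to the service. The following is a minimal sketch, not part of the Read API: the function names `check_read_input` and `pages_processed` are hypothetical, the file format is guessed from the extension only, and "MB" is assumed to mean 1024 x 1024 bytes, which the requirements do not specify.

```python
from pathlib import Path

# Limits taken from the requirements list above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".pdf", ".tif", ".tiff"}
MB = 1024 * 1024                      # assumption: binary megabytes
MAX_BYTES = {"paid": 50 * MB, "free": 6 * MB}
MAX_PAGES = {"paid": 2000, "free": 2}
MIN_SIDE, MAX_SIDE = 50, 10000        # pixels, applied to each side


def check_read_input(filename, size_bytes, width=None, height=None, free_tier=False):
    """Return a list of problems; an empty list means the input looks acceptable."""
    tier = "free" if free_tier else "paid"
    problems = []

    # Format check based on the file extension only (hypothetical shortcut).
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {ext or '(none)'}")

    # The file size must be strictly less than the tier limit.
    if size_bytes >= MAX_BYTES[tier]:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES[tier]}")

    # Dimensions are optional because they are not known for every input.
    if width is not None and height is not None:
        if min(width, height) < MIN_SIDE or max(width, height) > MAX_SIDE:
            problems.append(f"dimensions {width} x {height} are out of range")

    return problems


def pages_processed(page_count, free_tier=False):
    """Number of PDF or TIFF pages that would be processed for this tier."""
    return min(page_count, MAX_PAGES["free" if free_tier else "paid"])
```

For example, `check_read_input("scan.pdf", 1024)` returns an empty list, while a 7 MB image on the free tier returns one problem describing the size limit.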
The Read API supports 122 languages for printed text and seven languages for handwritten text, counting languages and features that are in preview.
OCR for printed text includes support for English, French, German, Italian, Portuguese, Spanish, Chinese, Japanese, Korean, and Russian (preview). The latest preview update adds further Latin and Cyrillic languages.
OCR for handwritten text includes support for English, with French, German, Italian, Portuguese, Spanish, and Chinese available in preview.
See How to specify the model version to use the preview languages and features. Refer to the full list of OCR-supported languages. The preview model also includes every enhancement made to the current generally available (GA) version.
The Read API includes the following features:
- Print text extraction in 122 languages
- Handwritten text extraction in seven languages
- Text lines and words with location and confidence scores
- No language identification required
- Support for mixed languages, mixed mode (print and handwritten)
- Select pages and page ranges from large, multi-page documents
- Natural reading order option for text line output (Latin only)
- Handwriting classification for text lines (Latin only)
- Available as a Distroless Docker container for on-premises deployment
Learn how to use the OCR features.
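To make the page selection, reading order, and confidence score features more concrete, here is a hedged sketch of how a request URL might be built and how text lines might be pulled from a result. Nothing in it comes from this article: the `/vision/v3.2/read/analyze` path, the `pages`, `readingOrder`, and `model-version` query parameters, and the `analyzeResult.readResults[].lines[]` response shape are assumptions based on the Read 3.2 REST API and should be verified against the API reference. The helper names `build_read_url` and `extract_lines` are hypothetical, and the code makes no network calls.

```python
from urllib.parse import urlencode


def build_read_url(endpoint, pages=None, reading_order=None, model_version=None):
    """Build an analyze URL; the path and parameter names are assumptions."""
    params = {}
    if pages:
        params["pages"] = pages                  # e.g. "1-3" selects a page range
    if reading_order:
        params["readingOrder"] = reading_order   # e.g. "natural"
    if model_version:
        params["model-version"] = model_version  # used to opt in to a preview model
    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    return url + ("?" + urlencode(params) if params else "")


def extract_lines(result):
    """Flatten an (assumed) Read result into one row per text line."""
    rows = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            # Confidence is reported per word; keep the weakest one per line.
            scores = [w["confidence"] for w in line.get("words", []) if "confidence" in w]
            rows.append({
                "page": page.get("page"),
                "text": line.get("text", ""),
                "bounding_box": line.get("boundingBox"),
                "min_word_confidence": min(scores) if scores else None,
            })
    return rows
```

Keeping the lowest word confidence per line is only one possible design choice; it makes lines that contain any doubtful word easy to flag for manual review.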
Use the cloud API or deploy on-premises
The Read 3.x cloud APIs are the preferred option for most customers because they are easy to integrate and productive out of the box. Azure and the Computer Vision service handle scale, performance, data security, and compliance needs while you focus on meeting your customers' needs.
For on-premises deployment, the Read Docker container (preview) enables you to deploy the new OCR capabilities in your own local environment. Containers are a good fit when you have specific security and data governance requirements.
The Computer Vision 2.0 RecognizeText operations are being deprecated in favor of the new Read API covered in this article. Existing customers should transition to the Read operations.
Data privacy and security
As with all of the Cognitive Services, developers using the Computer Vision service should be aware of Microsoft's policies on customer data. See the Cognitive Services page on the Microsoft Trust Center to learn more.