What is optical character recognition?
Optical character recognition (OCR) allows you to extract printed or handwritten text from images, such as photos of street signs and products, as well as from documents—invoices, bills, financial reports, articles, and more. Microsoft's OCR technologies support extracting printed text in several languages. Follow a quickstart to get started.
This documentation contains the following types of articles:
- The quickstarts are step-by-step instructions that let you make calls to the service and get results in a short period of time.
- The how-to guides contain instructions for using the service in more specific or customized ways.
The Computer Vision Read API is Azure's latest OCR technology (learn what's new) that extracts printed text (in several languages), handwritten text (in several languages), digits, and currency symbols from images and multi-page PDF documents. It's optimized to extract text from text-heavy images and multi-page PDF documents with mixed languages. It supports extracting both printed and handwritten text in the same image or document.
The Read call takes images and documents as its input. They have the following requirements:
- Supported file formats: JPEG, PNG, BMP, PDF, and TIFF
- For PDF and TIFF files, up to 2000 pages are processed (only the first two pages for the free tier).
- The file size must be less than 500 MB (4 MB for the free tier), and the dimensions must be at least 50 x 50 pixels and at most 10000 x 10000 pixels.
- The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 image. This corresponds to about 8-point font text at 150 DPI.
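The input requirements above can be checked on the client before a file is sent. The following is a minimal sketch, not part of the service or its SDK: `ReadInput` and `check_read_input` are hypothetical names, the extension list is one possible mapping of the supported formats, and 1 MB is assumed to mean 1024 x 1024 bytes. The helper takes pixel dimensions as given and does not check page counts or text height.

```python
from dataclasses import dataclass
from typing import List

# Assumed mapping from the supported formats (JPEG, PNG, BMP, PDF, TIFF) to file extensions.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".pdf", ".tif", ".tiff"}
MB = 1024 * 1024  # Assumption: the documented limits use binary megabytes.


@dataclass
class ReadInput:
    """Hypothetical description of a file about to be sent to the Read call."""
    extension: str
    size_bytes: int
    width_px: int
    height_px: int


def check_read_input(item: ReadInput, free_tier: bool = False) -> List[str]:
    """Return a list of human-readable problems; an empty list means the input looks acceptable."""
    problems = []

    if item.extension.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {item.extension}")

    # The file must be smaller than 500 MB, or 4 MB on the free tier.
    limit_mb = 4 if free_tier else 500
    if item.size_bytes >= limit_mb * MB:
        problems.append(f"file size must be less than {limit_mb} MB")

    # Each dimension must be between 50 and 10000 pixels, inclusive.
    for name, value in (("width", item.width_px), ("height", item.height_px)):
        if not 50 <= value <= 10000:
            problems.append(f"{name} of {value} px is outside 50 to 10000 px")

    return problems


if __name__ == "__main__":
    sample = ReadInput(extension=".png", size_bytes=5 * MB, width_px=1024, height_px=768)
    print(check_read_input(sample))                  # standard tier
    print(check_read_input(sample, free_tier=True))  # free tier
```

Returning a list of problems rather than raising on the first failure lets a caller report every violation at once.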
The latest generally available (GA) model of the Read API supports 164 languages for print text and nine languages for handwritten text.
OCR for print text includes support for English, French, German, Italian, Portuguese, Spanish, Chinese, Japanese, Korean, Russian, Arabic, Hindi, and other international languages that use Latin, Cyrillic, Arabic, and Devanagari scripts.
OCR for handwritten text includes support for English, Chinese Simplified, French, German, Italian, Japanese, Korean, Portuguese, and Spanish.
The Read API includes the following features:
- Print text extraction in 164 languages
- Handwritten text extraction in nine languages
- Text lines and words with location and confidence scores
- No language identification required
- Support for mixed languages, mixed mode (print and handwritten)
- Select pages and page ranges from large, multi-page documents
- Natural reading order option for text line output (Latin only)
- Handwriting classification for text lines (Latin only)
- Available as a Distroless Docker container for on-premises deployment
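Features such as page selection and the natural reading order option are typically chosen per request. The sketch below builds a request URL for a Read call without sending anything over the network. The path `/vision/v3.2/read/analyze` and the query parameter names `pages` and `readingOrder` are assumptions based on the Read 3.2 REST API and are not stated in this article, so check them against the API reference before relying on them. `build_read_url` and the endpoint shown are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed path of the Read 3.2 analyze operation; verify against the API reference.
READ_ANALYZE_PATH = "/vision/v3.2/read/analyze"


def build_read_url(
    endpoint: str,
    pages: Optional[str] = None,
    reading_order: Optional[str] = None,
) -> str:
    """Build the URL for a Read request, adding only the options that were set."""
    params = {}
    if pages:
        # Assumed parameter: a page or page range, for example "2-4".
        params["pages"] = pages
    if reading_order:
        # Assumed parameter: selects the natural reading order option for text lines.
        params["readingOrder"] = reading_order

    url = endpoint.rstrip("/") + READ_ANALYZE_PATH
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the endpoint of your own resource.
    endpoint = "https://example.cognitiveservices.azure.com/"
    print(build_read_url(endpoint, pages="2-4", reading_order="natural"))
```

Keeping URL construction in a pure function makes the option handling easy to test separately from authentication and network calls.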
Learn how to use the OCR features.
Use the cloud API or deploy on-premises
The Read 3.x cloud APIs are the preferred option for most customers because of ease of integration and fast productivity out of the box. Azure and the Computer Vision service handle scale, performance, data security, and compliance needs while you focus on meeting your customers' needs.
For on-premises deployment, the Read Docker container (preview) enables you to deploy the new OCR capabilities in your own local environment. Containers are well suited to scenarios with specific security and data governance requirements.
The Computer Vision RecognizeText and ocr operations are no longer maintained, and are in the process of being deprecated in favor of the new Read API covered in this article. Existing customers should transition to using Read operations.
Data privacy and security
As with all of the Cognitive Services, developers using the Computer Vision service should be aware of Microsoft's policies on customer data. See the Cognitive Services page on the Microsoft Trust Center to learn more.