Form Recognizer Layout service

Azure Form Recognizer can extract text, tables, selection marks, and structure information from documents using its Layout service. The Layout API accepts documents in a variety of formats and returns structured data representations of them. It combines our Optical Character Recognition (OCR) capabilities with deep learning models to perform this extraction.

What does the Layout service do?

The Layout API extracts text, tables, selection marks, and structure information from documents and returns an organized, structured JSON response. Documents can vary in format and quality, including phone-captured images, scanned documents, and digital PDFs, and the Layout API extracts structured output from all of these document types.

Layout example

Try it out

To try out the Form Recognizer Layout service, go to the online sample UI tool:

You will need an Azure subscription (create one for free) and a Form Recognizer resource endpoint and key to try out the Form Recognizer Layout API.

Sample UI screenshot; the text, tables, and selection marks of a document are analyzed

Input requirements

  • Supported file formats: JPEG, PNG, PDF, and TIFF
  • For PDF and TIFF, up to 2000 pages are processed. For free tier subscribers, only the first two pages are processed.
  • The file size must be less than 50 MB, and image dimensions must be at least 50 x 50 pixels and at most 10000 x 10000 pixels.
  • PDF page dimensions must be at most 17 x 17 inches, which corresponds to Legal or A3 paper sizes and smaller.
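The limits above can be checked on the client before uploading, which avoids a round trip for files the service would reject. The following is a minimal sketch: the helper name `check_layout_input` and its parameters are hypothetical, it assumes 1 MB = 1024 x 1024 bytes, and the service itself remains the authority on what it accepts.

```python
# Limits copied from the input requirements above. The helper is hypothetical,
# not part of any Azure SDK.
SUPPORTED_FORMATS = {"jpeg", "jpg", "png", "pdf", "tiff", "tif"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # "less than 50 MB" (assumes 1 MB = 1024 * 1024 bytes)
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000
MAX_PDF_SIDE_IN = 17.0


def check_layout_input(fmt, size_bytes, width_px=None, height_px=None,
                       pdf_width_in=None, pdf_height_in=None):
    """Return a list of problems; an empty list means the file appears to
    meet the documented limits. Dimensions that are None are not checked."""
    problems = []
    fmt = fmt.lower().lstrip(".")
    if fmt not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes >= MAX_FILE_BYTES:
        problems.append("file must be smaller than 50 MB")
    for side in (width_px, height_px):
        if side is not None and not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            problems.append(f"dimension {side}px outside 50-10000 px range")
    for side in (pdf_width_in, pdf_height_in):
        if side is not None and side > MAX_PDF_SIDE_IN:
            problems.append(f"PDF page side {side} in exceeds 17 in")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a caller report every violation at once.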

The Analyze Layout operation

First, call the Analyze Layout operation. Analyze Layout takes a document (image, TIFF, or PDF file) as the input and extracts the text, tables, selection marks, and structure of the document. The call returns a response header field called Operation-Location. The Operation-Location value is a URL that contains the Result ID to be used in the next step.

Response header       Result URL
Operation-Location    https://cognitiveservice/formrecognizer/v2.1-preview.3/layout/analyzeResults/44a436324-fc4b-4387-aa06-090cfbf0064f
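The submit step can be sketched with the Python standard library as follows. This assumes the analyze path mirrors the version segment of the example URL (`/formrecognizer/v2.1-preview.3/layout/analyze`) and uses the Cognitive Services `Ocp-Apim-Subscription-Key` header. The function names are illustrative, and `endpoint` stands for your own Form Recognizer resource endpoint.

```python
import urllib.request
from urllib.parse import urlparse

# Version segment taken from the example Operation-Location URL above.
API_PATH = "/formrecognizer/v2.1-preview.3/layout/analyze"


def start_layout_analysis(endpoint, key, document_bytes, content_type="application/pdf"):
    """POST a document to Analyze Layout and return the Operation-Location header.

    content_type should match the file, e.g. "image/jpeg", "image/png",
    "image/tiff", or "application/pdf".
    """
    request = urllib.request.Request(
        endpoint.rstrip("/") + API_PATH,
        data=document_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as response:  # the service replies 202 Accepted
        return response.headers["Operation-Location"]


def result_id_from_operation_location(operation_location):
    """The Result ID is the last path segment of the Operation-Location URL."""
    return urlparse(operation_location).path.rstrip("/").split("/")[-1]
```

In practice you can poll the full Operation-Location URL directly. Extracting the Result ID is mainly useful for logging or for storing the ID and rebuilding the URL later.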

Natural reading order output (Latin only)

You can specify the order in which text lines are output with the readingOrder query parameter. Set it to natural for a more human-friendly reading order, as shown in the following example. This feature is supported only for Latin languages.
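Requesting natural reading order only changes the query string of the Analyze Layout request. The sketch below assumes the same v2.1-preview.3 path as the example URL and treats omitting the parameter as "use the service default", which is assumed to be basic ordering. The helper name is hypothetical.

```python
from urllib.parse import urlencode


def analyze_url(endpoint, reading_order="natural"):
    """Build the Analyze Layout URL, optionally adding readingOrder.

    Pass reading_order=None to leave the parameter out and get the service's
    default ordering.
    """
    base = endpoint.rstrip("/") + "/formrecognizer/v2.1-preview.3/layout/analyze"
    if reading_order is None:
        return base
    return base + "?" + urlencode({"readingOrder": reading_order})
```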

Layout Reading order example

Select page numbers or ranges for text extraction

For large multi-page documents, use the pages query parameter to specify the page numbers or page ranges to extract text from. The following example shows a 10-page document with text extracted in two cases: all pages (1-10) and selected pages (3-6).
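A page selection can be assembled from single page numbers and inclusive ranges. This is a sketch that assumes comma-separated entries with hyphenated ranges, generalizing the "3-6" example above. Check the API reference for the exact syntax the service accepts. The helper `pages_param` is hypothetical.

```python
from urllib.parse import urlencode


def pages_param(selection):
    """Turn [3, (5, 7), 10] into "3,5-7,10".

    Ints are single pages; (first, last) tuples are inclusive ranges.
    """
    parts = []
    for item in selection:
        if isinstance(item, tuple):
            first, last = item
            if first > last:
                raise ValueError(f"bad page range {item}")
            parts.append(f"{first}-{last}")
        else:
            parts.append(str(item))
    return ",".join(parts)


# Pages 3 through 6, as in the example above.
query = urlencode({"pages": pages_param([(3, 6)])})
# query == "pages=3-6"
```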

Layout selected pages output

The Get Analyze Layout Result operation

The second step is to call the Get Analyze Layout Result operation. This operation takes as input the Result ID that was created by the Analyze Layout operation. It returns a JSON response that contains a status field with the following possible values.

Field    Type     Possible values
status   string   notStarted: The analysis operation has not started.
                  running: The analysis operation is in progress.
                  failed: The analysis operation has failed.
                  succeeded: The analysis operation has succeeded.

Call this operation repeatedly until it returns the succeeded value, and stop polling if it returns failed. Wait 3 to 5 seconds between calls to avoid exceeding the requests per second (RPS) rate limit.
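The polling loop can be sketched as follows. The fetch step is passed in as a function so the loop can be tested without a network. `get_layout_result` assumes the result is read with a plain GET on the Operation-Location URL using the same key header. The 4-second interval sits inside the recommended 3 to 5 second window, and the 300-second timeout is an arbitrary choice for this sketch.

```python
import json
import time
import urllib.request


def get_layout_result(operation_location, key):
    """GET the Operation-Location URL and return the decoded JSON body."""
    request = urllib.request.Request(
        operation_location, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_until_done(fetch, interval_s=4.0, timeout_s=300.0, sleep=time.sleep):
    """Call fetch() until status is "succeeded" (returned) or "failed" (raised).

    fetch is any zero-argument callable returning the decoded JSON, for
    example lambda: get_layout_result(url, key).
    """
    waited = 0.0
    while True:
        result = fetch()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("layout analysis failed")
        if waited >= timeout_s:
            raise TimeoutError(f"still {status!r} after {waited:.0f} s")
        sleep(interval_s)  # stay within the RPS limit between calls
        waited += interval_s
```

Treating failed as a terminal state matters. A loop that waits only for succeeded would spin until the timeout on a failed operation.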

When the status field has the succeeded value, the JSON response includes the extracted layout: text lines and words with bounding boxes, text appearance with a handwriting indication, tables, and selection marks with their selected or unselected state.

Handwritten classification for text lines (Latin only)

For each text line, the response indicates whether the line is handwritten, along with a confidence score. This feature is supported only for Latin languages. The following example shows the handwriting classification for the text in the image.
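Handwritten lines can be filtered out of the result with a short loop. This sketch assumes the v2.1 response shape, where each line in readResults carries `appearance.style.name` ("handwriting" or "other") and `appearance.style.confidence`. It takes the analyzeResult object from the response. The 0.5 threshold is an arbitrary illustration, not a recommended value.

```python
def handwritten_lines(analyze_result, min_confidence=0.5):
    """Yield (page, text, confidence) for lines classified as handwriting."""
    for page in analyze_result.get("readResults", []):
        for line in page.get("lines", []):
            style = line.get("appearance", {}).get("style", {})
            if style.get("name") == "handwriting" and style.get("confidence", 0) >= min_confidence:
                yield page.get("page"), line.get("text"), style["confidence"]
```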

handwriting classification example

Sample JSON output

The response to the Get Analyze Layout Result operation is a structured representation of the document that contains all of the extracted information. A sample document file and its structured output (sample layout output) are available for reference.

The JSON output has two parts:

  • readResults node contains all of the recognized text and selection marks. Text is organized by page, then by line, then by individual words.
  • pageResults node contains the tables and cells extracted with their bounding boxes, confidence, and a reference to the lines and words in "readResults".
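The two parts link together through references. A table cell lists the words it covers as paths into readResults. The sketch below assumes the references look like "#/readResults/0/lines/2/words/1", as in v2.x responses, and that the functions receive the analyzeResult object from the response. Both helper names are illustrative.

```python
def iter_lines(analyze_result):
    """Yield (page number, line text) in the order the lines appear in readResults."""
    for page in analyze_result.get("readResults", []):
        for line in page.get("lines", []):
            yield page.get("page"), line.get("text")


def resolve_element(analyze_result, ref):
    """Follow a reference such as "#/readResults/0/lines/2/words/1" to the node it names."""
    node = analyze_result
    for segment in ref.lstrip("#/").split("/"):
        # Lists are indexed by position, dicts by key.
        node = node[int(segment)] if isinstance(node, list) else node[segment]
    return node
```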

Example Output

Text

The Layout API extracts text from documents (PDF, TIFF) and images (JPG, PNG, BMP), including text at multiple angles and in multiple colors. It accepts photos of documents, faxes, printed and/or handwritten (English only) text, and mixed modes. Extracted text includes lines, words, bounding boxes, confidence scores, and style (handwritten or other). All text information is included in the readResults section of the JSON output.

Tables

The Layout API extracts tables from documents (PDF, TIFF) and images (JPG, PNG, BMP). Documents can be scanned, photographed, or digitized. Tables can be complex, with merged cells or columns, with or without borders, and at odd angles. Extracted table information includes the number of columns and rows, row span, and column span. Each cell is extracted with its bounding box and a reference to the text extracted in the readResults section. Table information is located in the pageResults section of the JSON output.
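The row, column, and span information is enough to rebuild a table as a grid of cell text. This sketch assumes v2.1 field names on each table in pageResults (`rows`, `columns`, and `cells` with `rowIndex`, `columnIndex`, optional `rowSpan`/`columnSpan`, and `text`). The function name is hypothetical.

```python
def table_to_grid(table):
    """Rebuild a rows x columns grid of cell text from one pageResults table.

    A merged cell's text is copied into every grid position the cell covers.
    """
    grid = [["" for _ in range(table["columns"])] for _ in range(table["rows"])]
    for cell in table["cells"]:
        row_end = cell["rowIndex"] + cell.get("rowSpan", 1)
        col_end = cell["columnIndex"] + cell.get("columnSpan", 1)
        for r in range(cell["rowIndex"], row_end):
            for c in range(cell["columnIndex"], col_end):
                grid[r][c] = cell["text"]
    return grid
```

Copying merged-cell text into every covered position keeps each row self-describing, for example when exporting to CSV. If you would rather leave the covered positions empty, write only at the cell's top-left index.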

Tables example

Selection marks

The Layout API also extracts selection marks from documents. Extracted selection marks include the bounding box, confidence, and state (selected/unselected). Selection mark information is returned in the readResults section of the JSON output.
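Collecting the checked boxes from a result is a single pass over readResults. This sketch assumes each page carries a `selectionMarks` list whose entries have `state`, `boundingBox`, and `confidence`, as in the v2.1 schema. It takes the analyzeResult object, and the helper name is hypothetical.

```python
def selected_marks(analyze_result, min_confidence=0.0):
    """Return (page, boundingBox, confidence) for each mark whose state is "selected"."""
    marks = []
    for page in analyze_result.get("readResults", []):
        for mark in page.get("selectionMarks", []):
            if mark.get("state") == "selected" and mark.get("confidence", 0) >= min_confidence:
                marks.append((page.get("page"), mark.get("boundingBox"), mark.get("confidence")))
    return marks
```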
