Form Recognizer receipt model

The receipt model combines Optical Character Recognition (OCR) with deep learning models to analyze and extract key information from sales receipts. It handles receipts of varying formats and quality, including printed and handwritten receipts. The API extracts key information such as merchant name, merchant phone number, transaction date, tax, and transaction total, and returns it as structured JSON.

Screenshot: sample receipt processed with the Form Recognizer Sample Labeling tool.

Development options

The following resources are supported by Form Recognizer v2.1:

Feature: Receipt model

The following resources are supported by Form Recognizer v3.0:

Feature: Receipt model
Model ID: prebuilt-receipt

Try Form Recognizer

See how data, including time and date of transactions, merchant information, and amount totals, is extracted from receipts using the Form Recognizer Studio or our Sample Labeling tool. You'll need the following:

  • An Azure subscription. You can create one for free.

  • A Form Recognizer instance in the Azure portal. You can use the free pricing tier (F0) to try the service. After your resource deploys, select Go to resource to get your API key and endpoint.

Screenshot: keys and endpoint location in the Azure portal.
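With the key and endpoint in hand, the v3.0 receipt model can also be called over REST using the model ID prebuilt-receipt. The sketch below only assembles the request and sends nothing. The URL path, the api-version value 2021-09-30-preview, and the urlSource body field are assumptions about the v3.0 preview REST API, not details from this page, so confirm them against the REST reference; the resource name contoso is hypothetical.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode

# Assumed values, not taken from this page: confirm the path and the preview
# api-version against the Form Recognizer v3.0 REST API reference.
API_VERSION = "2021-09-30-preview"
MODEL_ID = "prebuilt-receipt"  # model ID listed for v3.0


def build_analyze_request(endpoint: str, api_key: str, receipt_url: str,
                          model_id: str = MODEL_ID,
                          api_version: str = API_VERSION) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, body) for an analyze call. Nothing is sent."""
    query = urlencode({"api-version": api_version})
    url = f"{endpoint.rstrip('/')}/formrecognizer/documentModels/{model_id}:analyze?{query}"
    headers = {
        # Key and endpoint come from the resource's Keys and Endpoint page.
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    # Assumed request body shape for analyzing a publicly reachable image URL.
    body = json.dumps({"urlSource": receipt_url})
    return url, headers, body


url, headers, body = build_analyze_request(
    "https://contoso.cognitiveservices.azure.com/",  # hypothetical resource
    "<your-api-key>",
    "https://example.com/receipt.jpg",
)
```

Send the request with any HTTP client. The analyze call is asynchronous: the service replies 202 Accepted with an Operation-Location header, which you poll until the result is ready.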

Form Recognizer Studio (preview)

Note

Form Recognizer Studio is available with the preview (v3.0) API.

  1. On the Form Recognizer Studio home page, select Receipts.

  2. You can analyze the sample receipt or select the + Add button to upload your own sample.

  3. Select the Analyze button:

    Screenshot: analyze receipt menu.

Sample Labeling tool

You will need a receipt document. You can use our sample receipt document.

  1. On the Sample Labeling tool home page, select Use prebuilt model to get data.

  2. Select Receipt from the Form Type dropdown menu:

    Screenshot: Sample Labeling tool dropdown prebuilt model selection menu.

Input requirements

  • For best results, provide one clear photo or high-quality scan per document.
  • Supported file formats: JPEG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs work best because they eliminate the possibility of errors in character extraction and location.
  • For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
  • The file size must be less than 50 MB.
  • Image dimensions must be between 50 x 50 pixels and 10000 x 10000 pixels.
  • PDF dimensions must be 17 x 17 inches or smaller, which corresponds to Legal or A3 paper size or smaller.
  • The total size of the training data must be 500 pages or less.
  • If your PDFs are password-locked, you must remove the lock before submission.
  • For unsupervised learning (without labeled data):
    • Data must contain keys and values.
    • Keys must appear above or to the left of the values; they can't appear below or to the right.
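A client-side pre-check can catch files the service would reject before you upload them. This minimal sketch encodes the format, size, pixel-dimension, and page-count limits from the list above; it does not inspect document content, so the key/value layout rules are not checked. Treating "50 MB" as 50 × 1024 × 1024 bytes is an assumption, and the function names are hypothetical.

```python
import os
from typing import List, Optional

# Limits transcribed from the input requirements above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".pdf"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumes "50 MB" means 50 * 1024 * 1024 bytes
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000
MAX_PAGES, FREE_TIER_PAGES = 2000, 2


def check_receipt_input(filename: str, size_bytes: int,
                        width_px: Optional[int] = None,
                        height_px: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes >= MAX_FILE_BYTES:
        problems.append("file must be smaller than 50 MB")
    # Pixel limits apply to images; PDFs are limited by page size instead.
    if ext != ".pdf" and width_px is not None and height_px is not None:
        for side in (width_px, height_px):
            if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
                problems.append(f"image side of {side} px is outside 50 to 10000 px")
    return problems


def pages_processed(page_count: int, free_tier: bool) -> int:
    """Number of pages of a PDF or TIFF that will be analyzed on a given tier."""
    return min(page_count, FREE_TIER_PAGES if free_tier else MAX_PAGES)
```

For example, a 10-page PDF submitted on the free tier yields `pages_processed(10, free_tier=True)`, which is 2.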

Supported languages and locales v2.1

Note

Specifying a locale is optional. If you don't provide one, the Form Recognizer deep-learning technology auto-detects the language of the text in your image.

Model: Receipt
Default: Autodetected
Language (locale code):
  • English (United States): en-US
  • English (Australia): en-AU
  • English (Canada): en-CA
  • English (United Kingdom): en-GB
  • English (India): en-IN
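Because the locale is optional, a v2.1 client can simply leave it off to get auto-detection, or pass one of the codes listed above as a query parameter. This sketch builds the v2.1 prebuilt receipt analyze URL either way; the path `/formrecognizer/v2.1/prebuilt/receipt/analyze` and the `locale` query parameter follow the v2.1 REST API, while the function name and the early rejection of unlisted locales are illustrative choices, not service behavior.

```python
from typing import Optional
from urllib.parse import urlencode

# Locale codes listed for the v2.1 receipt model.
SUPPORTED_LOCALES = {"en-US", "en-AU", "en-CA", "en-GB", "en-IN"}


def receipt_analyze_url_v21(endpoint: str, locale: Optional[str] = None) -> str:
    """Build the v2.1 receipt analyze URL; omit locale to let the service auto-detect."""
    url = endpoint.rstrip("/") + "/formrecognizer/v2.1/prebuilt/receipt/analyze"
    if locale is None:
        return url
    # Illustrative client-side check; the service itself is the final authority.
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported receipt locale: {locale}")
    query = urlencode({"locale": locale})
    return f"{url}?{query}"
```

For example, `receipt_analyze_url_v21("https://contoso.cognitiveservices.azure.com/", "en-GB")` ends in `/prebuilt/receipt/analyze?locale=en-GB` (contoso is a hypothetical resource name).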

Field extraction

Name | Type | Description | Standardized output
ReceiptType | String | Type of sales receipt | Itemized
MerchantName | String | Name of the merchant issuing the receipt |
MerchantPhoneNumber | phoneNumber | Listed phone number of merchant | +1 xxx xxx xxxx
MerchantAddress | String | Listed address of merchant |
TransactionDate | Date | Date the receipt was issued | yyyy-mm-dd
TransactionTime | Time | Time the receipt was issued | hh:mm:ss (24-hour)
Total | Number (USD) | Full transaction total of receipt | Two-decimal float
Subtotal | Number (USD) | Subtotal of receipt, often before taxes are applied | Two-decimal float
Tax | Number (USD) | Tax on receipt (often sales tax or equivalent) | Two-decimal float
Tip | Number (USD) | Tip included by buyer | Two-decimal float
Items | Array of objects | Extracted line items, each with name, quantity, unit price, and total price |
Items.*.Name | String | Item name |
Items.*.Quantity | Number | Quantity of each item | Integer
Items.*.Price | Number | Individual price of each item unit | Two-decimal float
Items.*.TotalPrice | Number | Total price of line item | Two-decimal float
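In a v2.1 analyze result, each field in the table above arrives as an object whose `type` says which key (such as `valueString`, `valueDate`, or `valueNumber`) holds the standardized value, and `Items` nests line items as an array of objects. The sketch below flattens the first receipt into plain Python values. It assumes the `analyzeResult.documentResults[0].fields` layout of v2.1 responses and that the line-item total is keyed `TotalPrice`; the sample response is hypothetical and heavily trimmed.

```python
from typing import Any, Dict, Optional

# v2.1 field "type" -> JSON key holding the standardized value.
VALUE_KEYS = {
    "string": "valueString",
    "date": "valueDate",  # yyyy-mm-dd
    "time": "valueTime",  # hh:mm:ss
    "phoneNumber": "valuePhoneNumber",
    "number": "valueNumber",
    "integer": "valueInteger",
    "array": "valueArray",
    "object": "valueObject",
}


def field_value(field: Optional[Dict[str, Any]]) -> Any:
    """Return a field's standardized value, recursing into arrays and objects."""
    if field is None:  # fields the model could not find may be null
        return None
    kind = field.get("type", "")
    value = field.get(VALUE_KEYS.get(kind, ""))
    if kind == "array":
        return [field_value(item) for item in value or []]
    if kind == "object":
        return {name: field_value(sub) for name, sub in (value or {}).items()}
    return value


def receipt_fields(result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the first receipt of a v2.1 analyze result into {name: value}."""
    document = result["analyzeResult"]["documentResults"][0]
    return {name: field_value(field) for name, field in document["fields"].items()}


# Hypothetical, trimmed response; real responses also carry text,
# boundingBox, confidence, and page references for each field.
sample_result = {
    "status": "succeeded",
    "analyzeResult": {"documentResults": [{"docType": "prebuilt:receipt", "fields": {
        "MerchantName": {"type": "string", "valueString": "Contoso"},
        "TransactionDate": {"type": "date", "valueDate": "2019-06-10"},
        "TransactionTime": {"type": "time", "valueTime": "13:59:00"},
        "Total": {"type": "number", "valueNumber": 14.5},
        "Items": {"type": "array", "valueArray": [{"type": "object", "valueObject": {
            "Name": {"type": "string", "valueString": "Surface Pen"},
            "Quantity": {"type": "number", "valueNumber": 1},
            "TotalPrice": {"type": "number", "valueNumber": 14.5},
        }}]},
    }}]},
}

fields = receipt_fields(sample_result)
```

Keeping the type-to-key lookup in one table means a new field type only needs one extra entry rather than another branch.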

Form Recognizer preview v3.0

The Form Recognizer preview introduces several new features and capabilities. In the preview, the receipt model also supports processing single-page hotel receipts.

Hotel receipt field extraction

Name | Type | Description | Standardized output
ArrivalDate | Date | Date of arrival | yyyy-mm-dd
Currency | Currency | Currency unit of the receipt amounts, for example USD or EUR, or MIXED if multiple currencies are found |
DepartureDate | Date | Date of departure | yyyy-mm-dd
Items | Array | |
Items.*.Category | String | Item category, for example Room or Tax |
Items.*.Date | Date | Item date | yyyy-mm-dd
Items.*.Description | String | Item description |
Items.*.TotalPrice | Number | Item total price | Integer
Locale | String | Locale of the receipt, for example en-US | ISO language-country code
MerchantAddress | String | Listed address of merchant |
MerchantAliases | Array | |
MerchantAliases.* | String | Alternative name of merchant |
MerchantName | String | Name of the merchant issuing the receipt |
MerchantPhoneNumber | phoneNumber | Listed phone number of merchant | +1 xxx xxx xxxx
ReceiptType | String | Type of receipt, for example Hotel or Itemized |
Total | Number | Full transaction total of receipt | Two-decimal float
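Once hotel receipt fields have been extracted, the yyyy-mm-dd dates and per-item categories support simple post-processing, such as counting nights and totaling charges by category. This sketch works on a plain dictionary of already-extracted values; that input shape, the function names, and the sample amounts are hypothetical, and the comparison with Total is only a heuristic cross-check, not a guarantee that the receipt's items sum to its total.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, List


def nights_stayed(arrival_date: str, departure_date: str) -> int:
    """Nights between ArrivalDate and DepartureDate, both yyyy-mm-dd strings."""
    return (date.fromisoformat(departure_date) - date.fromisoformat(arrival_date)).days


def totals_by_category(items: List[Dict[str, Any]]) -> Dict[str, float]:
    """Sum each item's TotalPrice under its Category (hypothetical input shape)."""
    totals: Dict[str, float] = defaultdict(float)
    for item in items:
        totals[item.get("Category", "Uncategorized")] += item.get("TotalPrice", 0)
    return {category: round(amount, 2) for category, amount in totals.items()}


# Hypothetical values already pulled out of an analyze result.
hotel = {
    "ArrivalDate": "2021-03-01",
    "DepartureDate": "2021-03-03",
    "Items": [
        {"Category": "Room", "Date": "2021-03-01", "TotalPrice": 120},
        {"Category": "Room", "Date": "2021-03-02", "TotalPrice": 120},
        {"Category": "Tax", "Date": "2021-03-02", "TotalPrice": 18},
    ],
    "Total": 258,
}

nights = nights_stayed(hotel["ArrivalDate"], hotel["DepartureDate"])
by_category = totals_by_category(hotel["Items"])
# Heuristic cross-check only: a receipt's items need not sum exactly to Total.
items_match_total = round(sum(by_category.values()), 2) == hotel["Total"]
```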

Hotel receipt supported languages and locales

Model: Receipt (hotel)
Default: English (United States), en-US
Language (locale code):
  • English (United States): en-US

Migration guide and REST API v3.0

Next steps