General concepts
What is Form Recognizer?
Azure Form Recognizer is a cloud-based Azure Applied AI Service that uses machine-learning models to extract key-value pairs, text, and tables from your documents. The returned result is a structured JSON output.
Form Recognizer use cases include automated data processing, enhanced data-driven strategies, and enriched document search capabilities.
Learn more about Form Recognizer use case scenarios.
Which Form Recognizer use cases require special consideration?
Give careful consideration to document processing projects that encompass financial, protected health, personal identity, or highly sensitive data.
Make certain to comply with all national, regional, and industry-specific requirements.
Learn more about use case considerations.
What languages are supported by Form Recognizer?
Form Recognizer's deep-learning-based universal models support many languages and can extract multilingual text from your images and documents, including text lines with mixed languages.
Language support varies by Form Recognizer service functionality. See language support for a complete list of the handwritten and printed text supported by Form Recognizer.
Is Form Recognizer available in my Azure region?
Form Recognizer is generally available in many of the 60+ Azure global infrastructure regions.
Choose the region that is best for you and your customers.
Does Form Recognizer integrate with other Microsoft services?
Yes, Form Recognizer integrates with the following services:
How is Form Recognizer related to OCR?
Azure Form Recognizer is a cloud-based Azure Applied AI Service that is built using optical character recognition (OCR), Text Analytics, and Custom Text from Azure Cognitive Services.
OCR is used to extract typeface and handwritten text from documents.
Form Recognizer uses OCR to detect and extract information from forms and documents supported by AI to provide more structure and information to the text extraction.
Learn more about Form Recognizer and Applied AI Services.
What is the accuracy score and how is it calculated?
The output of a build (v3.0) or train (v2.1) custom model operation includes the estimated accuracy score. This score represents the model's ability to accurately predict the labeled value on a visually similar document.
Accuracy is measured within a percentage value range between 0% (low) and 100% (high).
How can I improve accuracy scores?
The accuracy of a model is influenced by variances in the visual structure of your documents.
Ensure that all variations of a document are included in the training dataset. Variations include different formats, for example, digital versus scanned PDFs.
Separate visually distinct document types and train different models.
Make sure that you don't have extraneous labels.
For signature and region labeling, don't include the surrounding text.
What is the confidence score and how is it calculated?
A confidence score indicates probability by measuring the degree of statistical certainty that the extracted result has been detected correctly.
The confidence value range is a percentage between 0% (low) and 100% (high).
It's best to target a score of 80% or higher. For more sensitive cases, like financial or medical records, a score of close to 100% is recommended. You may also require human review.
How can I improve confidence scores?
Following an analysis operation, review the JSON output. Examine the confidence values for each key/value result under the pageResults node. You should also look at the confidence score in the readResults node, which corresponds to the text-read operation. The confidence of the read results doesn’t affect the confidence of the key/value extraction results, so you should check both.
If the confidence score for the readResults object is low, improve the quality of your input documents.
If the confidence score for the pageResults object is low, ensure that the documents being analyzed are of the same type.
Consider incorporating human review into your workflows.
Use forms with different values in each field.
For custom models, use a larger set of training documents. Tagging more documents teaches your model to recognize fields with greater accuracy.
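The review step described above can be sketched in a few lines of Python. This is a minimal illustration, not the service's own tooling: the function name `low_confidence_items` and the `SAMPLE` data are hypothetical, the nested layout (`readResults` > `lines` > `words`, `pageResults` > `keyValuePairs`) is an assumed shape for the JSON output, and confidence is assumed to be reported as a fraction between 0 and 1, so 0.8 corresponds to the 80% target.

```python
from typing import Dict, List, Tuple

Flagged = Tuple[int, str, float]  # (page number, text, confidence)


def low_confidence_items(analyze_result: Dict, threshold: float = 0.8) -> Dict[str, List[Flagged]]:
    """Collect read words and key/value pairs whose confidence is below threshold.

    The dictionary layout used here is an assumption for illustration.
    """
    flagged: Dict[str, List[Flagged]] = {"readResults": [], "pageResults": []}

    # Text-read confidence: checked per word.
    for page in analyze_result.get("readResults", []):
        for line in page.get("lines", []):
            for word in line.get("words", []):
                confidence = word.get("confidence", 1.0)
                if confidence < threshold:
                    flagged["readResults"].append((page.get("page"), word.get("text", ""), confidence))

    # Key/value extraction confidence: checked per pair, independent of the read score.
    for page in analyze_result.get("pageResults", []):
        for pair in page.get("keyValuePairs", []):
            confidence = pair.get("confidence", 1.0)
            if confidence < threshold:
                key_text = pair.get("key", {}).get("text", "")
                flagged["pageResults"].append((page.get("page"), key_text, confidence))

    return flagged


# Hand-written sample data, not real service output.
SAMPLE = {
    "readResults": [
        {"page": 1, "lines": [{"text": "Total 42", "words": [
            {"text": "Total", "confidence": 0.98},
            {"text": "42", "confidence": 0.55},
        ]}]}
    ],
    "pageResults": [
        {"page": 1, "keyValuePairs": [
            {"key": {"text": "Total"}, "value": {"text": "42"}, "confidence": 0.62},
            {"key": {"text": "Date"}, "value": {"text": "2022-06-30"}, "confidence": 0.93},
        ]}
    ],
}

if __name__ == "__main__":
    print(low_confidence_items(SAMPLE))
```

Because the two scores are independent, the sketch reports them separately: flagged words point to input quality, and flagged key/value pairs point to document-type or training issues.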
What is a bounding box?
A bounding box is an abstract rectangle that surrounds text elements on a document or form. It's used as a reference point for object detection.
The bounding box specifies position using an x and y coordinate plane presented in an array of four numerical pairs. Each pair represents a corner of the box in the following order: top-left, top-right, bottom-right, bottom-left.
For an image, coordinates are given in pixels.
For a PDF, coordinates are given in inches.
You can use the bounding box returned by Form Recognizer to identify the location of recognized entities.
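As a rough illustration of the corner ordering described above, the following sketch turns a bounding box into named corner points and an axis-aligned extent. It assumes the box arrives as a flat list of eight numbers (four x, y pairs); the helper names `to_corners` and `to_extent` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_corners(bounding_box: List[float]) -> List[Point]:
    """Split a flat list of eight numbers into (x, y) corners.

    Order follows the text: top-left, top-right, bottom-right, bottom-left.
    """
    if len(bounding_box) != 8:
        raise ValueError("expected 8 numbers (four x, y pairs)")
    return [(bounding_box[i], bounding_box[i + 1]) for i in range(0, 8, 2)]


def to_extent(bounding_box: List[float]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y), which also covers rotated boxes."""
    corners = to_corners(bounding_box)
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    box = [1.0, 2.0, 5.0, 2.0, 5.0, 4.0, 1.0, 4.0]  # pixels for an image, inches for a PDF
    print(to_corners(box))
    print(to_extent(box))
```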
App development
What are the development options for Form Recognizer?
Form Recognizer offers the latest development options within the following platforms:
Where can I find the supported API version for the latest programming language SDKs?
This table provides links to the latest SDK versions and shows the relationship between supported Form Recognizer SDK and API versions:
| Supported Language | Azure SDK client-library | API reference | Supported API version |
|---|---|---|---|
| C#/.NET | 4.0.0-beta.4 | .NET SDK | 2022-06-30, 2022-01-30, 2021-09-30-preview, v2.1, v2.0 |
| Java | 4.0.0-beta.5 | Java SDK | 2022-06-30, 2022-01-30, 2021-09-30-preview, v2.1, v2.0 |
| JavaScript | 4.0.0-beta.4 | JavaScript SDK | 2022-06-30, 2022-01-30, 2021-09-30-preview, v2.1, v2.0 |
| Python | 3.2.0b5 | Python SDK | 2022-06-30, 2022-01-30, 2021-09-30-preview, v2.1, v2.0 |
What is the difference between Form Recognizer v3.0 and v2.1 and how do I migrate to the latest version?
For improved usability, Form Recognizer v3.0 introduces a fully redesigned client library. To use the latest Form Recognizer API features (version 2021-09-30-preview and newer), the most recent SDK is required and your application code must be updated to use the new clients.
This table provides links to detailed instructions for migrating to the newest version of Form Recognizer:
| Language / API | Migration guide |
|---|---|
| REST API | v3 |
| C#/.NET | 4.0.0-beta.3 |
| Java | 4.0.0-beta.4 |
| JavaScript | 4.0.0-beta.3 |
| Python | 3.2.0b3 |
Which file formats does Form Recognizer support? Are there size limitations for input documents?
To ensure the best results, see input requirements.
How can I specify a specific range of pages to be analyzed in a document?
The parameter pages (supported in both v2.1 and v3.0 REST API) enables you to specify pages for multi-page PDF and TIFF documents. Accepted input includes the following ranges:
- Single pages (for example, '1, 2' -> pages 1 and 2 will be processed).
- Finite ranges (for example, '2-5' -> pages 2 to 5 will be processed).
- Open-ended ranges (for example, '5-' -> all the pages from page 5 will be processed, and '-10' -> pages 1 to 10 will be processed).
These parameters can be mixed together and ranges are allowed to overlap (for example, '-5, 1, 3, 5-10' -> pages 1 to 10 will be processed).
The service will accept the request if it can process at least one page of the document. For example, using '5-100' on a five-page document is a valid input where page 5 will be processed.
If no page range is provided, the entire document will be processed.
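To make the range rules above concrete, here is a small local sketch that resolves a pages string against a known page count. It only mirrors the behavior described in this answer and is not the service's implementation; the function name `resolve_pages` is hypothetical.

```python
from typing import List


def resolve_pages(spec: str, page_count: int) -> List[int]:
    """Return the sorted page numbers a pages string selects in a document.

    Mirrors the rules in the text: single pages, finite ranges, open-ended
    ranges, mixing and overlap, and "at least one page must be processable".
    """
    if not spec.strip():
        # No page range provided: the entire document is processed.
        return list(range(1, page_count + 1))

    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text) if start_text.strip() else 1        # '-10' starts at page 1
            end = int(end_text) if end_text.strip() else page_count     # '5-' runs to the last page
        else:
            start = end = int(part)
        # Clamp to the pages that actually exist in the document.
        selected.update(range(max(start, 1), min(end, page_count) + 1))

    if not selected:
        raise ValueError("the page range does not select any page in the document")
    return sorted(selected)


if __name__ == "__main__":
    print(resolve_pages("-5, 1, 3, 5-10", 12))
    print(resolve_pages("5-100", 5))
```

Running a check like this before submitting a request is optional; it simply helps predict how many pages a given range will cover.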
Both Form Recognizer Studio and the FOTT sample labeling tool are available. Which one should I use?
Most of the time, Form Recognizer Studio is recommended because it reduces the time needed to configure your Form Recognizer resource and storage services.
Consider using FOTT (Form OCR Testing Tool) for the following scenarios:
Your data must remain within a single machine. Use the FOTT sample labeling tool and Form Recognizer container.
Your project is highly dependent on Form Recognizer v2.1 and you intend to continue using the v2.1 APIs.
Service limits and pricing
How does Azure calculate the price for using Form Recognizer services?
Form Recognizer billing is calculated monthly based on the model type and number of pages analyzed:
When you submit a document for analysis, all pages are analyzed unless you specify a page range with the pages parameter in your request.
When analyzing Microsoft Excel and PowerPoint documents with the new Read OCR model, each worksheet and slide is counted as one page respectively.
When analyzing PDF and TIFF files, each page in the PDF file or each image in the TIFF file is counted as one page with no maximum character limits.
When analyzing Microsoft Word and HTML files supported by only the Read model, pages are counted in blocks of 3,000 characters each. For example, if your document contains 7,000 characters, the two pages with 3,000 characters each and one page with 1,000 characters will add up to a total of three pages.
In addition, when using the Read model, if your Microsoft Word, Excel, and PowerPoint pages have embedded images, each image will be analyzed and counted as a page. Therefore, the total analyzed pages for Microsoft Office documents will be equal to the sum of total text pages and total images analyzed. In the previous example, if the document contains two embedded images, the total page count in the service output will be three text pages plus two images, for a total of five pages.
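The character-block arithmetic above can be written out as a short sketch. The function name `counted_pages` is hypothetical, and rounding a partial block up to a full page is inferred from the 7,000-character example rather than stated as a general rule.

```python
import math

CHARS_PER_PAGE = 3000  # block size given in the text for Word and HTML files


def counted_pages(char_count: int, embedded_images: int = 0) -> int:
    """Estimate counted pages for a Word or HTML file analyzed with the Read model.

    Text is counted in blocks of 3,000 characters (a partial block counts as
    one page, as in the example above); each embedded image adds one page.
    """
    text_pages = math.ceil(char_count / CHARS_PER_PAGE)
    return text_pages + embedded_images


if __name__ == "__main__":
    print(counted_pages(7000))                     # 3 text pages
    print(counted_pages(7000, embedded_images=2))  # 3 text pages + 2 images = 5
```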
Training a custom model is always free with Form Recognizer. You’re only charged when a model is used to analyze a document.
Container pricing is the same as cloud service pricing.
Form Recognizer offers a free tier (F0) that enables you to test all the Form Recognizer features.
Form Recognizer has a commitment-based pricing model for large workloads.
Learn more about Azure Form Recognizer pricing options.
How can I check my Form Recognizer usage and estimate the price?
You can find usage metrics in the Azure portal metrics dashboard. The dashboard displays the number of pages processed by Azure Form Recognizer. You can check the estimated cost spent on the resource using the Azure pricing calculator. For detailed instructions, see Check my usage and estimate the cost.
What are best practices to mitigate throttling?
Form Recognizer uses autoscaling to provide the required computational resources on-demand, while keeping customer costs low. To mitigate throttling during autoscaling, we recommend the following approach:
Implement retry logic in your application.
If you find that you’re being throttled on the number of POST requests, consider adding a delay between the requests.
Increase the workload gradually. Avoid sharp changes.
Create a support request to increase the transactions per second (TPS) limit.
Learn more about Form Recognizer service quotas and limits.
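The retry and delay recommendations above could look something like the following sketch, which retries with an exponentially growing delay plus a little random jitter. `ThrottledError` is a hypothetical stand-in for whatever your HTTP client or SDK raises when a request is throttled (for example, on an HTTP 429 response), and the delay values are illustrative defaults, not service guidance.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling response from the service."""


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on ThrottledError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_request() -> str:
        # Simulates two throttled calls followed by a success.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ThrottledError()
        return "ok"

    print(call_with_retry(flaky_request, base_delay=0.01))
```

The `sleep` parameter is injectable so the wait can be replaced in tests; in production code the default `time.sleep` applies.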
How long will it take to analyze a document?
Form Recognizer is a multi-tenanted service where latency for similar documents is comparable but not always identical. The time to analyze a document depends on the size (for example, number of pages) and associated content on each page.
Latency is the amount of time it takes for an API server to handle and process an incoming request and deliver the outgoing response to the client. Occasional variability in latency and performance is inherent in any micro-service-based, stateless, asynchronous service that processes images and large documents at scale. While we continuously scale up hardware, capacity, and scaling capabilities, you may still see latency issues at run time.
Custom models
How do I assemble the best training data?
When you use the Form Recognizer custom model, you provide your own training data. Here are a few tips to help train your models effectively:
Use text-based instead of image-based PDFs when possible. One way to identify an image-based PDF is to try selecting specific text in the document. If you can only select the entire image of the text, the document is image-based, not text-based.
Organize your training documents by using a subfolder for each format (JPEG/JPG, PNG, BMP, PDF, or TIFF).
Use forms that have all of the available fields completed.
Use forms with differing values in each field.
If your images are low quality, use a larger data set (more than five training documents).
Learn more about building a training data set.
What are best practices for training a highly accurate custom model?
The level of accuracy for your model is dependent on the quality of your training materials:
Determine if you need to use a single model or multiple models composed into a single model.
Model accuracy can decrease when you have different formats analyzed with a single model. Plan on segmenting your dataset into folders, where each folder is a unique template. Train one model per folder and compose the resulting models into a single endpoint.
Custom forms rely on a consistent visual template. If your form has variations with formats and page breaks, consider segmenting your dataset to train multiple models.
Ensure you have a balanced dataset by accounting for formats, document types, and structure.
Learn more about composed models.
Can I retrain a custom model?
Form Recognizer doesn’t have an explicit retrain operation. Each train operation generates a new model.
If you find that your model needs retraining, add more samples to your training data set and train a new model.
How many custom models can I compose into a single custom model?
With the Model Compose operation, you can assign up to 100 models to a single model ID.
When you make the Analyze request with a composed modelID, Form Recognizer classifies the submitted form, chooses the best model, and returns the results.
Model Compose is currently available only for custom models trained with labels.
Learn more about composed models.
If the number of models I want to compose exceeds the upper limit of composed model, what are the alternatives?
You can classify the documents before calling the custom model, or consider the custom neural model:
Use the Read model and build a classifier based on the text extracted from the documents and on certain phrases, using code, regular expressions, search, and so on.
If you want to extract the same fields from various structured, semi-structured, and unstructured documents, consider using the deep learning custom neural model. Learn more about the differences between custom template model and custom neural model.
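As one possible shape for the first option, the sketch below routes a document to a custom model based on phrases found in its extracted text. The model IDs, the phrase patterns, and the function name `choose_model` are all hypothetical; the input is assumed to be plain text that you have already obtained from the Read model.

```python
import re
from typing import Dict, List, Optional

# Hypothetical model IDs mapped to phrases that identify each document type.
ROUTES: Dict[str, List[re.Pattern]] = {
    "invoice-model": [
        re.compile(r"\binvoice\s+(number|no\.?)", re.IGNORECASE),
        re.compile(r"\bamount\s+due\b", re.IGNORECASE),
    ],
    "purchase-order-model": [
        re.compile(r"\bpurchase\s+order\b", re.IGNORECASE),
        re.compile(r"\bPO\s+number\b", re.IGNORECASE),
    ],
}


def choose_model(extracted_text: str) -> Optional[str]:
    """Return the model ID whose phrases match most often, or None if none match."""
    best_id: Optional[str] = None
    best_hits = 0
    for model_id, patterns in ROUTES.items():
        hits = sum(1 for pattern in patterns if pattern.search(extracted_text))
        if hits > best_hits:
            best_id, best_hits = model_id, hits
    return best_id


if __name__ == "__main__":
    print(choose_model("INVOICE\nInvoice Number: 1001\nAmount due: $50"))
    print(choose_model("Meeting notes"))
```

Documents that return None can be sent to a fallback model or to human review, depending on your workflow.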
How do I refine a model beyond the initial training?
Each train operation generates a new model.
Start by creating a dataset for your new template.
Label and train a new model.
Validate that the new model performs well for your specific document types.
Compose your new model with the existing model into a single endpoint. Form Recognizer can then determine the best model for each document to be analyzed.
Learn more about composed models.
I'm building a custom model, what does the signature-detection label return?
Signature detection looks for the presence of a signature, not the identity of the person signing the document.
If the model returns "unsigned" for signature detection, the model didn’t find a signature in the defined field.
What should I consider and what are best practices for extracting tables from documents?
You can start with the Form Recognizer layout model to extract texts, tables, selection marks, and structure information from documents and images. You can also consider the following factors:
Is the data that you wish to extract presented as a table and is the table structure meaningful?
If the data isn’t in a table format, can the data fit in a two-dimensional grid?
Do your tables span across multiple pages? If so, to avoid having to label all of the pages, split the PDF into pages prior to sending it to Form Recognizer. Following the analysis, post-process the pages to a single table.
If you’re creating custom models, refer to Labeling as tables. Dynamic tables have a variable number of rows for each given column. Fixed tables have a constant number of rows for each given column.
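For tables that span multiple pages, the post-processing step mentioned above might be sketched as follows. It assumes you have already converted each page's table into a list of rows (each row a list of cell strings), in page order, and that a repeated first row identical to the first page's header should be dropped. The function name `merge_page_tables` is hypothetical.

```python
from typing import List, Optional

Row = List[str]


def merge_page_tables(page_tables: List[List[Row]]) -> List[Row]:
    """Concatenate per-page table rows into one table, keeping a single header row."""
    merged: List[Row] = []
    header: Optional[Row] = None
    for rows in page_tables:
        if not rows:
            continue  # page had no table rows
        if header is None:
            header = rows[0]
            merged.extend(rows)
        elif rows[0] == header:
            merged.extend(rows[1:])  # skip the repeated header
        else:
            merged.extend(rows)      # continuation page without a header
    return merged


if __name__ == "__main__":
    page_1 = [["Item", "Qty"], ["Pen", "2"]]
    page_2 = [["Item", "Qty"], ["Ink", "1"]]
    page_3 = [["Pad", "5"]]
    print(merge_page_tables([page_1, page_2, page_3]))
```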
How can I move my trained models from one environment (like beta) to another (like production)?
The Copy API enables this scenario by allowing you to copy custom models from one Form Recognizer account into others, which can exist in any supported geographical region. Follow this document for detailed instructions. The copy operation is limited to copying models within the specific cloud environment the model was trained in. For instance, copying models from the public cloud to the Azure Government cloud isn't supported.
Storage account
I was able to access my storage account a few days ago. Why am I now having trouble reconnecting?
When you create a shared access signature (SAS), the default duration is 48 hours. After 48 hours, you'll need to create a new token.
Consider setting a longer duration period for the time you'll be using your storage account with Form Recognizer.
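If you suspect an expired token, one quick check is to read the expiry time from the SAS URL itself. The sketch below assumes the expiry is carried in the `se` (signed expiry) query parameter as a UTC timestamp; the sample URL, account name, and function names are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def sas_expiry(sas_url: str) -> Optional[datetime]:
    """Return the expiry time from a SAS URL's 'se' parameter, or None if absent."""
    values = parse_qs(urlparse(sas_url).query).get("se")
    if not values:
        return None
    text = values[0]
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # older fromisoformat versions reject a trailing 'Z'
    expiry = datetime.fromisoformat(text)
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # treat bare values as UTC
    return expiry


def is_expired(sas_url: str, now: Optional[datetime] = None) -> bool:
    """True if the SAS token's expiry time has passed."""
    expiry = sas_expiry(sas_url)
    if expiry is None:
        raise ValueError("no 'se' (signed expiry) parameter found in the URL")
    return (now or datetime.now(timezone.utc)) >= expiry


if __name__ == "__main__":
    # Made-up URL; the signature is a placeholder.
    url = ("https://example.blob.core.windows.net/training-data"
           "?sv=2021-06-08&se=2022-08-03T10%3A00%3A00Z&sr=c&sp=rl&sig=PLACEHOLDER")
    print(sas_expiry(url))
    print(is_expired(url))
```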
If my storage account is behind a VNet or firewall, how do I give Form Recognizer access to my storage account data?
If you have an Azure storage account protected by a Virtual Network (VNet) or firewall, Form Recognizer can’t directly access your storage account. However, Private Azure storage account access and authentication are supported by managed identities for Azure resources. Once a managed identity is enabled, the Form Recognizer service can access your storage account using an assigned managed identity credential.
If you intend to analyze your private storage account data with FOTT, the tool must be deployed behind the VNet or firewall.
Learn how to create and use a managed identity for your Form Recognizer resource.
Form Recognizer Studio
What permissions do I need to access Form Recognizer Studio?
You need an active Azure account and subscription with at least a Reader role to access Form Recognizer Studio.
For document analysis and prebuilt models, you need full access (Contributor role) to at least one Form Recognizer or Cognitive Services multi-service resource to enter the analyze page. Once you access the model analyze page, you can change the endpoint and key to access other resources, if needed.
For custom models, you can either use a Contributor role, or use the endpoint and key of a Form Recognizer or Cognitive Services multi-service resource to create a project. You also need the Contributor role on at least one blob storage account.
For more information, see Azure AD built-in roles.
I have multiple pages in a document. Why are there only two pages analyzed in Form Recognizer Studio?
For free (F0) tier resources, only the first two pages are analyzed, whether you're using Form Recognizer Studio, the REST API, or the SDKs. In Form Recognizer Studio, select the top right gear button (Settings), choose the Resources tab, and check the Price Tier you're using to analyze the documents. Change to an S0 paid resource if you want to analyze all pages in a document.
How can I change directories or subscriptions to use in Form Recognizer Studio?
In Form Recognizer Studio, select the top right gear button (Settings). Under Directory, search for and select the directory from the list, then select Switch Directory. You'll be prompted to sign in again after switching directories.
Switching subscriptions or resources can be done under Settings -> Resource tab.
Containers
Do I need an internet connection to use Form Recognizer containers?
Yes, Form Recognizer containers require internet connectivity to send billing information to Azure. Learn more about Azure container security.
What's the difference between disconnected and connected containers?
Form Recognizer connected containers send billing information to Azure using a Form Recognizer resource on your Azure account. With connected containers, internet connectivity is required to send billing information to Azure.
Disconnected containers enable you to use APIs disconnected from the internet. Billing information isn't sent via the internet; instead, you're charged based upon a purchased commitment tier. Currently, disconnected container usage is available for Form Recognizer custom and invoice models. The model capabilities provided in connected and disconnected containers are the same and supported by Form Recognizer v2.1.
What data do connected containers send to the cloud?
Form Recognizer connected containers send billing information to Azure by using a Form Recognizer resource on your Azure account. Connected containers don't send customer data, such as the image or text that's being analyzed, to Microsoft. See the Cognitive Services container FAQ for an example of the information sent to Microsoft for billing.
I received an "OutOfQuota" error message: "Container isn't in a valid state. Subscription validation failed with status 'OutOfQuota'. API key is out of quota".
Form Recognizer connected containers send billing information to Azure by using a Form Recognizer resource on your Azure account. You could get this message if the containers can't communicate with the billing endpoint.
Can I use local storage for the Form Recognizer Sample Labeling Tool (FOTT) container?
FOTT has a version that uses local storage. The version needs to be installed on a Windows machine. You can install it from this location. On the project page, specify the Label folder URI as /shared or /shared/sub-dir if your labeling files are in a sub directory. All other Form Recognizer Sample Labeling Tool behavior is the same as the hosted service.
Security and Privacy
What are the different methods and requirements for authenticating requests to Azure Applied AI Services?
Each request to an Azure service must include an authentication header. You can authenticate a request with several methods:
Authenticate with a single-service or multi-service key.
Authenticate with Azure Active Directory (Azure AD).
Enable customer-managed keys.
Authorize managed identities.
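For the first two methods in the list above, the authentication header might be built as in the sketch below. The header names `Ocp-Apim-Subscription-Key` (resource key) and `Authorization: Bearer` (Azure AD token) are not stated in this FAQ and are included here as assumptions; the function name `build_auth_headers` and the placeholder values are hypothetical.

```python
from typing import Dict, Optional


def build_auth_headers(key: Optional[str] = None, bearer_token: Optional[str] = None) -> Dict[str, str]:
    """Build the authentication header for a request from a key or an Azure AD token.

    Exactly one credential must be supplied.
    """
    if (key is None) == (bearer_token is None):
        raise ValueError("provide exactly one of key or bearer_token")
    if key is not None:
        # Single-service or multi-service resource key (assumed header name).
        return {"Ocp-Apim-Subscription-Key": key}
    # Azure Active Directory access token.
    return {"Authorization": f"Bearer {bearer_token}"}


if __name__ == "__main__":
    print(build_auth_headers(key="<your-resource-key>"))
    print(build_auth_headers(bearer_token="<your-access-token>"))
```

Keeping the credential choice in one helper makes it easier to move from key-based authentication to Azure AD later without touching the rest of the request code.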
Does Form Recognizer store my data?
For all features, Form Recognizer temporarily stores data and results in Azure storage in the same region as the request. Your data is then deleted within 24 hours from the time an analyze request was submitted.
Learn more about Data, privacy, and security for Form Recognizer.
How are my trained custom models stored and utilized in Form Recognizer?
The Custom model feature allows customers to build custom models from training data stored in the customer's Azure blob storage locations. The interim outputs after analysis and labeling are stored in the same location. The trained custom models are stored in Azure storage in the same region and logically isolated with their Azure subscription and API credentials.
More help and support
Where can I find more solutions to my Azure Form Recognizer questions?
Microsoft Q & A is the home for technical questions and answers at Microsoft. You can filter queries specific to Form Recognizer.
What should I do if specific text isn't recognized or is recognized incorrectly when labeling documents?
We continually update and improve the Form Recognizer OCR model. You can reach out to the Form Recognizer team: formrecog_contact@microsoft.com. If possible, share a sample document with the issue highlighted.