Azure AI Document Intelligence FAQ

This content applies to: v4.0 (preview) | v3.1 (GA) | v3.0 (GA) | v2.1 (GA)

General concepts

What is Azure AI Document Intelligence, and what happened to Azure AI Form Recognizer?

Azure AI Document Intelligence is a cloud-based service that uses machine-learning models to extract key/value pairs, text, and tables from your documents. The returned result is a structured JSON output. Document Intelligence use cases include automated data processing, enhanced data-driven strategies, and enriched document search capabilities.
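For example, here's a minimal Python sketch of an analyze call, assuming the azure-ai-formrecognizer package (3.2.0 or later); the endpoint, key, and file name are placeholders:

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-document", document=f)
result = poller.result()

# Key/value pairs, text, and tables come back as structured objects;
# the underlying REST response is JSON.
for kv in result.key_value_pairs:
    if kv.key and kv.value:
        print(kv.key.content, "->", kv.value.content)
print(f"{len(result.tables)} table(s) found")
```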

Document Intelligence is part of Azure AI services. Azure AI services encompass all of what were previously known as Azure Cognitive Services and Azure Applied AI Services.

The previous name for Document Intelligence was Azure AI Form Recognizer. Form Recognizer officially became Document Intelligence in July 2023.

There are no changes to pricing. The names Cognitive Services and Applied AI Services continue to be used in Azure billing, cost analysis, price lists, and price APIs.

There are no breaking changes to APIs or client libraries (SDKs). REST API versions 2024-02-29-preview, 2023-10-31-preview, and later use the Document Intelligence name.

Some platforms are still awaiting the renaming update. In Microsoft documentation, all mentions of Form Recognizer and Document Intelligence refer to the same Azure service.

You can use a document generative AI solution to chat with your documents, generate captivating content from them, and access Azure OpenAI Service models on your data. By combining Azure AI Document Intelligence and Azure OpenAI, you can build an enterprise application that interacts with your documents in natural language, finds answers and surfaces insights, and generates new content from your existing documents. Find more details in the technical community blog.

Semantic chunking is a key step in retrieval-augmented generation (RAG): it ensures that document content is stored and retrieved efficiently. The Document Intelligence layout model offers advanced content extraction and document structure analysis for this purpose.

With the layout model, you can easily extract text and structural elements to divide large bodies of text into smaller, meaningful chunks based on semantic content rather than arbitrary splits. You can then conveniently output the extracted information to Markdown format, so that you can define your semantic chunking strategy based on provided building blocks. Find more details in the overview of RAG in Document Intelligence.
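As an illustration, here's a minimal sketch of requesting Markdown output for chunking, assuming the preview azure-ai-documentintelligence package (class and parameter names have shifted across preview versions):

```python
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import ContentFormat
from azure.core.credentials import AzureKeyCredential

client = DocumentIntelligenceClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("report.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        "prebuilt-layout",
        analyze_request=f,
        content_type="application/octet-stream",
        output_content_format=ContentFormat.MARKDOWN,  # Markdown instead of plain text
    )
result = poller.result()

# result.content is Markdown; here, splitting on top-level headings is one
# simple way to form semantic chunks for RAG.
chunks = [c for c in result.content.split("\n# ") if c.strip()]
```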

Which Document Intelligence use cases require special consideration?

Give careful consideration to document processing projects that encompass financial data, protected health data, personal data, or highly sensitive data.

Be sure to comply with all national/regional and industry-specific requirements.

What languages does Document Intelligence support?

The deep-learning-based universal models in Document Intelligence support many languages and can extract multilingual text from your images and documents, including text lines with mixed languages.

Language support varies by Document Intelligence service functionality. For a complete list of the handwritten and printed text that Document Intelligence supports, see Language support.

Is Document Intelligence available in my Azure region?

Document Intelligence is generally available in many of the 60+ Azure global infrastructure regions.

Choose the region that's best for you and your customers.

Does Document Intelligence integrate with other Microsoft services?

Yes. Document Intelligence integrates with other Azure AI services:

Document Intelligence is a cloud-based service that incorporates optical character recognition (OCR), text analytics, and custom text classification from Azure AI services.

Document Intelligence uses OCR to detect and extract information from typed and handwritten text in documents, with AI models adding structure and meaning to the extracted text.

How long is my custom model available for use?

A custom model has the same life cycle as the API version that you use to train it. When that API version is deprecated, the model is no longer available for inference. Models trained with a preview version of the API follow the life cycle of that preview version.

Expect a preview API version to be deprecated within three months of the release of a newer preview or GA API version.

What is the accuracy score, and how is it calculated?

The output of a build (v3.0 and later versions) or train (v2.1) custom model operation includes the estimated accuracy score. This score represents the model's ability to accurately predict the labeled value on a visually similar document.

Accuracy is measured within a percentage value range from 0% (low) to 100% (high).

For more information, see Accuracy and confidence scores.

How can I improve accuracy scores?

Variances in the visual structure of your documents can influence the accuracy of a model. Here are some tips:

  • Include all variations of a document in the training dataset. Variations include different formats; for example, digital versus scanned PDFs.

  • Separate visually distinct document types and train different models.

  • Make sure that you don't have extraneous labels.

  • For signature and region labeling, don't include the surrounding text.

For more information, see Accuracy and confidence scores.

What is the confidence score, and how is it calculated?

A confidence score indicates probability by measuring the degree of statistical certainty that the extracted result is detected correctly.

The confidence value range is a percentage from 0% (low) to 100% (high). It's best to target a score of 80% or higher. For more sensitive cases, like financial or medical records, we recommend a score of close to 100%. You can also require human review.
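Here's a minimal sketch of routing low-confidence fields to human review, assuming the azure-ai-formrecognizer package and an existing DocumentAnalysisClient named client; the threshold and file name are illustrative:

```python
REVIEW_THRESHOLD = 0.80  # route anything below this to human review

with open("claim.pdf", "rb") as f:
    result = client.begin_analyze_document("prebuilt-invoice", document=f).result()

for doc in result.documents:
    for name, field in doc.fields.items():
        if field.confidence is not None and field.confidence < REVIEW_THRESHOLD:
            print(f"Low confidence ({field.confidence:.2f}) for field '{name}'")
```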

For more information, see Accuracy and confidence scores.

How can I improve confidence scores?

After an analysis operation, review the JSON output. Examine the confidence values for each key/value result under the pageResults node. You should also look at the confidence score in the readResults node, which corresponds to the text-read operation. The confidence of the read results doesn't affect the confidence of the key/value extraction results, so you should check both. Here are some tips:

  • If the confidence score for the readResults object is low, improve the quality of your input documents.

  • If the confidence score for the pageResults object is low, ensure that the documents you're analyzing are of the same type.

  • Consider incorporating human review into your workflows.

  • Use forms that have different values in each field.

  • For custom models, use a larger set of training documents. Tagging more documents teaches your model to recognize fields with greater accuracy.

For more information, see Accuracy and confidence scores.

What is a bounding box?

A bounding box (polygon in v3.0 and later versions) is an abstract rectangle that surrounds text elements in a document or form. It's used as a reference point for object detection.

The bounding box specifies position by using an x and y coordinate plane presented in an array of four numerical pairs. Each pair represents a corner of the box in the following order: upper left, upper right, lower right, lower left.

For an image, coordinates are in pixels. For a PDF, coordinates are in inches.
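For example, here's a minimal sketch of reading the corner points, assuming the azure-ai-formrecognizer package (where the bounding box is exposed as polygon) and an existing DocumentAnalysisClient named client:

```python
with open("form.pdf", "rb") as f:
    result = client.begin_analyze_document("prebuilt-read", document=f).result()

first_word = result.pages[0].words[0]
# Four corner points, ordered upper left, upper right, lower right, lower left.
# Units are pixels for images and inches for PDFs.
for point in first_word.polygon:
    print(point.x, point.y)
```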

Can Document Intelligence help me classify documents?

Document Intelligence provides custom classification models that can analyze single-file or multiple-file documents to identify whether an input file contains any of the trained document types. The service supports the following scenarios:

  • A single file that contains one document type, such as a loan application form.

  • A single file that contains multiple documents. An example is a loan application package that contains a loan application form, payslip, and bank statement.

  • A single file that contains multiple instances of the same document. An example is a collection of scanned invoices.
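Here's a minimal sketch of calling a trained classifier, assuming azure-ai-formrecognizer 3.3.0 or later (where begin_classify_document was introduced), an existing DocumentAnalysisClient named client, and a hypothetical classifier ID:

```python
with open("loan_package.pdf", "rb") as f:
    poller = client.begin_classify_document("loan-package-classifier", document=f)

# Each identified document reports its type and a confidence score.
for doc in poller.result().documents:
    print(doc.doc_type, doc.confidence)
```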

For more information, see the overview of custom classification models.

App development

What are the development options for Document Intelligence?

Document Intelligence offers the latest development options within the following platforms:

Where can I find the supported API version for the latest programming language SDKs?

The following list shows the latest SDK versions and the relationship between supported Document Intelligence SDK and API versions:

  • C#/.NET: 4.0.0

  • Java: 4.0.0

  • JavaScript: 4.0.0

  • Python: 3.2.0

These SDK versions support API versions 2023-10-31-preview, v3.0, v2.1, and v2.0.

For more information, see Supported clients for v4.0 and Supported clients for v3.1.

What is the difference between Document Intelligence v3.0 and v2.1, and how do I migrate to the latest version?

For improved usability, Document Intelligence v3.0 introduces a fully redesigned client library. To successfully use the latest Document Intelligence API features, you need the most recent SDK, and your application code must be updated to use the new clients.

The following links provide detailed instructions for migrating to the newest version of Document Intelligence:

  • REST API: v3

  • C#/.NET: 4.0.0

  • Java: 4.0.0

  • JavaScript: 4.0.0

  • Python: 3.2.0

Which file formats does Document Intelligence support? Are there size limitations for input documents?

To get the best results, see the input requirements.

How can I specify a range of pages to be analyzed in a document?

Use the pages parameter (supported in v2.1, v3.0, and later versions of the REST API) to specify pages for multiple-page PDF and TIFF documents. Accepted input includes the following ranges:

  • Single pages. For example, if you specify 1, 2, pages 1 and 2 are processed.
  • Finite ranges. For example, if you specify 2-5, pages 2 to 5 are processed.
  • Open-ended ranges. For example, if you specify 5-, all the pages from page 5 are processed. If you specify -10, pages 1 to 10 are processed.

You can mix these range types, and ranges can overlap. For example, if you specify -5, 1, 3, 5-10, pages 1 to 10 are processed.

The service accepts the request if it can process at least one page of the document. For example, using 5-100 on a five-page document is a valid input that means page 5 is processed.

If you don't provide a page range, the entire document is processed.
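Here's a minimal sketch of the pages parameter, assuming the azure-ai-formrecognizer package and an existing DocumentAnalysisClient named client:

```python
with open("long_report.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        "prebuilt-read",
        document=f,
        pages="1-3,5,8-",  # pages 1 to 3, page 5, and page 8 onward
    )
result = poller.result()
print(f"{len(result.pages)} page(s) analyzed")
```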

Both Document Intelligence Studio and the FOTT Sample Labeling tool are available. Which one should I use?

Most of the time, we recommend Document Intelligence Studio because it reduces the time needed to configure Document Intelligence resources and storage services.

Consider using the Form OCR Testing Tool (FOTT) for the following scenarios:

Service limits and pricing

How does Azure calculate the price for using Document Intelligence?

Document Intelligence billing is calculated monthly based on the model type and the number of pages analyzed. Here are some details:

  • When you submit a document for analysis, the service analyzes all pages unless you specify a page range by using the pages parameter in your request. When the service analyzes Microsoft Excel and PowerPoint documents through the read, OCR, or layout model, it counts each Excel worksheet and PowerPoint slide as one page.

  • When the service analyzes PDF and TIFF files, it counts each page in the PDF file or each image in the TIFF file as one page with no maximum character limits.

  • When the service analyzes Microsoft Word and HTML files that the read and layout models support, it counts pages in blocks of 3,000 characters each. For example, if your document contains 7,000 characters, the two pages with 3,000 characters each and one page with 1,000 characters add up to a total of three pages.

  • When you're using the read or layout model to analyze Microsoft Word, Excel, PowerPoint, and HTML files, embedded or linked images aren't supported. So the service doesn't count them as added images.

  • Training a custom model is always free with Document Intelligence. You're charged only when the service uses a model to analyze a document.

  • Container pricing is the same as cloud service pricing.

  • Document Intelligence offers a free tier (F0) where you can test all the Document Intelligence features.

  • Document Intelligence has a commitment-based pricing model for large workloads.

Learn more about Azure AI Document Intelligence pricing options.

How can I check my Document Intelligence usage and estimate the price?

You can find usage metrics on the metrics dashboard in the Azure portal. The dashboard displays the number of pages that Azure AI Document Intelligence processes. You can check the estimated cost spent on the resource by using the Azure pricing calculator. For detailed instructions, see Check usage and estimate cost.

What are the best practices to mitigate throttling?

Document Intelligence uses autoscaling to provide the required computational resources on demand, while keeping customer costs low. To mitigate throttling during autoscaling, we recommend the following approach:

  • Implement retry logic in your application; a minimal sketch follows this list.

  • If you find that you're being throttled on the number of POST requests, consider adding a delay between the requests.

  • Increase the workload gradually. Avoid sharp changes.

  • Create a support request to increase your transactions-per-second (TPS) limit.
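Here's a minimal retry-with-backoff sketch, assuming the azure-ai-formrecognizer and azure-core packages; the SDK also offers configurable built-in retry policies you can use instead:

```python
import time

from azure.core.exceptions import HttpResponseError

def analyze_with_retry(client, model_id, data, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return client.begin_analyze_document(model_id, document=data).result()
        except HttpResponseError as e:
            # HTTP 429 means the request was throttled; back off and retry.
            if e.status_code == 429 and attempt < max_attempts - 1:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
            else:
                raise
```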

Learn more about Document Intelligence service quotas and limits.

How long does it take to analyze a document?

The time to analyze a document depends on its size (for example, the number of pages) and the content on each page.

Document Intelligence is a multitenant service where latency for similar documents is comparable but not always identical. Latency is the amount of time it takes for an API server to handle and process an incoming request and deliver the outgoing response to the client. Occasional variability in latency and performance is inherent in any microservice-based, stateless, asynchronous service that processes images and large documents at scale.

Although we're continuously scaling up hardware and capacity, you might still encounter latency issues at runtime.

Custom models

How do I assemble the best training data?

When you use the Document Intelligence custom model, you provide your own training data. Here are a few tips to help train your models effectively:

  • Use text-based instead of image-based PDFs when possible. One way to identify an image-based PDF is to try selecting specific text in the document. If you can select only the entire image of the text, the document is image based, not text based.

  • Organize your training documents by using a subfolder for each format (JPEG/JPG, PNG, BMP, PDF, or TIFF).

  • Use forms that have all of the available fields completed.

  • Use forms with differing values in each field.

  • If your images are low quality, use a larger dataset (more than five training documents).

Learn more about building a training dataset.

What are the best practices for training a highly accurate custom model?

The level of accuracy for your model depends on the quality of your training materials. Here are some tips:

  • Determine if you need to use a single model or multiple models composed into a single model.

  • Model accuracy can decrease when you have different formats analyzed with a single model. Plan on segmenting your dataset into folders, where each folder is a unique template. Train one model per folder, and compose the resulting models into a single endpoint.

  • Custom forms rely on a consistent visual template. If your form has variations with formats and page breaks, consider segmenting your dataset to train multiple models.

  • Ensure that you have a balanced dataset by accounting for formats, document types, and structure.

Learn more about composed models.

Can I retrain a custom model?

Document Intelligence doesn't have an explicit retrain operation. Each train operation generates a new model.

If you find that your model needs retraining, add more samples to your training dataset and train a new model.

How many custom models can I compose into a single custom model?

With the Model Compose operation, you can assign up to 200 models to a single model ID. When you make the Analyze Document request with a composed model ID, Document Intelligence classifies the submitted form, chooses the best model, and returns the results. Model Compose is currently available only for custom models trained with labels.

Analyzing a document by using a composed model is identical to analyzing a document by using a single model. The Analyze Document result returns a docType property that indicates which component model the service selected to analyze the document. There's no difference in pricing between analyzing with an individual custom model and analyzing with a composed custom model.
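Here's a minimal compose sketch, assuming azure-ai-formrecognizer 3.3.0 or later (the operation has different names in older SDK versions); the component model IDs are hypothetical:

```python
from azure.ai.formrecognizer import DocumentModelAdministrationClient
from azure.core.credentials import AzureKeyCredential

admin_client = DocumentModelAdministrationClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# Combine previously trained component models behind one model ID.
poller = admin_client.begin_compose_document_models(
    component_model_ids=["w2-model", "invoice-model", "receipt-model"],
)
composed = poller.result()
print(composed.model_id)
```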

Learn more about composed models.

If the number of models that I want to compose exceeds the upper limit of a composed model, what are the alternatives?

You can use one of these alternatives:

How do I refine a model beyond the initial training?

Each training operation generates a new model. To refine a model beyond its initial training:

  1. Create a dataset for your new template.

  2. Label and train a new model.

  3. Validate that the new model performs well for your specific document types.

  4. Compose your new model with the existing model into a single endpoint. Document Intelligence can then determine the best model for each document to be analyzed.

Learn more about composed models.

I'm building a custom model. What does the signature-detection label return?

Signature detection looks for the presence of a signature, not the identity of the person who signed the document.

If the model returns unsigned for signature detection, the model didn't find a signature in the defined field.

What should I consider and what are the best practices for extracting tables from documents?

You can start with the Document Intelligence layout model to extract texts, tables, selection marks, and structure information from documents and images. You can also consider the following factors:

  • Is the data that you want to extract presented as a table, and is the table structure meaningful?

  • If the data isn't in a table format, can the data fit in a two-dimensional grid?

  • Do your tables span multiple pages? If so, to avoid having to label all the pages, split the PDF into pages before sending it to Document Intelligence. After the analysis, post-process the pages to a single table.

  • If you're creating custom models, refer to Labeling as tables. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.

How can I move my trained models from one environment (like beta) to another (like production)?

You can use the Copy API to copy custom models from one Document Intelligence account into others that exist in any supported geographical region. For detailed instructions, see Disaster recovery.

The copy operation is limited to copying models within the specific cloud environment where you trained the model. For instance, copying models from the public cloud to the Azure Government cloud isn't supported.
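Here's a minimal copy sketch, assuming azure-ai-formrecognizer 3.3.0 or later (method names vary across SDK versions); source_client and target_client are DocumentModelAdministrationClient instances for the two resources, and the model IDs are hypothetical:

```python
# 1. Ask the target resource for authorization to receive the model.
target_auth = target_client.get_copy_authorization(model_id="prod-model-copy")

# 2. Tell the source resource to copy the model to the authorized target.
poller = source_client.begin_copy_document_model_to("beta-model", target=target_auth)
copied = poller.result()
print(copied.model_id)
```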

Why was I charged for layout when running custom training?

Layout is required to generate labels for your dataset. If the dataset that you use for custom training doesn't have label files available, the service generates them for you.

Storage account

I was able to access my storage account a few days ago. Why am I now having trouble reconnecting?

When you create a shared access signature, the default duration is 48 hours. After 48 hours, you need to create a new token.

Consider setting a longer duration period for the time that you're using your storage account with Document Intelligence.
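Here's a minimal sketch of generating a container SAS with a longer expiry, assuming the azure-storage-blob package; the account name, key, and container name are placeholders:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import ContainerSasPermissions, generate_container_sas

sas_token = generate_container_sas(
    account_name="<storage-account>",
    container_name="training-data",
    account_key="<account-key>",
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=14),  # longer than the 48-hour default
)
```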

If my storage account is behind a virtual network or firewall, how do I give Document Intelligence access to the data?

If you have an Azure storage account protected by a virtual network or firewall, Document Intelligence can't directly access your storage account. However, private Azure storage account access and authentication support managed identities for Azure resources. When you use a managed identity, the Document Intelligence service can access your storage account by using an assigned credential.

If you intend to analyze your private storage account data by using FOTT, you must deploy the tool behind the virtual network or firewall.

Learn how to create and use a managed identity for your Document Intelligence resource.
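Here's a minimal training sketch that relies on managed identity, assuming azure-ai-formrecognizer 3.3.0 or later and that the Document Intelligence resource's identity was already granted access to the storage account; note that the container URL carries no SAS token:

```python
from azure.ai.formrecognizer import DocumentModelAdministrationClient
from azure.core.credentials import AzureKeyCredential

admin_client = DocumentModelAdministrationClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# No SAS token: the service reads the container through its managed identity.
poller = admin_client.begin_build_document_model(
    "template",
    blob_container_url="https://<storage-account>.blob.core.windows.net/training-data",
)
model = poller.result()
print(model.model_id)
```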

Document Intelligence Studio

What permissions do I need to access Document Intelligence Studio?

You need an active Azure account and subscription with at least a Reader role to access Document Intelligence Studio.

For document analysis and prebuilt models, here are the role requirements for user scenarios:

  • Basic

  • Advanced

    • Contributor: You need this role to create a resource group or a Document Intelligence resource. The Contributor role doesn't allow you to list keys for Cognitive Services. To use Document Intelligence Studio, you still need the Cognitive Services User role.

For custom model projects, here are the role requirements for user scenarios:

  • Basic

    • Cognitive Services User: You need this role for a Document Intelligence or Cognitive Services multiple-service resource to train a custom model or analyze with trained models.

    • Storage Blob Data Contributor: You need this role for a storage account to create project and label data.

  • Advanced

    • Storage Account Contributor: You need this role for the storage account to set up cross-origin resource sharing (CORS) settings. It's a one-time effort if you reuse the same storage account.

      The Contributor role doesn't allow you to access data in your blob. To use Document Intelligence Studio, you still need the Storage Blob Data Contributor role.

    • Contributor: You need this role to create a resource group and resources. The Contributor role doesn't give you access to use the created resources or storage. To use Document Intelligence Studio, you still need basic roles.

For more information, see Microsoft Entra built-in roles and the sections about Azure role assignments in the Document Intelligence Studio quickstart.

I have multiple pages in a document. Why are only two pages analyzed in Document Intelligence Studio?

For free-tier (F0) resources, only the first two pages are analyzed whether you're using Document Intelligence Studio, the REST API, or SDKs.

In Document Intelligence Studio, select the Settings (gear) button, select the Resources tab, and check the price tier that you're using to analyze the documents. If you want to analyze all pages in a document, change to a paid (S0) resource.

How can I change directories or subscriptions in Document Intelligence Studio?

To change a directory in Document Intelligence Studio, select the Settings (gear) button. Under Directory, select the directory from the list, and then select Switch Directory. You're prompted to sign in again after you switch the directory.

To change a subscription or resource, go to the Resource tab under Settings.

Why am I receiving a storage error on a project sharing, automatic labeling, or OCR upgrade operation when my storage account resource is configured with a firewall or virtual network?

Refer to Managed identities for Document Intelligence to set up your Azure resources.

Why am I receiving the error "Access denied due to Virtual Network/Firewall rules" on an automatic labeling or OCR upgrade operation when my Document Intelligence resource is configured with a firewall or virtual network?

You need to add the dedicated IP address 20.3.165.95 to the firewall allowlist for your Document Intelligence resource.

Can I reuse or customize the labeling experience from Document Intelligence Studio and build it into my own application?

Yes. The labeling experience from Document Intelligence Studio is open sourced in the Toolkit repo.

Why am I receiving the error "Form Recognizer Not Found" when opening my custom project?

The Document Intelligence resource bound to this custom project was deleted or moved to another resource group. There are two ways to resolve this problem:

  • Re-create the Document Intelligence resource under the same subscription and resource group with the same name.

  • Re-create a custom project with the migrated Document Intelligence resource and specify the same storage account.

Containers

Do I need an internet connection to use Document Intelligence containers?

Yes. Document Intelligence containers require internet connectivity to send billing information to Azure. Learn more about Azure container security.

What's the difference between disconnected and connected containers?

Connected containers send billing information to Azure by using a Document Intelligence resource on your Azure account. With connected containers, internet connectivity is required to send billing information to Azure.

Disconnected containers enable you to use APIs that are disconnected from the internet. Billing information isn't sent via the internet. Instead, you're charged based on a purchased commitment tier. Currently, disconnected container usage is available for Document Intelligence custom and invoice models.

The model capabilities provided in connected and disconnected containers are the same and are supported by Document Intelligence v2.1.

What data do connected containers send to the cloud?

Document Intelligence connected containers send billing information to Azure by using a Document Intelligence resource on your Azure account. Connected containers don't send customer data, such as the image or text that's being analyzed, to Microsoft.

For an example of the information that connected containers send to Microsoft for billing, see the Azure AI container FAQ.

Why am I receiving the error "Container isn't in a valid state. Subscription validation failed with status 'OutOfQuota' API key is out of quota"?

Document Intelligence connected containers send billing information to Azure by using a Document Intelligence resource on your Azure account. You could get this message if the containers can't communicate with the billing endpoint.

Can I use local storage for the Document Intelligence Sample Labeling Tool (FOTT) container?

FOTT has a version that uses local storage. The version needs to be installed on a Windows machine. You can install it from this location.

On the project page, specify the label folder URI as /shared or /shared/sub-dir if your labeling files are in a subdirectory. All other Document Intelligence Sample Labeling Tool behavior is the same as the hosted service.

What is the best practice for scaling up?

For asynchronous calls, you can run multiple containers with shared storage. The container that's processing the POST analyze call stores the output in the storage. Then, any other container can fetch the results from the storage and serve the GET calls. The request ID isn't tied to a container.

For synchronous calls, you can run multiple containers, but only one container serves a request. Because it's a blocking call, any container from the pool can serve the request and send the response. Here, only one container is tied to a request at a time, and no polling is required.

How can I set up containers with shared storage?

At startup, the containers use the Mounts:Shared property to specify the shared storage location for processing files. To see how to use this property, refer to the containers documentation.

Security and privacy

What are the methods and requirements for authenticating requests to Azure AI services?

Each request to an Azure service must include an authentication header. You can authenticate a request by using several methods:
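Here's a minimal sketch of two common options, assuming the azure-ai-formrecognizer and azure-identity packages; the endpoint and key are placeholders:

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential

endpoint = "https://<your-resource>.cognitiveservices.azure.com/"

# Option 1: resource key, sent as a request header on every call.
key_client = DocumentAnalysisClient(endpoint, AzureKeyCredential("<your-key>"))

# Option 2: Microsoft Entra ID token (for example, via a managed identity).
aad_client = DocumentAnalysisClient(endpoint, DefaultAzureCredential())
```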

Does Document Intelligence store my data?

For all features, Document Intelligence temporarily stores data and results in Azure Storage in the same region as the request. Your data is then deleted within 24 hours from the time that you submit an analyze request.

Learn more about data, privacy, and security for Document Intelligence.

How are my trained custom models stored and used in Document Intelligence?

The interim outputs after analysis and labeling are stored in the same Azure Storage location where you store your training data. The trained custom models are stored in Azure Storage in the same region, and they're logically isolated with your Azure subscription and API credentials.

More help and support

Where can I find more solutions to my Azure AI Document Intelligence questions?

Microsoft Q&A is the home for technical questions and answers at Microsoft. You can filter queries that are specific to Document Intelligence.

What should I do if the service doesn't recognize specific text, or recognizes it incorrectly, when I'm labeling documents?

We continually update and improve the Document Intelligence OCR model. You can email the Document Intelligence team. If possible, share a sample document with the issue highlighted.