DocumentAnalysisClient Class

DocumentAnalysisClient analyzes information from documents and images. Use it to analyze with prebuilt models (receipts, business cards, invoices, identity documents), to analyze document layout, to analyze general document types, and to analyze custom documents with models you have built. It provides separate methods for input from a URL and input from a stream.

Note

DocumentAnalysisClient should be used with API versions 2021-09-30-preview and up. To use API versions <=v2.1, instantiate a FormRecognizerClient.

New in version 2021-09-30-preview: The DocumentAnalysisClient and its client methods.

Inheritance
azure.ai.formrecognizer._form_base_client.FormRecognizerClientBase
DocumentAnalysisClient

Constructor

DocumentAnalysisClient(endpoint: str, credential: Union[AzureKeyCredential, TokenCredential], **kwargs: Any)

Parameters

endpoint
str
Required

Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus2.api.cognitive.microsoft.com).

credential
AzureKeyCredential or TokenCredential
Required

Credentials needed for the client to connect to Azure. This is an instance of AzureKeyCredential if using an API key, or a token credential from azure.identity.

api_version
str or DocumentAnalysisApiVersion

The API version of the service to use for requests. It defaults to the latest service version. Setting to an older version may result in reduced feature compatibility. To use API versions <=v2.1, instantiate a FormRecognizerClient.

Examples

Creating the DocumentAnalysisClient with an endpoint and API key.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))

Creating the DocumentAnalysisClient with a token credential.


   """DefaultAzureCredential will use the values from these environment
   variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
   """
   import os

   from azure.ai.formrecognizer import DocumentAnalysisClient
   from azure.identity import DefaultAzureCredential

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   credential = DefaultAzureCredential()

   document_analysis_client = DocumentAnalysisClient(endpoint, credential)

Methods

begin_analyze_document

Analyze field text and semantic values from a given document.

begin_analyze_document_from_url

Analyze field text and semantic values from a given document. The input must be the location (URL) of the document to be analyzed.

close

Close the DocumentAnalysisClient session.

begin_analyze_document

Analyze field text and semantic values from a given document.

begin_analyze_document(model: str, document: Union[bytes, IO[bytes]], **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

model
str
Required

A unique model identifier, passed as a string. Use this to specify the custom model ID or prebuilt model ID. Supported prebuilt model IDs are listed here: https://aka.ms/azsdk/formrecognizer/models

document
bytes or IO[bytes]
Required

JPEG, PNG, PDF, TIFF, or BMP type file stream or bytes.

pages
str

Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, and separate each page number or range with a comma, for example pages="1-3, 5-6".

locale
str

Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.

continuation_token
str

A continuation token to restart a poller from a saved state.
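The pages parameter above takes a comma-separated string of page numbers and hyphenated ranges. The following is a minimal sketch of one way to build that string from Python values. The helper name format_pages is hypothetical and not part of the SDK, and the check that page numbers start at 1 is an assumption based on the example in the parameter description.

```python
from typing import Iterable, Tuple, Union

# A single page number, or a (first, last) tuple describing an inclusive range.
PageSpec = Union[int, Tuple[int, int]]


def format_pages(specs: Iterable[PageSpec]) -> str:
    """Build a pages string such as "1-3, 5-6" (hypothetical helper, not an SDK call)."""
    parts = []
    for spec in specs:
        if isinstance(spec, tuple):
            first, last = spec
            # Assumption: pages are numbered from 1 and ranges run low to high.
            if first < 1 or last < first:
                raise ValueError(f"invalid page range: {spec!r}")
            parts.append(f"{first}-{last}")
        else:
            if spec < 1:
                raise ValueError(f"invalid page number: {spec!r}")
            parts.append(str(spec))
    return ", ".join(parts)


pages = format_pages([(1, 3), (5, 6)])
print(pages)  # 1-3, 5-6
```

The resulting string could then be passed as the pages keyword argument, for example begin_analyze_document("prebuilt-invoice", document=f, pages=pages).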

Returns

An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.
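Because the returned poller can be restarted from a saved state, a long-running analysis may be resumed later with the continuation_token parameter. The sketch below shows the idea under a few assumptions: the function names start_and_save and resume_analysis are hypothetical, client is expected to be a DocumentAnalysisClient, and passing None as the document when resuming assumes the token alone is enough to rebuild the poller. If that does not hold for your SDK version, pass the original document again.

```python
def start_and_save(client, model, document):
    """Start an analysis and capture a token that can rebuild the poller later."""
    poller = client.begin_analyze_document(model, document)
    # continuation_token() returns an opaque string describing the poller's state.
    token = poller.continuation_token()
    return poller, token


def resume_analysis(client, model, saved_token):
    """Recreate a poller from a previously saved continuation token."""
    # Assumption: the document does not need to be sent again when resuming.
    return client.begin_analyze_document(
        model, None, continuation_token=saved_token
    )
```

Calling result() on the poller returned by resume_analysis would then yield the AnalyzeResult of the original operation.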

Return type

LROPoller[AnalyzeResult]

Exceptions

azure.core.exceptions.HttpResponseError

Examples

Analyze an invoice. For more samples see the samples folder.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   # path_to_sample_documents must point to a local invoice file.
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_analyze_document(
           "prebuilt-invoice", document=f, locale="en-US"
       )
   invoices = poller.result()

   for idx, invoice in enumerate(invoices.documents):
       print("--------Recognizing invoice #{}--------".format(idx + 1))
       vendor_name = invoice.fields.get("VendorName")
       if vendor_name:
           print(
               "Vendor Name: {} has confidence: {}".format(
                   vendor_name.value, vendor_name.confidence
               )
           )
       vendor_address = invoice.fields.get("VendorAddress")
       if vendor_address:
           print(
               "Vendor Address: {} has confidence: {}".format(
                   vendor_address.value, vendor_address.confidence
               )
           )
       vendor_address_recipient = invoice.fields.get("VendorAddressRecipient")
       if vendor_address_recipient:
           print(
               "Vendor Address Recipient: {} has confidence: {}".format(
                   vendor_address_recipient.value, vendor_address_recipient.confidence
               )
           )
       customer_name = invoice.fields.get("CustomerName")
       if customer_name:
           print(
               "Customer Name: {} has confidence: {}".format(
                   customer_name.value, customer_name.confidence
               )
           )
       customer_id = invoice.fields.get("CustomerId")
       if customer_id:
           print(
               "Customer Id: {} has confidence: {}".format(
                   customer_id.value, customer_id.confidence
               )
           )
       customer_address = invoice.fields.get("CustomerAddress")
       if customer_address:
           print(
               "Customer Address: {} has confidence: {}".format(
                   customer_address.value, customer_address.confidence
               )
           )
       customer_address_recipient = invoice.fields.get("CustomerAddressRecipient")
       if customer_address_recipient:
           print(
               "Customer Address Recipient: {} has confidence: {}".format(
                   customer_address_recipient.value,
                   customer_address_recipient.confidence,
               )
           )
       invoice_id = invoice.fields.get("InvoiceId")
       if invoice_id:
           print(
               "Invoice Id: {} has confidence: {}".format(
                   invoice_id.value, invoice_id.confidence
               )
           )
       invoice_date = invoice.fields.get("InvoiceDate")
       if invoice_date:
           print(
               "Invoice Date: {} has confidence: {}".format(
                   invoice_date.value, invoice_date.confidence
               )
           )
       invoice_total = invoice.fields.get("InvoiceTotal")
       if invoice_total:
           print(
               "Invoice Total: {} has confidence: {}".format(
                   invoice_total.value, invoice_total.confidence
               )
           )
       due_date = invoice.fields.get("DueDate")
       if due_date:
           print(
               "Due Date: {} has confidence: {}".format(
                   due_date.value, due_date.confidence
               )
           )
       purchase_order = invoice.fields.get("PurchaseOrder")
       if purchase_order:
           print(
               "Purchase Order: {} has confidence: {}".format(
                   purchase_order.value, purchase_order.confidence
               )
           )
       billing_address = invoice.fields.get("BillingAddress")
       if billing_address:
           print(
               "Billing Address: {} has confidence: {}".format(
                   billing_address.value, billing_address.confidence
               )
           )
       billing_address_recipient = invoice.fields.get("BillingAddressRecipient")
       if billing_address_recipient:
           print(
               "Billing Address Recipient: {} has confidence: {}".format(
                   billing_address_recipient.value,
                   billing_address_recipient.confidence,
               )
           )
       shipping_address = invoice.fields.get("ShippingAddress")
       if shipping_address:
           print(
               "Shipping Address: {} has confidence: {}".format(
                   shipping_address.value, shipping_address.confidence
               )
           )
       shipping_address_recipient = invoice.fields.get("ShippingAddressRecipient")
       if shipping_address_recipient:
           print(
               "Shipping Address Recipient: {} has confidence: {}".format(
                   shipping_address_recipient.value,
                   shipping_address_recipient.confidence,
               )
           )
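       print("Invoice items:")
       # Guard against a missing "Items" field and avoid shadowing the outer idx.
       items = invoice.fields.get("Items")
       for item_idx, item in enumerate(items.value if items else []):
           print("...Item #{}".format(item_idx + 1))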
           item_description = item.value.get("Description")
           if item_description:
               print(
                   "......Description: {} has confidence: {}".format(
                       item_description.value, item_description.confidence
                   )
               )
           item_quantity = item.value.get("Quantity")
           if item_quantity:
               print(
                   "......Quantity: {} has confidence: {}".format(
                       item_quantity.value, item_quantity.confidence
                   )
               )
           unit = item.value.get("Unit")
           if unit:
               print(
                   "......Unit: {} has confidence: {}".format(
                       unit.value, unit.confidence
                   )
               )
           unit_price = item.value.get("UnitPrice")
           if unit_price:
               print(
                   "......Unit Price: {} has confidence: {}".format(
                       unit_price.value, unit_price.confidence
                   )
               )
           product_code = item.value.get("ProductCode")
           if product_code:
               print(
                   "......Product Code: {} has confidence: {}".format(
                       product_code.value, product_code.confidence
                   )
               )
           item_date = item.value.get("Date")
           if item_date:
               print(
                   "......Date: {} has confidence: {}".format(
                       item_date.value, item_date.confidence
                   )
               )
           tax = item.value.get("Tax")
           if tax:
               print(
                   "......Tax: {} has confidence: {}".format(tax.value, tax.confidence)
               )
           amount = item.value.get("Amount")
           if amount:
               print(
                   "......Amount: {} has confidence: {}".format(
                       amount.value, amount.confidence
                   )
               )
       subtotal = invoice.fields.get("SubTotal")
       if subtotal:
           print(
               "Subtotal: {} has confidence: {}".format(
                   subtotal.value, subtotal.confidence
               )
           )
       total_tax = invoice.fields.get("TotalTax")
       if total_tax:
           print(
               "Total Tax: {} has confidence: {}".format(
                   total_tax.value, total_tax.confidence
               )
           )
       previous_unpaid_balance = invoice.fields.get("PreviousUnpaidBalance")
       if previous_unpaid_balance:
           print(
               "Previous Unpaid Balance: {} has confidence: {}".format(
                   previous_unpaid_balance.value, previous_unpaid_balance.confidence
               )
           )
       amount_due = invoice.fields.get("AmountDue")
       if amount_due:
           print(
               "Amount Due: {} has confidence: {}".format(
                   amount_due.value, amount_due.confidence
               )
           )
       service_start_date = invoice.fields.get("ServiceStartDate")
       if service_start_date:
           print(
               "Service Start Date: {} has confidence: {}".format(
                   service_start_date.value, service_start_date.confidence
               )
           )
       service_end_date = invoice.fields.get("ServiceEndDate")
       if service_end_date:
           print(
               "Service End Date: {} has confidence: {}".format(
                   service_end_date.value, service_end_date.confidence
               )
           )
       service_address = invoice.fields.get("ServiceAddress")
       if service_address:
           print(
               "Service Address: {} has confidence: {}".format(
                   service_address.value, service_address.confidence
               )
           )
       service_address_recipient = invoice.fields.get("ServiceAddressRecipient")
       if service_address_recipient:
           print(
               "Service Address Recipient: {} has confidence: {}".format(
                   service_address_recipient.value,
                   service_address_recipient.confidence,
               )
           )
       remittance_address = invoice.fields.get("RemittanceAddress")
       if remittance_address:
           print(
               "Remittance Address: {} has confidence: {}".format(
                   remittance_address.value, remittance_address.confidence
               )
           )
       remittance_address_recipient = invoice.fields.get("RemittanceAddressRecipient")
       if remittance_address_recipient:
           print(
               "Remittance Address Recipient: {} has confidence: {}".format(
                   remittance_address_recipient.value,
                   remittance_address_recipient.confidence,
               )
           )

Analyze a custom document. For more samples see the samples folder.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   # Falls back to custom_model_id (supplied by the caller) when the environment variable is not set.
   model_id = os.getenv("CUSTOM_BUILT_MODEL_ID", custom_model_id)

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )

   # Make sure your document's type is included in the list of document types the custom model can analyze
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_analyze_document(
           model=model_id, document=f
       )
   result = poller.result()

   for idx, document in enumerate(result.documents):
       print("--------Analyzing document #{}--------".format(idx + 1))
       print("Document has type {}".format(document.doc_type))
       print("Document has confidence {}".format(document.confidence))
       print("Document was analyzed by model with ID {}".format(result.model_id))
       for name, field in document.fields.items():
           field_value = field.value if field.value else field.content
           print(
               "......found field of type '{}' with value '{}' and with confidence {}".format(
                   field.value_type, field_value, field.confidence
               )
           )


   # iterate over lines, words, and selection marks on each page
   for page in result.pages:
       print("\nLines found on page {}".format(page.page_number))
       for line in page.lines:
           print("...Line '{}'".format(line.content))
       for word in page.words:
           print(
               "...Word '{}' has a confidence of {}".format(
                   word.content, word.confidence
               )
           )
       for selection_mark in page.selection_marks:
           print(
               "...Selection mark is '{}' and has a confidence of {}".format(
                   selection_mark.state, selection_mark.confidence
               )
           )

   for i, table in enumerate(result.tables):
       print("\nTable {} can be found on page:".format(i + 1))
       for region in table.bounding_regions:
           print("...{}".format(region.page_number))
       for cell in table.cells:
           print(
               "...Cell[{}][{}] has content '{}'".format(
                   cell.row_index, cell.column_index, cell.content
               )
           )
   print("-----------------------------------")

begin_analyze_document_from_url

Analyze field text and semantic values from a given document. The input must be the location (URL) of the document to be analyzed.

begin_analyze_document_from_url(model: str, document_url: str, **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

model
str
Required

A unique model identifier, passed as a string. Use this to specify the custom model ID or prebuilt model ID. Supported prebuilt model IDs are listed here: https://aka.ms/azsdk/formrecognizer/models

document_url
str
Required

The URL of the document to analyze. The input must be a valid, properly encoded (that is, with special characters such as spaces percent-encoded), and publicly accessible URL of one of the supported formats: JPEG, PNG, PDF, TIFF, or BMP.

pages
str

Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, and separate each page number or range with a comma, for example pages="1-3, 5-6".

locale
str

Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.

continuation_token
str

A continuation token to restart a poller from a saved state.
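The document_url parameter above must be properly encoded. The following is a minimal sketch of one way to encode the path portion of a URL with the standard library before passing it to the client. The helper name encode_document_url and the example.com URL are hypothetical. The sketch leaves the query string untouched and treats "%" as already encoded, so a literal percent sign in a file name would need separate handling.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_document_url(url: str) -> str:
    """Percent-encode the path of a URL (hypothetical helper, not an SDK call)."""
    parts = urlsplit(url)
    # Keep "/" as the path separator, and keep "%" so that sequences which are
    # already percent-encoded are not encoded a second time.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


print(encode_document_url("https://example.com/sample forms/receipt 1.png"))
# https://example.com/sample%20forms/receipt%201.png
```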

Returns

An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.
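Calling result() blocks until the analysis finishes. Where a bounded wait is preferable, the poller's wait() and done() methods can be combined, as in the sketch below. The function name result_or_none is hypothetical, and poller is expected to be the LROPoller returned by begin_analyze_document_from_url.

```python
def result_or_none(poller, timeout_seconds):
    """Wait up to timeout_seconds for the analysis; return None if still running."""
    # wait() blocks until the operation completes or the timeout elapses.
    poller.wait(timeout_seconds)
    if not poller.done():
        # The operation is still in progress; the caller can try again later.
        return None
    return poller.result()
```

A caller might treat a None return as a signal to retry later, or to save the poller's continuation token and resume in another process.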

Return type

LROPoller[AnalyzeResult]

Exceptions

azure.core.exceptions.HttpResponseError

Examples

Analyze a receipt. For more samples see the samples folder.


   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"
   poller = document_analysis_client.begin_analyze_document_from_url(
       "prebuilt-receipt", document_url=url
   )
   receipts = poller.result()

   for idx, receipt in enumerate(receipts.documents):
       print("--------Recognizing receipt #{}--------".format(idx + 1))
       print("Receipt type: {}".format(receipt.doc_type or "N/A"))
       merchant_name = receipt.fields.get("MerchantName")
       if merchant_name:
           print(
               "Merchant Name: {} has confidence: {}".format(
                   merchant_name.value, merchant_name.confidence
               )
           )
       transaction_date = receipt.fields.get("TransactionDate")
       if transaction_date:
           print(
               "Transaction Date: {} has confidence: {}".format(
                   transaction_date.value, transaction_date.confidence
               )
           )
       if receipt.fields.get("Items"):
           print("Receipt items:")
           # Use a separate index name to avoid shadowing the outer idx.
           for item_idx, item in enumerate(receipt.fields.get("Items").value):
               print("...Item #{}".format(item_idx + 1))
               item_name = item.value.get("Name")
               if item_name:
                   print(
                       "......Item Name: {} has confidence: {}".format(
                           item_name.value, item_name.confidence
                       )
                   )
               item_quantity = item.value.get("Quantity")
               if item_quantity:
                   print(
                       "......Item Quantity: {} has confidence: {}".format(
                           item_quantity.value, item_quantity.confidence
                       )
                   )
               item_price = item.value.get("Price")
               if item_price:
                   print(
                       "......Individual Item Price: {} has confidence: {}".format(
                           item_price.value, item_price.confidence
                       )
                   )
               item_total_price = item.value.get("TotalPrice")
               if item_total_price:
                   print(
                       "......Total Item Price: {} has confidence: {}".format(
                           item_total_price.value, item_total_price.confidence
                       )
                   )
       subtotal = receipt.fields.get("Subtotal")
       if subtotal:
           print(
               "Subtotal: {} has confidence: {}".format(
                   subtotal.value, subtotal.confidence
               )
           )
       tax = receipt.fields.get("Tax")
       if tax:
           print("Tax: {} has confidence: {}".format(tax.value, tax.confidence))
       tip = receipt.fields.get("Tip")
       if tip:
           print("Tip: {} has confidence: {}".format(tip.value, tip.confidence))
       total = receipt.fields.get("Total")
       if total:
           print("Total: {} has confidence: {}".format(total.value, total.confidence))
       print("--------------------------------------")

close

Close the DocumentAnalysisClient session.

close() -> None

Exceptions
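
Examples

Closing the client after use. This is a minimal sketch rather than an official sample: the function name analyze_and_close is hypothetical, and client is expected to be a DocumentAnalysisClient. It relies only on the close() method documented above, using contextlib.closing from the standard library so that the session is closed even if the analysis raises.

```python
from contextlib import closing


def analyze_and_close(client, model, document_url):
    """Run one URL analysis, then close the client session."""
    # closing() calls client.close() when the block exits, including on errors.
    with closing(client):
        poller = client.begin_analyze_document_from_url(
            model, document_url=document_url
        )
        return poller.result()
```

This pattern suits short-lived scripts. A client that is reused across many requests would normally be closed once, when the application shuts down.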