PDF with Multiple invoices

MN, Yogesh 0 Reputation points
2024-04-17T06:55:07.6233333+00:00

I have multiple invoices in a single PDF. Is there any way to split the PDF by invoice, or to identify whether the PDF contains multiple invoices or a single invoice? (Language: Python)

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. navba-MSFT 17,120 Reputation points Microsoft Employee
    2024-04-17T10:25:20.71+00:00

    @MN, Yogesh If each page of the PDF holds one independent invoice, you can split the multi-page file into single-page PDFs before analysis, for example with Azure’s data processing capabilities. After splitting, send the location of each single-page PDF to AI Document Intelligence for processing.
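As a rough illustration of the splitting step, here is a minimal Python sketch. It assumes the third-party `pypdf` package, which the answer does not name, and it assumes one invoice per page. The function names and the output naming scheme are hypothetical.

```python
from pathlib import Path


def page_output_path(source: Path, out_dir: Path, page_number: int) -> Path:
    """Build the output name for one page, e.g. invoices_page_003.pdf."""
    return out_dir / f"{source.stem}_page_{page_number:03d}.pdf"


def split_pdf_by_page(source: Path, out_dir: Path) -> list[Path]:
    """Write each page of `source` to its own PDF and return the new paths."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported here
    # so the naming helper above can be used without it.
    from pypdf import PdfReader, PdfWriter

    out_dir.mkdir(parents=True, exist_ok=True)
    reader = PdfReader(str(source))
    written = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)  # copy exactly one page into a new document
        target = page_output_path(source, out_dir, number)
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

Calling `split_pdf_by_page(Path("invoices.pdf"), Path("pages"))` would write one file per page into `pages/`. If an invoice can span several pages, a per-page split is not enough on its own and the pages need to be regrouped afterwards.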

    The Document Intelligence invoice model extracts key fields such as customer name, billing address, due date, and amount due, and returns them as structured JSON. Comparing these fields across pages can help you identify whether a PDF file contains more than one invoice.
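One possible way to use the extracted fields for detection is sketched below. It assumes the `azure-ai-formrecognizer` Python package and the `prebuilt-invoice` model, and it treats the `InvoiceId` field as the signal that separates one invoice from the next, which is a heuristic and not something the service guarantees. The function names are hypothetical, and `endpoint` and `key` are placeholders for your own resource.

```python
from typing import Optional


def extract_invoice_id(pdf_path: str, endpoint: str, key: str) -> Optional[str]:
    """Run the prebuilt invoice model on one PDF and return its InvoiceId, if any."""
    # Assumption: azure-ai-formrecognizer is installed; imported here so the
    # grouping helper below can be used without the SDK.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(pdf_path, "rb") as handle:
        poller = client.begin_analyze_document("prebuilt-invoice", document=handle)
        result = poller.result()
    for document in result.documents:
        field = document.fields.get("InvoiceId")
        if field is not None and field.value:
            return str(field.value)
    return None


def group_pages_by_invoice(
    page_ids: list[Optional[str]],
) -> list[tuple[Optional[str], list[int]]]:
    """Group consecutive 1-based page numbers that appear to belong to one invoice.

    A page with no detected InvoiceId is treated as a continuation of the
    previous page's invoice (a heuristic assumption).
    """
    groups: list[tuple[Optional[str], list[int]]] = []
    for page_number, invoice_id in enumerate(page_ids, start=1):
        if groups and (invoice_id is None or invoice_id == groups[-1][0]):
            groups[-1][1].append(page_number)
        else:
            groups.append((invoice_id, [page_number]))
    return groups
```

With per-page IDs collected from the single-page files, `len(group_pages_by_invoice(ids)) > 1` suggests the original PDF holds more than one invoice, and each group's page list gives a candidate range to write out as a separate file.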

    More info about DI invoice model: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-invoice?view=doc-intel-4.0.0

    For development, Document Intelligence supports various tools, applications, and libraries such as Document Intelligence Studio, REST API, and SDKs for C#, Python, Java, and JavaScript.

    Python sample using pre-built invoice model: https://learn.microsoft.com/en-us/python/api/overview/azure/ai-formrecognizer-readme?view=azure-python#using-prebuilt-models

    Please note that the features and processes may change based on user feedback as Document Intelligence is in active development. For best results, provide one clear photo or high-quality scan per document.

    Automate PDF processing: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/architecture/automate-pdf-forms-processing

    This sample demonstrates how to use GPT-4 Vision to extract structured JSON data from PDF documents, such as invoices, using the Azure OpenAI Service.

    I hope this helps! If you have any more questions, feel free to ask.
