Can Document Intelligence extract information from images

Question

Is it possible to utilize Document Intelligence to extract data from images? Alternatively, does Image Analysis offer a similar function? Specifically, I would like to understand if Document Intelligence can comprehend technical information contained within charts, plots, or figures. For example, if I provide a research paper, can I extract relevant information including text, sections, tables, and image data to import into an application that uses a RAG interface? How can I extract image-related information?

Answer

Hello @Ritesh Panditi, thanks for using the Microsoft Q&A platform.

I would suggest the prebuilt Layout model with the latest 2024-02-29-preview version as the best option for your use case. The Layout model's Markdown-formatted output is LLM-friendly and integrates smoothly into your workflows. Markdown is widely used to enable semantic chunking in RAG (Retrieval-Augmented Generation). You can refer to the documentation for more details.
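As a rough illustration of the semantic chunking mentioned above, here is a minimal Python sketch that splits Markdown text at its headings. It assumes you already have the Layout model's Markdown output as a string and does not call the service. The function name `chunk_markdown_by_heading` and the sample text are hypothetical, and the sketch does not handle `#` lines inside fenced code blocks.

```python
import re

# Matches Markdown ATX headings such as "# Title" or "## Methods".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown_by_heading(markdown: str) -> list:
    """Split Markdown into one chunk per heading (hypothetical helper)."""
    chunks = []
    heading, lines = None, []

    def flush():
        # Keep a chunk if it has a heading or any non-blank text.
        text = "\n".join(lines).strip()
        if heading is not None or text:
            chunks.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


# Made-up sample standing in for the Layout model's Markdown output.
sample_markdown = "# Title\n\nIntro text.\n\n## Methods\n\nWe did X.\n"
chunks = chunk_markdown_by_heading(sample_markdown)
```

Each chunk keeps its heading next to its text, so you can store the heading as metadata when you index the chunk for retrieval.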

The "prebuilt-layout" model can detect figures and extract information such as their spatial locations, text spans, related text elements, and captions.
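To show how the figure information described above might be consumed, here is a minimal sketch that reads figures out of an analyze result that has already been loaded as a Python dict. The field names (`figures`, `boundingRegions`, `spans`, `caption`, `content`) are my assumption about the JSON shape of the response, so please verify them against the output you actually get. The helper `summarize_figures` and the sample data are hypothetical.

```python
def summarize_figures(analyze_result: dict) -> list:
    """Collect caption, page numbers, and text for each detected figure.

    Assumes each span is an offset/length pair into the top-level "content".
    """
    content = analyze_result.get("content", "")
    summaries = []
    for figure in analyze_result.get("figures", []):
        caption = (figure.get("caption") or {}).get("content")
        pages = sorted({r["pageNumber"] for r in figure.get("boundingRegions", [])})
        text = " ".join(
            content[s["offset"]: s["offset"] + s["length"]]
            for s in figure.get("spans", [])
        )
        summaries.append({"caption": caption, "pages": pages, "text": text})
    return summaries


# Made-up sample shaped like the assumed response, not real service output.
caption_text = "Figure 1: Accuracy over epochs"
axis_text = "Epoch 0 10 20"
sample_result = {
    "content": caption_text + "\n" + axis_text,
    "figures": [{
        "boundingRegions": [{"pageNumber": 3}],
        "spans": [{"offset": len(caption_text) + 1, "length": len(axis_text)}],
        "caption": {"content": caption_text},
    }],
}
figures = summarize_figures(sample_result)
```

Note that this only gives you the text found inside and around a figure. It does not interpret the plotted data itself.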

Additionally, the "prebuilt-layout" model can identify two types of roles in a document layout: geometric roles (such as text, tables, figures, and selection marks) and logical roles (such as titles, headings, and footers).
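If you want to use the logical roles when preparing content for RAG, a small sketch like the following could group paragraphs by role. It assumes the analyze result is already loaded as a dict with a `paragraphs` list whose items have `content` and an optional `role`. Those field names, the role values in the sample, and the `"body"` label for paragraphs without a role are assumptions for illustration.

```python
from collections import defaultdict


def group_paragraphs_by_role(analyze_result: dict) -> dict:
    """Group paragraph text by its logical role (hypothetical helper)."""
    grouped = defaultdict(list)
    for paragraph in analyze_result.get("paragraphs", []):
        # "body" is our own label for paragraphs that carry no role.
        grouped[paragraph.get("role", "body")].append(paragraph["content"])
    return dict(grouped)


# Made-up sample shaped like the assumed response, not real service output.
sample_result = {
    "paragraphs": [
        {"role": "title", "content": "A Study"},
        {"role": "sectionHeading", "content": "1 Introduction"},
        {"content": "Body text."},
        {"role": "pageFooter", "content": "Page 1"},
    ]
}
grouped = group_paragraphs_by_role(sample_result)
```

With the paragraphs grouped this way, you can, for example, skip footers when building your index and keep titles and headings as metadata.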

The Azure AI Vision Image Analysis service can extract a wide variety of visual features from your images. For example, it can determine whether an image contains adult content, find specific brands or objects, or find human faces.

Please review these features and choose the service that best fits your use case.

I hope this helps.

Regards,

Vasavi

Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.
