AI enrichment with image and text processing

Azure App Service
Azure Blob Storage
Azure AI Search
Azure Functions

Solution ideas

This article is a solution idea.

This article presents a solution that enriches text and image documents by using image processing, natural language processing, and custom skills to capture domain-specific data. Azure Cognitive Search with AI enrichment can help identify and explore relevant content at scale. This solution uses AI enrichment to extract meaning from the original complex, unstructured JFK Assassination Records (JFK Files) dataset.

Architecture

Diagram that shows the Azure Cognitive Search architecture for converting unstructured data into structured data.

Dataflow

The above diagram illustrates the process of passing the unstructured JFK Files dataset through the Azure Cognitive Search skills pipeline to produce structured, indexable data:

  1. Unstructured data in Azure Blob Storage, such as documents and images, is ingested into Azure Cognitive Search.
  2. The document cracking step initiates the indexing process by extracting images and text from the data, followed by content enrichment. The enrichment steps that occur in this process depend on the data and type of skills selected.
  3. Built-in skills based on the Computer Vision and Language Service APIs enable AI enrichments including image optical character recognition (OCR), image analysis, text translation, entity recognition, and full-text search.
  4. Custom skills support scenarios that require more complex AI models or services. Examples include Form Recognizer, Azure Machine Learning models, and Azure Functions.
  5. Following the enrichment process, the indexer saves the outputs into a search index that contains the enriched and indexed documents. Full-text search and other query forms can use this index.
  6. The enriched documents can also be projected into a knowledge store, which downstream apps, such as knowledge mining or data science apps, can use.
  7. Queries access the enriched content in the search index. The index supports custom analyzers, fuzzy search queries, filters, and a scoring profile to tune search relevance.
  8. Any application that connects to Blob Storage or to Azure Table Storage can access the knowledge store.
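
The dataflow above can be sketched as two definition payloads: a skillset that chains a built-in OCR skill, a built-in entity recognition skill, and a custom Web API skill, plus an indexer that turns on image extraction during document cracking. This is a minimal sketch, not the JFK Files project's actual configuration. The resource names, the custom skill URI, and the `targetName` values are hypothetical placeholders, and the payloads are only built as Python dictionaries here, not sent to the service.

```python
import json

# Hypothetical names and URI; replace them with your own resources.
SKILLSET_NAME = "jfk-skillset"
CUSTOM_SKILL_URI = "https://example.invalid/api/cryptonyms"

skillset = {
    "name": SKILLSET_NAME,
    "skills": [
        {   # Built-in OCR over each image produced by document cracking.
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "ocrText"}],
        },
        {   # Built-in entity recognition over the cracked text.
            "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
            "context": "/document",
            "categories": ["Person", "Organization", "Location"],
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "persons", "targetName": "persons"}],
        },
        {   # Custom skill hosted behind an HTTP endpoint (assumed URI).
            "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
            "uri": CUSTOM_SKILL_URI,
            "context": "/document",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "cryptonyms", "targetName": "cryptonyms"}],
        },
    ],
}

indexer = {
    "name": "jfk-indexer",            # hypothetical
    "dataSourceName": "jfk-blobs",    # hypothetical Blob Storage data source
    "targetIndexName": "jfk-index",   # hypothetical
    "skillsetName": SKILLSET_NAME,
    "parameters": {
        "configuration": {
            # Extract text and metadata, and emit normalized images for OCR.
            "dataToExtract": "contentAndMetadata",
            "imageAction": "generateNormalizedImages",
        }
    },
}


def skill_outputs(definition):
    """List the enrichment node names that the skills in a skillset produce."""
    return [o["targetName"] for s in definition["skills"] for o in s["outputs"]]


if __name__ == "__main__":
    print(json.dumps(skillset, indent=2))
    print(skill_outputs(skillset))
```

In a real pipeline, OCR text is usually merged back into the document content before the text skills run. That step is omitted here to keep the sketch short.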

Components

Azure Cognitive Search works with other Azure components to provide this solution.

Azure Cognitive Search

Azure Cognitive Search indexes the content and powers the user experience in this solution. Azure Cognitive Search can apply pre-built cognitive skills to the content, and the extensibility mechanism can add custom skills for specific enrichment transformations.

Azure Computer Vision

Azure Computer Vision uses text recognition to extract and recognize text information from images. The Read API uses the latest OCR recognition models, and is optimized for large, text-heavy documents and noisy images.

The legacy OCR API isn't optimized for large documents, but supports more languages. OCR results can vary depending on scan and image quality. The current solution idea uses OCR to produce data in the hOCR format.
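
As a rough illustration of the hOCR output mentioned above, the following sketch wraps recognized words and their bounding boxes in hOCR markup. The input shape (a list of dictionaries with `text` and a rectangular `bbox`) is an assumption made for this example and is not the Computer Vision response format. For brevity, all words are placed on a single line.

```python
from html import escape


def words_to_hocr(words, page_width, page_height):
    """Render OCR words as a minimal hOCR fragment.

    Each word is a dict with 'text' and 'bbox' = (x0, y0, x1, y1) in pixels.
    This input shape is an assumption for the sketch.
    """
    spans = []
    for i, word in enumerate(words, start=1):
        x0, y0, x1, y1 = word["bbox"]
        spans.append(
            f"<span class='ocrx_word' id='word_{i}' "
            f"title='bbox {x0} {y0} {x1} {y1}'>{escape(word['text'])}</span>"
        )
    line = " ".join(spans)
    return (
        f"<div class='ocr_page' title='bbox 0 0 {page_width} {page_height}'>"
        f"<span class='ocr_line'>{line}</span></div>"
    )


if __name__ == "__main__":
    sample = [
        {"text": "SECRET", "bbox": (40, 30, 180, 60)},
        {"text": "MEMO", "bbox": (190, 30, 290, 60)},
    ]
    print(words_to_hocr(sample, 850, 1100))
```

Because hOCR keeps the position of every word, a web front end can highlight search hits on top of the original scanned page.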

Azure Cognitive Service for Language

Azure Cognitive Service for Language extracts text information from unstructured documents by using text analytics capabilities like Named Entity Recognition (NER), key phrase extraction, and full-text search.

Azure Storage

Azure Blob Storage is REST-based object storage for data that you can access from anywhere in the world via HTTPS. You can use Blob Storage to expose data publicly to the world or to store application data privately. Blob Storage is ideal for large amounts of unstructured data like text or graphics.

Azure Table Storage stores highly available, scalable, structured or semi-structured NoSQL data in the cloud.
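
The knowledge store described in the dataflow is written to these two storage services. The sketch below shows what a `knowledgeStore` section of a skillset definition might look like, with one table projection and one object (blob) projection. The table name, container name, and source paths are hypothetical. The source paths assume that an earlier skill has shaped the enriched data under those nodes.

```python
knowledge_store = {
    # Placeholder; supply a real connection string in your own environment.
    "storageConnectionString": "<your-storage-connection-string>",
    "projections": [
        {   # Projection group 1: rows in Azure Table Storage.
            "tables": [
                {
                    "tableName": "jfkDocuments",          # hypothetical
                    "generatedKeyName": "DocumentId",      # hypothetical
                    "source": "/document/tableprojection", # assumed shaped node
                }
            ],
            "objects": [],
            "files": [],
        },
        {   # Projection group 2: JSON blobs in Azure Blob Storage.
            "tables": [],
            "objects": [
                {
                    "storageContainer": "jfk-enriched",      # hypothetical
                    "source": "/document/objectprojection",  # assumed shaped node
                }
            ],
            "files": [],
        },
    ],
}


def projection_targets(store):
    """Return (kind, destination) pairs for every projection in the store."""
    targets = []
    for group in store["projections"]:
        targets += [("table", t["tableName"]) for t in group["tables"]]
        targets += [("object", o["storageContainer"]) for o in group["objects"]]
    return targets


if __name__ == "__main__":
    print(projection_targets(knowledge_store))
```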

Azure Functions

Azure Functions is a serverless compute service that lets you run small pieces of event-triggered code without having to explicitly provision or manage infrastructure. This solution uses a function in Azure Functions as a custom skill that applies the CIA Cryptonyms list to the JFK Assassination Records.
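
A custom skill receives a batch of records under a `values` array and must return one result per `recordId`. The following sketch shows the core of such a handler as a plain Python function, without the Azure Functions hosting code. It isn't the project's actual implementation. The function names, the `text` and `cryptonyms` field names, and the glossary entry are hypothetical. The glossary entry is a placeholder, not a real cryptonym.

```python
import re

# Placeholder glossary; the real list would be loaded from the Cryptonyms data.
SAMPLE_GLOSSARY = {"ZZSAMPLE": "placeholder entry, not a real cryptonym"}


def find_cryptonyms(text, glossary):
    """Return glossary terms that occur as whole uppercase words in text."""
    words = set(re.findall(r"\b[A-Z][A-Z0-9]+\b", text))
    return sorted(term for term in glossary if term in words)


def handle_skill_request(body, glossary):
    """Build a custom skill response for a custom skill request body."""
    results = []
    for record in body.get("values", []):
        text = record.get("data", {}).get("text") or ""
        results.append(
            {
                "recordId": record["recordId"],  # must echo the input ID
                "data": {"cryptonyms": find_cryptonyms(text, glossary)},
                "errors": [],
                "warnings": [],
            }
        )
    return {"values": results}


if __name__ == "__main__":
    request = {
        "values": [
            {"recordId": "1", "data": {"text": "Contact with ZZSAMPLE was made."}},
            {"recordId": "2", "data": {"text": "Nothing of note."}},
        ]
    }
    print(handle_skill_request(request, SAMPLE_GLOSSARY))
```

Keeping the matching logic separate from the HTTP layer makes it easy to unit test before you deploy it behind a function endpoint.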

Azure App Service

This solution idea also builds a standalone web app in Azure App Service to test and demonstrate the solution, search the index, and explore connections in the enriched and indexed documents.
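
A web app like this one typically sends search requests to the index. The sketch below builds the body of such a request by using the full Lucene query syntax for fuzzy matching, with an optional filter and scoring profile. The helper name, the `persons` field, and the `demoProfile` scoring profile are hypothetical. The helper assumes simple single-word terms that need no escaping.

```python
def build_search_request(terms, fuzzy_distance=1, filter_expr=None,
                         scoring_profile=None, top=10):
    """Build a search request body with fuzzy terms (edit distance 0 to 2)."""
    if not 0 <= fuzzy_distance <= 2:
        raise ValueError("fuzzy_distance must be 0, 1, or 2")
    # In the full Lucene syntax, term~N requests fuzzy matching within N edits.
    search = " ".join(
        f"{term}~{fuzzy_distance}" if fuzzy_distance else term for term in terms
    )
    body = {
        "search": search,
        "queryType": "full",   # required for the ~ fuzzy operator
        "searchMode": "all",   # every term must match
        "top": top,
        "count": True,
    }
    if filter_expr:
        body["filter"] = filter_expr
    if scoring_profile:
        body["scoringProfile"] = scoring_profile
    return body


if __name__ == "__main__":
    request_body = build_search_request(
        ["oswald", "mexico"],                          # illustrative terms
        filter_expr="persons/any(p: p eq 'EXAMPLE')",  # hypothetical field
        scoring_profile="demoProfile",                 # hypothetical profile
    )
    print(request_body)
```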

Scenario details

Large, unstructured datasets can include typewritten and handwritten notes, photos and diagrams, and other unstructured data that standard search solutions can't parse. The JFK Assassination Records contain over 34,000 pages of documents about the CIA investigation of the 1963 JFK assassination.

The JFK Files sample project and online demo showcase a particular Azure Cognitive Search use case. This solution idea isn't intended to be a framework or scalable architecture for all scenarios, but to provide a general guideline and example. The code project and demo create a public website and publicly readable storage container for extracted images, so you shouldn't use this solution with non-public data.

AI enrichment in Azure Cognitive Search can extract and enhance searchable, indexable text from images, blobs, and other unstructured data sources like the JFK Files. AI enrichment uses pre-trained machine learning skill sets from the Cognitive Services Computer Vision and Cognitive Service for Language APIs. You can also create and attach custom skills to add special processing for domain-specific data like CIA Cryptonyms. Azure Cognitive Search can then index and search that content.

The Azure Cognitive Search skills in this solution fall into the following categories:

  • Image processing. Built-in text extraction and image analysis skills include object and face detection, tag and caption generation, and celebrity and landmark identification. These skills create text representations of image content, which are searchable by using the query capabilities of Azure Cognitive Search. Document cracking is the process of extracting or creating text content from non-text sources.

  • Natural language processing. Built-in skills like entity recognition, language detection, and key phrase extraction map unstructured text to searchable and filterable fields in an index.

  • Custom skills. Custom skills extend Azure Cognitive Search to apply specific enrichment transformations to content. You specify the interface for a custom skill through the Custom Web API skill.
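
To show how skill outputs become searchable and filterable fields, the following sketch defines a small index schema and the indexer output field mappings that connect enrichment nodes to those fields. All index, field, and node names are hypothetical. The helper only checks that each mapping points at a field that exists in the schema.

```python
def make_field(name, edm_type, *, key=False, searchable=False,
               filterable=False, facetable=False):
    """Build one field entry for an index definition payload."""
    return {
        "name": name,
        "type": edm_type,
        "key": key,
        "searchable": searchable,
        "filterable": filterable,
        "facetable": facetable,
    }


index = {
    "name": "jfk-index",  # hypothetical
    "fields": [
        make_field("id", "Edm.String", key=True),
        make_field("content", "Edm.String", searchable=True),
        make_field("language", "Edm.String", filterable=True, facetable=True),
        make_field("persons", "Collection(Edm.String)",
                   searchable=True, filterable=True, facetable=True),
        make_field("keyPhrases", "Collection(Edm.String)",
                   searchable=True, filterable=True),
        make_field("cryptonyms", "Collection(Edm.String)",
                   searchable=True, filterable=True, facetable=True),
    ],
}

# Indexer mappings from enrichment nodes (assumed paths) to index fields.
output_field_mappings = [
    {"sourceFieldName": "/document/language", "targetFieldName": "language"},
    {"sourceFieldName": "/document/persons", "targetFieldName": "persons"},
    {"sourceFieldName": "/document/keyPhrases", "targetFieldName": "keyPhrases"},
    {"sourceFieldName": "/document/cryptonyms", "targetFieldName": "cryptonyms"},
]


def unmapped_targets(index_def, mappings):
    """Return mapping targets that have no matching field in the index."""
    names = {field["name"] for field in index_def["fields"]}
    return [m["targetFieldName"] for m in mappings
            if m["targetFieldName"] not in names]


if __name__ == "__main__":
    print(unmapped_targets(index, output_field_mappings))
```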

Potential use cases

  • Increase the value and utility of unstructured text and image content in search and data science apps.
  • Use custom skills to integrate open-source, third-party, or first-party code into indexing pipelines.
  • Make scanned JPG, PNG, or bitmap documents full-text searchable.
  • Produce better outcomes than standard PDF text extraction for PDFs that combine images and text. Some scanned and native PDF formats might not parse correctly in Azure Cognitive Search.
  • Create new information from inherently meaningful raw content or context that's hidden in larger unstructured or semi-structured documents.

Contributors

This article is maintained by Microsoft. It was originally written by the following contributor.

Principal author:

Next steps

Learn more about this solution:

Read product documentation:

Try the learning path:

See the related architectures and guidance: