Transparency note: Azure AI Search

What is a Transparency Note?

An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, what its capabilities and limitations are, and how to achieve the best performance. Microsoft's Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that influence system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment. You can use Transparency Notes when developing or deploying your own system, or share them with the people who will use or be affected by your system.

Microsoft's Transparency Notes are part of a broader effort at Microsoft to put our AI Principles into practice. To find out more, see the Microsoft AI principles.

Introduction

Azure AI Search gives developers tools, APIs, and SDKs for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications. Search is foundational to any application that surfaces data to users. Common scenarios include catalog or document search, online retail stores, and data exploration over proprietary content.

Searchable data can be text or vectors. It can be ingested as-is from a data source or enriched by using AI to improve the overall search experience. Developers can convert data to vectors by calling external machine learning models. Indexers can optionally include skill sets that enrich data by using several Azure AI Language services, such as Named Entity Recognition (NER) and personally identifiable information (PII) detection, and Azure AI Vision services, including optical character recognition (OCR) and image analysis.

The following sections describe how Azure AI Search improves the search experience by using Azure AI Services or other AI systems to better understand the intent, semantics, and implied structure of a customer's content.

AI enrichment is the application of machine learning models from Azure AI Services over content that is not easily searchable in its raw form. Through enrichment, analysis and inference are used to create searchable content and structure where none previously existed.

AI enrichment is an optional extension of the Azure AI Search indexer pipeline that connects to Azure AI Services in the same region as a customer's search service. An enrichment pipeline has the same core components as a typical indexer (indexer, data source, index), plus a skill set that specifies the atomic enrichment steps. A skill set can be assembled by using built-in skills based on the Azure AI Services APIs, such as Computer Vision and Language Service, or custom skills that run external code that you provide.
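The shape of a skill set can be sketched as a plain data structure. The following is a minimal illustration, assuming the JSON layout used by the skillset REST definition (a `skills` array whose entries carry an `@odata.type`, a `context`, `inputs`, and `outputs`). The skill set name, the target names `ocr_text` and `key_phrases`, and the helper `build_skillset` are hypothetical and chosen only for this example; check the current API reference before relying on any field.

```python
import json


def build_skillset(name: str) -> dict:
    """Build an illustrative skill set that feeds OCR output into key phrase extraction.

    Field names follow the assumed REST layout; target names are hypothetical.
    """
    ocr_skill = {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        "context": "/document/normalized_images/*",
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [{"name": "text", "targetName": "ocr_text"}],
    }
    key_phrase_skill = {
        "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
        "context": "/document/normalized_images/*",
        # Consumes the text produced by the OCR skill above.
        "inputs": [
            {"name": "text", "source": "/document/normalized_images/*/ocr_text"}
        ],
        "outputs": [{"name": "keyPhrases", "targetName": "key_phrases"}],
    }
    return {
        "name": name,
        "description": "Illustrative skill set: OCR followed by key phrase extraction",
        "skills": [ocr_skill, key_phrase_skill],
    }


skillset = build_skillset("demo-skillset")
print(json.dumps(skillset, indent=2))
```

Each entry in `skills` is one atomic enrichment step. The second skill reads the output of the first, which is how the steps of a pipeline are chained.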

Capabilities

System behavior

Several built-in skills for AI enrichment in Azure AI Search take advantage of Azure AI Services. See the Transparency Note for each built-in skill for considerations when choosing to use a skill.

See the documentation for each skill to learn more about its capabilities, limitations, performance, evaluations, and methods for integration and responsible use. Note that using these skills in combination can compound errors. For example, errors introduced by OCR carry through to key phrase extraction.
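A rough way to reason about compounding is to multiply per-stage accuracies. The sketch below assumes that each stage fails independently and that a downstream stage cannot recover from an upstream error. Both are simplifications, and the accuracy figures are hypothetical, not measurements of any Azure AI Service.

```python
def pipeline_accuracy(stage_accuracies):
    """Estimate end-to-end accuracy as the product of per-stage accuracies.

    Assumes independent stages with no downstream error recovery.
    """
    combined = 1.0
    for accuracy in stage_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("each accuracy must be between 0 and 1")
        combined *= accuracy
    return combined


# Hypothetical figures: OCR is right 95% of the time,
# key phrase extraction 90% of the time on clean text.
combined = pipeline_accuracy([0.95, 0.90])
print(f"Estimated end-to-end accuracy: {combined:.3f}")
```

Under these assumptions, the chained result is less accurate than either stage alone, which is why each added skill should be evaluated in the context of the whole pipeline.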

Use cases

Example use cases

Because Azure AI Search is a full-text search solution, the purpose of AI enrichment is to improve the search utility of unstructured content. Here are some examples of content enrichment scenarios supported by the built-in skills:

  • Translation and language detection enable multilingual search.
  • Entity recognition extracts people, places, and other entities from large chunks of text.
  • Key phrase extraction identifies and then outputs important terms.
  • OCR recognizes printed and handwritten text in binary files.
  • Image analysis describes image content and outputs the descriptions as searchable text fields.
  • Integrated vectorization is a preview feature that calls an Azure OpenAI embedding model to vectorize data and store the embeddings in Azure AI Search for similarity search.
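Similarity search over embeddings, mentioned in the last item above, can be illustrated with cosine similarity, one common metric for comparing vectors. This is a minimal sketch on toy three-dimensional vectors and is not the service's internal implementation. The document IDs, the vectors, and the helpers `cosine_similarity` and `rank_by_similarity` are hypothetical. Real embeddings have far more dimensions and come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return document IDs ordered from most to least similar to the query."""
    return sorted(
        documents,
        key=lambda doc_id: cosine_similarity(query, documents[doc_id]),
        reverse=True,
    )


# Toy stand-ins for stored embeddings.
documents = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.2, 0.9, 0.1],
    "doc-c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, documents)
print(ranking)
```

Documents whose vectors point in nearly the same direction as the query vector rank first, regardless of vector length.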

Limitations

AI enrichment in Azure AI Search uses the indexer and data source features of the service to call Azure AI Services to perform the content enrichment. Limitations of the indexers and data sources used in this process will apply. Review the indexer and data source documentation for more information about these related limitations. The limitations of each Azure AI Service used by the AI enrichment pipeline in Azure AI Search will also apply. See the transparency notes for each service for more information about these limitations.

Learn more about responsible AI