What is Azure Cognitive Search?
Azure Cognitive Search (formerly known as "Azure Search") is a cloud search service that gives developers APIs and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
A search service has the following components:
- Search engine for full text search
- Persistent storage of user-owned indexed content
- APIs for indexing and querying
- Optional AI-based enrichment that creates searchable content out of images, raw text, and application files
- Optional integration with other Azure services for data, machine learning/AI, and security
Architecturally, a search service sits between the external data stores that contain your unindexed data and the client app that sends query requests to a search index and handles the response.
Externally, search can integrate with other Azure services in the form of indexers that automate data ingestion/retrieval from Azure data sources, and skillsets that incorporate consumable AI from Cognitive Services, such as image and text analysis, or custom AI that you create in Azure Machine Learning or wrap inside Azure Functions.
Inside a search service
On the search service itself, the two primary workloads are indexing and querying.
Indexing is an intake process that loads content into your search service and makes it searchable. Internally, inbound text is processed into tokens and stored in inverted indexes for fast scans. You can upload any text that is in the form of JSON documents.
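To make the idea of tokens and inverted indexes concrete, here is a minimal sketch in Python. It is a toy illustration only, not how the service is implemented: the `tokenize`, `build_inverted_index`, and `lookup` functions and the sample documents are all hypothetical, and real analyzers do much more than lowercasing and splitting.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each token to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, text in documents.items():
        for token in tokenize(text):
            index[token].add(key)
    return index


def lookup(index, query):
    """Return the keys of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical sample content keyed by document ID.
docs = {
    "1": "Historic hotel near the waterfront",
    "2": "Budget hotel with free parking",
    "3": "Waterfront apartment with parking",
}
index = build_inverted_index(docs)
print(sorted(lookup(index, "hotel parking")))  # ['2']
```

Because each token points directly at the documents that contain it, a query only touches the postings for its own terms instead of scanning every document.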
Additionally, if your content includes mixed files, you have the option of adding AI enrichment through cognitive skills. AI enrichment can extract text embedded in application files, and also infer text and structure from non-text files by analyzing the content.
The skills providing the analysis are predefined ones from Microsoft, or custom skills that you create. The subsequent analysis and transformations can result in new information and structures that did not previously exist, providing high utility for many search and knowledge mining scenarios.
Querying can happen once an index is populated with searchable text, when your client app sends query requests to a search service and handles responses. All query execution is over a search index that you create, own, and store in your service. In your client app, the search experience is defined using APIs from Azure Cognitive Search, and can include relevance tuning, autocomplete, synonym matching, fuzzy matching, pattern matching, filtering, and sorting.
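As a rough sketch of what a query request from a client app might look like, the following Python builds (but does not send) a POST request for the Search Documents REST operation. The service name, index name, key, and the `Rating` field are placeholders, and the API version shown is an assumption; check the REST reference for the version your service supports.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumed version; adjust to match your service


def build_search_request(service, index, api_key, search_text,
                         filter_expr=None, order_by=None, top=10):
    """Build a Search Documents POST request without sending it."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={API_VERSION}")
    # queryType "full" enables Lucene syntax, such as "term~" for fuzzy matching.
    body = {"search": search_text, "queryType": "full", "top": top}
    if filter_expr:
        body["filter"] = filter_expr   # OData filter expression
    if order_by:
        body["orderby"] = order_by     # for example "Rating desc"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values: "my-service", "hotels", and the key are hypothetical.
request = build_search_request(
    "my-service", "hotels", "<query-api-key>", "seatle~",
    filter_expr="Rating ge 4", order_by="Rating desc", top=5,
)
print(request.full_url)
print(request.data.decode("utf-8"))
# To execute against a real service: urllib.request.urlopen(request)
```

The misspelled term `seatle~` shows fuzzy matching, while the filter and sort order are expressed as separate properties of the request body.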
Functionality is exposed through a simple REST API or .NET SDK that masks the inherent complexity of information retrieval. You can also use the Azure portal for service administration and content management, with tools for prototyping and querying your indexes and skillsets. Because the service runs in the cloud, infrastructure and availability are managed by Microsoft.
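To illustrate the REST surface on the indexing side, here is a hedged sketch that prepares a batch of JSON documents for upload. It builds the request without sending it. The service name, index name, key, and document fields (`HotelId`, `HotelName`) are hypothetical, and the API version is an assumption to verify against the REST reference.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumed version; adjust to match your service


def build_upload_request(service, index, admin_key, documents):
    """Build a request that uploads a batch of JSON documents to an index."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/index?api-version={API_VERSION}")
    # Each document in the batch carries an action; "upload" inserts or replaces.
    batch = {"value": [{"@search.action": "upload", **doc} for doc in documents]}
    return urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Hypothetical documents; field names must match your own index schema.
hotels = [
    {"HotelId": "1", "HotelName": "Harbor View Inn"},
    {"HotelId": "2", "HotelName": "Old Town Lodge"},
]
request = build_upload_request("my-service", "hotels", "<admin-api-key>", hotels)
print(request.data.decode("utf-8"))
# To execute against a real service: urllib.request.urlopen(request)
```

Write operations such as this one require an admin key, whereas query requests can use a query key with fewer permissions.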
Why use Cognitive Search
Azure Cognitive Search is well suited for the following application scenarios:
- Consolidate heterogeneous content into a private, user-defined search index.
- Easily implement search-related features: relevance tuning, faceted navigation, filters (including geo-spatial search), synonym mapping, and autocomplete.
- Transform large undifferentiated text or image files, or application files stored in Azure Blob storage or Cosmos DB, into searchable JSON documents. This is achieved during indexing through cognitive skills that add external processing.
- Add linguistic or custom text analysis. If you have non-English content, Azure Cognitive Search supports both Lucene analyzers and Microsoft's natural language processors. You can also configure analyzers to achieve specialized processing of raw content, such as filtering out diacritics, or recognizing and preserving patterns in strings.
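To show what "filtering out diacritics" means in practice, here is a small Python sketch of the transformation. It is only a conceptual illustration: in the service, this kind of processing is configured through analyzers in the index definition rather than written by hand, and the `fold_diacritics` function is hypothetical.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks so accented and unaccented spellings match."""
    # NFKD splits characters such as "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("Crème brûlée à Zürich"))  # Creme brulee a Zurich
```

Applying the same folding at both indexing and query time lets a query for "Zurich" match content that contains "Zürich".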
For more information about specific functionality, see Features of Azure Cognitive Search.
How to get started
An end-to-end exploration of core search features can be achieved in four steps:
Minimize steps by starting with the Import data wizard and an Azure data source to create, load, and query an index in minutes.
Compare search options
Customers often ask how Azure Cognitive Search compares with other search-related solutions. The following table summarizes key differences.
|Compared to||Key differences|
|Microsoft Search||Microsoft Search is for Microsoft 365 authenticated users who need to query over content in SharePoint. It's offered as a ready-to-use search experience, enabled and configured by administrators, with the ability to accept external content through connectors from Microsoft and other sources. If this describes your scenario, then Microsoft Search with Microsoft 365 is an attractive option to explore.
In contrast, Azure Cognitive Search executes queries over an index that you define, populated with data and documents you own, often from diverse sources. Azure Cognitive Search has crawler capabilities for some Azure data sources through indexers, but you can push any JSON document that conforms to your index schema into a single, consolidated searchable resource. You can also customize the indexing pipeline to include machine learning and lexical analyzers. Because Cognitive Search is built to be a plug-in component in larger solutions, you can integrate search into almost any app, on any platform.
|Bing||Bing Web Search API searches the indexes on Bing.com for matching terms you submit. Indexes are built from HTML, XML, and other web content on public sites. Built on the same foundation, Bing Custom Search offers the same crawler technology for web content types, scoped to individual web sites.
In Cognitive Search, you can define and populate the index. You can use indexers to crawl data on Azure data sources, or push any index-conforming JSON document to your search service.
|Database search||Many database platforms include a built-in search experience. SQL Server has full text search. Cosmos DB and similar technologies have queryable indexes. When evaluating products that combine search and storage, it can be challenging to determine which way to go. Many solutions use both: DBMS for storage, and Azure Cognitive Search for specialized search features.
Compared to DBMS search, Azure Cognitive Search stores content from heterogeneous sources and offers specialized text processing features such as linguistic-aware text processing (stemming, lemmatization, word forms) in 56 languages. It also supports autocorrection of misspelled words, synonyms, suggestions, scoring controls, facets, and custom tokenization. The full text search engine in Azure Cognitive Search is built on Apache Lucene, an industry standard in information retrieval. However, while Azure Cognitive Search persists data in the form of an inverted index, it is not a replacement for true data storage and we don't recommend using it in that capacity. For more information, see this forum post.
Resource utilization is another inflection point in this category. Indexing and some query operations are often computationally intensive. Offloading search from the DBMS to a dedicated solution in the cloud preserves system resources for transaction processing. Furthermore, by externalizing search, you can easily adjust scale to match query volume.
|Dedicated search solution||Assuming you have decided on dedicated search with full-spectrum functionality, a final categorical comparison is between an on-premises solution and a cloud service. Many search technologies offer controls over indexing and query pipelines, access to richer query and filtering syntax, control over rank and relevance, and features for self-directed and intelligent search.
A cloud service is the right choice if you want a turn-key solution with minimal overhead and maintenance, and adjustable scale.
Within the cloud paradigm, several providers offer comparable baseline features, with full-text search, geo-search, and the ability to handle a certain level of ambiguity in search inputs. Typically, it's a specialized feature, or the ease and overall simplicity of APIs, tools, and management that determines the best fit.
Among cloud providers, Azure Cognitive Search is strongest for full text search workloads over content stores and databases on Azure, for apps that rely primarily on search for both information retrieval and content navigation.
Key strengths include:
- Azure data integration (crawlers) at the indexing layer
- Azure Private Link integration to support off-internet security requirements
- Integration with AI processing to make unsearchable content types text-searchable
- Linguistic and custom analysis, with analyzers for solid full text search in 56 languages
- Critical features: rich query language, relevance tuning, faceting, autocomplete, synonyms, geo-search, and result composition
- Azure scale, reliability, and world-class availability
Among our customers, those able to leverage the widest range of features in Azure Cognitive Search include online catalogs, line-of-business programs, and document discovery applications.
Watch this video
In this 15-minute video, program manager Luis Cabrera introduces Azure Cognitive Search.