Features of Azure Cognitive Search
Azure Cognitive Search provides a full-text search engine, persistent storage of search indexes, integrated AI used during indexing to extract more text and structure, and APIs and tools. The following table summarizes features by category. For more information about how Cognitive Search compares with other search technologies, see What is Azure Cognitive Search?.
|Data sources||Search indexes can accept text from any source, provided it is submitted as a JSON document.
Indexers automate data import from supported data sources, extracting searchable content from primary data stores. Indexers handle JSON serialization for you. You can connect to various data sources, including Azure SQL Database, Azure Cosmos DB, or Azure Blob storage.
|Hierarchical and nested data structures||Complex types and collections allow you to model virtually any type of JSON structure within a search index. One-to-many and many-to-many cardinality can be expressed natively through collections, complex types, and collections of complex types.|
|Linguistic analysis||Analyzers are components used for text processing during indexing and search operations. By default, you can use the general-purpose Standard Lucene analyzer, or override the default with a language analyzer, a custom analyzer that you configure, or another predefined analyzer that produces tokens in the format you require.
Language analyzers from Lucene or Microsoft are used to intelligently handle language-specific linguistics including verb tenses, gender, irregular plural nouns (for example, 'mouse' vs. 'mice'), word de-compounding, word-breaking (for languages with no spaces), and more.
Custom lexical analyzers are used for complex query forms such as phonetic matching and regular expressions.
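
As a rough illustration of the indexing features above, the following sketch builds an index definition and an upload batch as plain Python dictionaries, shaped like the JSON that the service REST API accepts. The index name `hotels-sketch`, the field names, and the sample document are hypothetical. The `Edm.ComplexType` and `Collection(Edm.ComplexType)` types, the `en.microsoft` analyzer name, and the `@search.action` property are believed to match the REST API, but verify them against the current reference before relying on them.

```python
import json

# Hypothetical index definition: all names are illustrative, not from the article.
index_definition = {
    "name": "hotels-sketch",
    "fields": [
        {"name": "HotelId", "type": "Edm.String", "key": True, "filterable": True},
        # A language analyzer overrides the default Standard Lucene analyzer.
        {"name": "Description", "type": "Edm.String", "searchable": True,
         "analyzer": "en.microsoft"},
        # Complex type: a single nested object.
        {"name": "Address", "type": "Edm.ComplexType", "fields": [
            {"name": "City", "type": "Edm.String", "searchable": True,
             "filterable": True},
        ]},
        # Collection of complex types: one-to-many (a hotel has many rooms).
        {"name": "Rooms", "type": "Collection(Edm.ComplexType)", "fields": [
            {"name": "Type", "type": "Edm.String", "searchable": True},
            {"name": "BaseRate", "type": "Edm.Double", "filterable": True},
        ]},
    ],
}


def make_upload_batch(documents):
    """Wrap documents in the batch envelope, marking each one for upload."""
    return {"value": [{"@search.action": "upload", **doc} for doc in documents]}


batch = make_upload_batch([{
    "HotelId": "1",
    "Description": "Quiet rooms near the harbor.",
    "Address": {"City": "Seattle"},
    "Rooms": [
        {"Type": "Suite", "BaseRate": 250.0},
        {"Type": "Standard", "BaseRate": 120.0},
    ],
}])

# Both structures serialize directly to the JSON a request body would carry.
print(json.dumps(batch, indent=2))
```

If you use an indexer instead of pushing documents yourself, the batch step is not needed, because the indexer handles JSON serialization.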
AI enrichment and knowledge mining
|AI processing during indexing||AI enrichment for image and text analysis can be applied to an indexing pipeline to extract text information from raw content. A few examples of built-in skills include optical character recognition (making scanned JPEGs searchable), entity recognition (identifying an organization, name, or location), and key phrase extraction. You can also write custom skills to attach to the pipeline, or integrate skills authored in Azure Machine Learning.|
|Storing enriched content for analysis and consumption in non-search scenarios||Knowledge store is an alternative output of an indexing pipeline. Instead of sending tokenized terms to an index, you can send enriched documents created by the indexing pipeline to a knowledge store, resident in either Azure Blob Storage or Table Storage, depending on the configuration. Knowledge stores are created from AI-based indexing (skillsets). The purpose of a knowledge store is to support downstream analysis or processing. With new information and structures in a knowledge store, you can attach it to a machine learning process or connect from Power BI to explore the data.
|Cached content||Incremental enrichment (preview) limits processing to just the documents that are changed by specific edits to the pipeline, using cached content for the parts of the pipeline that do not change.|
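
To make the enrichment pipeline more concrete, here is a minimal sketch of a skillset definition that chains two built-in skills, optical character recognition and key phrase extraction. The skillset name, the `ocrText` target name, and the helper function `make_skillset` are hypothetical. The `@odata.type` values and the input and output names follow the skillset JSON format as commonly documented, so treat them as assumptions to confirm against the skill reference for your API version.

```python
import json


def make_skillset(name):
    """Return a skillset definition with an OCR skill and a key phrase skill."""
    return {
        "name": name,
        "description": "Sketch: extract text from images, then key phrases.",
        "skills": [
            {
                # Runs once per image extracted from the source document.
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                "context": "/document/normalized_images/*",
                "inputs": [
                    {"name": "image", "source": "/document/normalized_images/*"},
                ],
                "outputs": [{"name": "text", "targetName": "ocrText"}],
            },
            {
                # Runs once per document, over its main text content.
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
            },
        ],
    }


skillset = make_skillset("demo-skillset")
print(json.dumps(skillset, indent=2))
```

Each skill output becomes a node in the enriched document, which you can then map to index fields or project into a knowledge store.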
Query and user experience
|Free-form text search||Full-text search is a primary use case for most search-based apps. Queries can be formulated using a supported syntax.
Simple query syntax provides logical operators, phrase search operators, suffix operators, and precedence operators.
Full Lucene query syntax includes all operations in simple syntax, with extensions for fuzzy search, proximity search, term boosting, and regular expressions.
|Relevance||Simple scoring is a key benefit of Azure Cognitive Search. Scoring profiles are used to model relevance as a function of values in the documents themselves. For example, you might want newer products or discounted products to appear higher in the search results. You can also build scoring profiles using tags for personalized scoring based on customer search preferences you've tracked and stored separately.|
|Geo-search||Azure Cognitive Search processes, filters, and displays geographic locations. It enables users to explore data based on the proximity of a search result to a physical location. Watch this video or review this sample to learn more.|
|Filters and facets||Faceted navigation is enabled through a single query parameter. Azure Cognitive Search returns a faceted navigation structure you can use as the code behind a categories list, for self-directed filtering (for example, to filter catalog items by price-range or brand).
Filters can be used to incorporate faceted navigation into your application's UI, enhance query formulation, and filter based on user- or developer-specified criteria. Create filters using the OData syntax.
|User experience||Autocomplete can be enabled for type-ahead queries in a search bar.
Search suggestions also work from partial text input in a search bar, but the results are actual documents in your index rather than query terms.
Synonyms associate equivalent terms, implicitly expanding the scope of a query without the user having to provide the alternate terms.
Hit highlighting applies text formatting to a matching keyword in search results. You can choose which fields return highlighted snippets.
Sorting is offered for multiple fields via the index schema and then toggled at query-time with a single search parameter.
Paging and throttling of search results are straightforward, given the fine-grained control that Azure Cognitive Search offers over results.
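
The query features above mostly come together in a single request body. The sketch below assembles one as a Python dictionary, combining full Lucene syntax, OData filters (including a geo-distance filter), facets, sorting, hit highlighting, and paging. The helper names `geo_distance_filter` and `build_search_request` and all field names are hypothetical. The parameter names (`queryType`, `filter`, `facets`, `orderby`, `highlight`, `top`, `skip`, `count`) are assumed to match the POST form of the Search Documents REST API.

```python
def geo_distance_filter(field, lon, lat, max_km):
    """OData filter: documents within max_km kilometers of a point."""
    return f"geo.distance({field}, geography'POINT({lon} {lat})') le {max_km}"


def build_search_request(text, *, page=0, page_size=10, filters=(),
                         facets=(), order_by=(), highlight_fields=(),
                         full_lucene=False):
    """Assemble a search request body from the individual query features."""
    body = {
        "search": text,
        "queryType": "full" if full_lucene else "simple",
        "top": page_size,            # page size
        "skip": page * page_size,    # offset of the requested page
        "count": True,               # ask for the total number of matches
    }
    if filters:
        # Parenthesize each clause so that combining with 'and' is unambiguous.
        body["filter"] = " and ".join(f"({f})" for f in filters)
    if facets:
        body["facets"] = list(facets)
    if order_by:
        body["orderby"] = ", ".join(order_by)
    if highlight_fields:
        body["highlight"] = ",".join(highlight_fields)
    return body


request = build_search_request(
    "hotel~1 AND wifi",              # fuzzy term plus a boolean operator
    page=2,
    filters=["Rating ge 4",
             geo_distance_filter("Location", -122.13, 47.67, 10)],
    facets=["Category", "Rating,interval:1"],
    order_by=["Rating desc"],
    highlight_fields=["Description"],
    full_lucene=True,
)
print(request)
```

Keeping paging as `page` and `page_size` in the helper, and converting to `skip` and `top` in one place, avoids off-by-one-page mistakes in calling code.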
|Data encryption||Microsoft-managed encryption-at-rest is built into the internal storage layer and is irrevocable.
Customer-managed encryption keys that you create and manage in Azure Key Vault can be used for supplemental encryption of indexes and synonym maps. For services created after August 1, 2020, CMK encryption extends to data on temporary disks, for full double encryption of indexed content.
|Endpoint protection||IP rules for inbound firewall support allow you to set up IP ranges from which the search service will accept requests.
Create a private endpoint using Azure Private Link to force all requests through a virtual network.
|Outbound security (indexers)||Data access through private endpoints allows an indexer to connect to Azure resources that are protected through Azure Private Link.
Data access using a trusted identity means that connection strings to external data sources can omit user names and passwords. When an indexer connects to the data source, the resource allows the connection if the search service was previously registered as a trusted service.
|Tools for prototyping and inspection||Add index is an index designer in the portal that you can use to create a basic schema consisting of attributed fields and a few other settings. After saving the index, you can populate it using an SDK or the REST API to provide the data.
Import data wizard creates indexes, indexers, skillsets, and data source definitions. If your data exists in Azure, this wizard can save you significant time and effort, especially for proof-of-concept investigation and exploration.
Search explorer is used to test queries and refine scoring profiles.
Create demo app is used to generate an HTML page that can be used to test the search experience.
|Monitoring and diagnostics||Enable monitoring features to go beyond the metrics-at-a-glance that are always visible in the portal. Metrics on queries per second, latency, and throttling are captured and reported in portal pages with no additional configuration required.|
|REST||Service REST API is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this API to retrieve system information and statistics.
Management REST API is for service creation and clean up through Azure Resource Manager. You can also use this API to manage keys and provision a service.
|Azure SDK for .NET||Azure.Search.Documents is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this client library to retrieve system information and statistics.
Microsoft.Azure.Management.Search is for service creation and clean up through Azure Resource Manager. You can also use this API to manage keys and provision a service.
|Azure SDK for Java||com.azure.search.documents is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this client library to retrieve system information and statistics.
com.microsoft.azure.management.search is for service creation and clean up through Azure Resource Manager. You can also use this API to manage keys and provision a service.
|Azure SDK for Python||azure-search-documents is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this client library to retrieve system information and statistics.
azure-mgmt-search is for service creation and clean up through Azure Resource Manager. You can also use this API to manage keys and provision a service.
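|Azure SDK for JavaScript||@azure/search-documents is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this client library to retrieve system information and statistics.
@azure/arm-search is for service creation and clean up through Azure Resource Manager. You can also use this API to manage keys and provision a service.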
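
For a sense of what the client libraries wrap, the following sketch prepares a data plane query against the service REST API using only the Python standard library. It builds the request without sending it. The service name, index name, key placeholder, and the helper `build_query_request` are hypothetical, and the API version `2020-06-30` is an assumption that you should replace with a version your service supports.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumed; substitute a version your service supports


def build_query_request(service_name, index_name, api_key, body):
    """Prepare (but do not send) a POST to the index's search endpoint."""
    url = (f"https://{service_name}.search.windows.net"
           f"/indexes/{index_name}/docs/search?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


request = build_query_request(
    "my-service", "hotels-sketch", "<query-key>",
    {"search": "harbor", "top": 5},
)
print(request.get_method(), request.full_url)

# To execute the query against a real service:
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```

In application code, a client library such as azure-search-documents is usually the better choice, since it handles authentication, retries, and response parsing for you.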