What is Azure Cognitive Search?

Azure Cognitive Search (formerly known as "Azure Search") is a cloud search service that gives developers APIs and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.

Search is foundational to any app that surfaces content to users. Common scenarios include catalog or document search, e-commerce site search, and knowledge mining for data science. The APIs and architecture of Cognitive Search simplify the task of adding sophisticated information retrieval to any solution.

A search service has the following components:

  • Search engine for full text search
  • Persistent storage of user-owned indexed content
  • APIs for indexing and querying content
  • Optional AI-based enrichments that create searchable content out of images, raw unstructured text, and application files
  • Optional integration with other Azure services for data, machine learning/AI, monitoring, and security
  • Optional implementation of semantic search (preview) for improved relevance

Architecturally, a search service sits between the external data stores that contain your un-indexed data and your client app, which sends query requests to a search index and handles the response.

Figure: Azure Cognitive Search architecture

Externally, search can integrate with other Azure services in two forms: indexers, which automate data ingestion/retrieval from Azure data sources, and skillsets, which incorporate consumable AI from Cognitive Services (such as image and text analysis) or custom AI that you create in Azure Machine Learning or wrap inside Azure Functions.

Inside a search service

On the search service itself, the two primary workloads are indexing and querying.

  • Indexing is an intake process that loads content into your search service and makes it searchable. Internally, inbound text is processed into tokens and stored in inverted indexes for fast scans. You can upload any text that is in the form of JSON documents.

    Additionally, if your content includes mixed files, you have the option of adding AI enrichment through cognitive skills. AI enrichment can extract text embedded in application files, and also infer text and structure from non-text files by analyzing the content.

    The skills providing the analysis are predefined ones from Microsoft, or custom skills that you create. The subsequent analysis and transformations can result in new information and structures that did not previously exist, providing high utility for many search and knowledge mining scenarios.

  • Querying can happen once an index is populated with searchable text, when your client app sends query requests to a search service and handles responses. All query execution is over a search index that you create, own, and store in your service. In your client app, the search experience is defined using APIs from Azure Cognitive Search, and can include relevance tuning, autocomplete, synonym matching, fuzzy matching, pattern matching, filter, and sort.
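
The indexing workload described above, where inbound text is processed into tokens and stored in an inverted index, can be illustrated with a toy example. This is a minimal sketch of the general technique, not the service's actual implementation: the `tokenize`, `build_inverted_index`, and `search` functions and the sample hotel documents are all hypothetical, and a real analyzer does far more than lowercase and split.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents, field):
    """Map each token to the set of document keys whose field contains it."""
    index = defaultdict(set)
    for doc in documents:
        for token in tokenize(doc[field]):
            index[token].add(doc["id"])
    return index


def search(index, query):
    """Return the keys of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Look up the posting set for each token, then intersect them.
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical JSON-style documents, each with a unique key.
docs = [
    {"id": "1", "description": "Budget hotel near the beach"},
    {"id": "2", "description": "Luxury hotel with a spa"},
    {"id": "3", "description": "Beach house rental"},
]

index = build_inverted_index(docs, "description")
print(sorted(search(index, "beach hotel")))
```

Because each token points directly at the documents that contain it, a query only touches the posting sets for its own terms rather than scanning every document, which is why this structure supports fast scans.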

Functionality is exposed through a simple REST API or .NET SDK that masks the inherent complexity of information retrieval. You can also use the Azure portal for service administration and content management, with tools for prototyping and querying your indexes and skillsets. Because the service runs in the cloud, infrastructure and availability are managed by Microsoft.
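
As a rough illustration of the REST surface, the sketch below builds (but does not send) a query request against the Search Documents endpoint using only the Python standard library. The service name `my-service`, the index name `hotels`, the `rating` field, the placeholder key, and the `build_search_request` helper are all assumptions made for this example, and the `api-version` value should be replaced with one your service supports.

```python
import json
import urllib.request

# Assumed API version; check which versions your service supports.
API_VERSION = "2020-06-30"


def build_search_request(service, index, api_key, search_text, **options):
    """Build a POST request for the Search Documents REST endpoint.

    Extra keyword arguments (for example filter, orderby, top) are passed
    through as properties of the JSON request body.
    """
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/search?api-version={API_VERSION}"
    )
    body = {"search": search_text, **options}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Hypothetical service, index, field, and key names.
req = build_search_request(
    "my-service", "hotels", "<query-key>", "beach",
    filter="rating ge 4", orderby="rating desc", top=5,
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually run the query against a real service:
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)["value"]
```

The request is only constructed here so the example runs without a live service; with a real service name and key, the commented lines at the end would send it and read the matching documents from the `value` array of the response.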

Azure Cognitive Search is well suited for the following application scenarios:

  • Consolidate heterogeneous content into a private, user-defined search index.

  • Easily implement search-related features: relevance tuning, faceted navigation, filters (including geo-spatial search), synonym mapping, and autocomplete.

  • Transform large undifferentiated text or image files, or application files stored in Azure Blob storage or Cosmos DB, into searchable JSON documents. This is achieved during indexing through cognitive skills that add external processing.

  • Add linguistic or custom text analysis. If you have non-English content, Azure Cognitive Search supports both Lucene analyzers and Microsoft's natural language processors. You can also configure analyzers to achieve specialized processing of raw content, such as filtering out diacritics, or recognizing and preserving patterns in strings.
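
To make "filtering out diacritics" concrete, here is a small sketch of the effect such an analyzer step has on text. It uses Python's standard `unicodedata` module and is only an illustration of the idea; in the service itself this processing is configured through analyzers on the index rather than written as client code, and the `fold_diacritics` helper is a hypothetical name.

```python
import unicodedata


def fold_diacritics(text):
    """Remove combining accent marks so "é" and "e" compare as equal."""
    # NFD splits an accented character into a base character plus
    # one or more combining marks; the marks are then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("Crème brûlée in Zürich"))
# prints: Creme brulee in Zurich
```

Applying the same folding to both indexed text and query text lets a user who types "creme brulee" match a document that contains "Crème brûlée".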

For more information about specific functionality, see Features of Azure Cognitive Search.

How to get started

An end-to-end exploration of core search features can be achieved in four steps:

  1. Create a search service at the shared Free tier or a billable tier for dedicated resources used only by your service. All quickstarts and tutorials can be completed on a shared service.

  2. Create a search index using the portal, REST API, .NET SDK, or another SDK. The index schema defines the structure of searchable content.

  3. Upload content using the "push" model to push JSON documents from any source, or use the "pull" model (indexers) if your source data is on Azure.

  4. Query an index using Search explorer in the portal, REST API, .NET SDK, or another SDK.
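
Steps 2 and 3 above can be sketched as plain JSON payloads. The example below defines a small index schema and builds a "push" batch whose documents are checked against it. The `hotels` index, its field names, and the `build_upload_batch` helper are hypothetical; the payload shapes follow the REST API's Create Index and Index Documents requests, but treat this as an illustrative sketch rather than a complete client.

```python
import json

# Hypothetical index schema (step 2). It would be sent to
# https://{service}.search.windows.net/indexes?api-version=...
index_schema = {
    "name": "hotels",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "name", "type": "Edm.String", "searchable": True},
        {"name": "rating", "type": "Edm.Double",
         "filterable": True, "sortable": True},
    ],
}


def build_upload_batch(documents, schema):
    """Wrap documents in an upload batch after checking them against the schema."""
    field_names = {field["name"] for field in schema["fields"]}
    key_name = next(f["name"] for f in schema["fields"] if f.get("key"))
    actions = []
    for doc in documents:
        unknown = set(doc) - field_names
        if unknown:
            raise ValueError(f"fields not in index schema: {sorted(unknown)}")
        if key_name not in doc:
            raise ValueError(f"document is missing key field '{key_name}'")
        actions.append({"@search.action": "upload", **doc})
    return {"value": actions}


# Push model (step 3). The batch would be POSTed to
# https://{service}.search.windows.net/indexes/hotels/docs/index?api-version=...
batch = build_upload_batch(
    [
        {"id": "1", "name": "Seaside Inn", "rating": 4.2},
        {"id": "2", "name": "Mountain Lodge", "rating": 3.8},
    ],
    index_schema,
)
print(json.dumps(batch, indent=2))
```

Validating on the client is optional, since the service rejects documents that do not conform to the index schema, but catching a misspelled field name before the upload makes the failure easier to diagnose.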

For initial exploration, start with the Import data wizard and a built-in Azure data source to create, load, and query an index in minutes.

For help with complex or custom solutions, contact a partner with deep expertise in Cognitive Search technology.

Compare search options

Customers often ask how Azure Cognitive Search compares with other search-related solutions. The following table summarizes key differences.

Compared to | Key differences
Microsoft Search | Microsoft Search is for Microsoft 365 authenticated users who need to query over content in SharePoint. It's offered as a ready-to-use search experience, enabled and configured by administrators, with the ability to accept external content through connectors from Microsoft and other sources. If this describes your scenario, then Microsoft Search with Microsoft 365 is an attractive option to explore.

In contrast, Azure Cognitive Search executes queries over an index that you define, populated with data and documents you own, often from diverse sources. Azure Cognitive Search has crawler capabilities for some Azure data sources through indexers, but you can also push any JSON document that conforms to your index schema into a single, consolidated searchable resource. You can also customize the indexing pipeline to include machine learning and lexical analyzers. Because Cognitive Search is built to be a plug-in component in larger solutions, you can integrate search into almost any app, on any platform.
Bing | Bing Web Search API searches the indexes on Bing.com for matching terms you submit. Indexes are built from HTML, XML, and other web content on public sites. Built on the same foundation, Bing Custom Search offers the same crawler technology for web content types, scoped to individual web sites.

In Cognitive Search, you define and populate the index. You can use indexers to crawl data on Azure data sources, or push any index-conforming JSON document to your search service.
Database search | Many database platforms include a built-in search experience. SQL Server has full text search. Cosmos DB and similar technologies have queryable indexes. When evaluating products that combine search and storage, it can be challenging to determine which way to go. Many solutions use both: DBMS for storage, and Azure Cognitive Search for specialized search features.

Compared to DBMS search, Azure Cognitive Search stores content from heterogeneous sources and offers specialized text processing features such as linguistic-aware text processing (stemming, lemmatization, word forms) in 56 languages. It also supports autocorrection of misspelled words, synonyms, suggestions, scoring controls, facets, and custom tokenization. The full text search engine in Azure Cognitive Search is built on Apache Lucene, an industry standard in information retrieval. However, while Azure Cognitive Search persists data in the form of an inverted index, it is not a replacement for true data storage and we don't recommend using it in that capacity. For more information, see this forum post.

Resource utilization is another inflection point in this category. Indexing and some query operations are often computationally intensive. Offloading search from the DBMS to a dedicated solution in the cloud preserves system resources for transaction processing. Furthermore, by externalizing search, you can easily adjust scale to match query volume.
Dedicated search solution | Assuming you have decided on dedicated search with full spectrum functionality, a final categorical comparison is between on-premises solutions and a cloud service. Many search technologies offer controls over indexing and query pipelines, access to richer query and filtering syntax, control over rank and relevance, and features for self-directed and intelligent search.

A cloud service is the right choice if you want a turn-key solution with minimal overhead and maintenance, and adjustable scale.

Within the cloud paradigm, several providers offer comparable baseline features, with full-text search, geo-search, and the ability to handle a certain level of ambiguity in search inputs. Typically, it's a specialized feature, or the ease and overall simplicity of APIs, tools, and management, that determines the best fit.

Among cloud providers, Azure Cognitive Search is strongest for full text search workloads over content stores and databases on Azure, for apps that rely primarily on search for both information retrieval and content navigation.

Key strengths include:

  • Azure data integration (crawlers) at the indexing layer
  • Azure Private Link integration to support off-internet security requirements
  • Integration with AI processing to make unsearchable content types text-searchable
  • Linguistic and custom analysis, with analyzers for solid full text search in 56 languages
  • Critical features: rich query language, relevance tuning, faceting, autocomplete, synonyms, geo-search, and result composition
  • Azure scale, reliability, and world-class availability

Among our customers, those able to leverage the widest range of features in Azure Cognitive Search include online catalogs, line-of-business programs, and document discovery applications.

Watch this video

In this 15-minute video, program manager Luis Cabrera introduces Azure Cognitive Search.