
Query types and composition in Azure Cognitive Search

In Azure Cognitive Search, a query is a full specification of a round-trip operation. Parameters on the request provide match criteria for finding documents in an index, specify which fields to include or exclude, pass execution instructions to the engine, and carry directives for shaping the response. When left unspecified (search=*), a query runs against all searchable fields as a full text search operation and returns an unscored result set in arbitrary order.

The following example is a representative query constructed in the REST API. It targets the hotels demo index and includes common parameters.

{
    "queryType": "simple",
    "search": "+New York +restaurant",
    "searchFields": "Description, Address/City, Tags",
    "select": "HotelId, HotelName, Description, Rating, Address/City, Tags",
    "top": 10,
    "count": true,
    "orderby": "Rating desc"
}
  • queryType sets the parser, which is either the default simple query parser (optimal for full text search) or the full Lucene query parser, used for advanced query constructs such as regular expressions, proximity search, and fuzzy and wildcard search, to name a few.

  • search provides the match criteria, usually text but often accompanied by boolean operators. Single standalone terms are term queries. Quote-enclosed multi-part queries are key phrase queries. Search can be undefined, as in search=*, but more likely consists of terms, phrases, and operators similar to those in the example.

  • searchFields constrains query execution to specific fields. Any field attributed as searchable in the index schema is a candidate for this parameter.

Responses are also shaped by the parameters you include in the query. In the example, the result set consists of the fields listed in the select statement. Only fields marked as retrievable can be used in a $select statement. Additionally, only the top 10 hits are returned, while count tells you how many documents match overall, which can be more than the number returned. Rows are sorted by Rating in descending order.

In Azure Cognitive Search, query execution is always against one index, authenticated using an api-key provided in the request. In REST, the index is named in the request URL and the api-key is passed in a request header.
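As a rough illustration of how the example body, the index, and the api-key fit together in a REST call, the following Python sketch builds (but does not send) a POST request. The service name, index name, key, and api-version value are placeholders assumed for this sketch, not values stated in this article. The index name appears in the URL path and the api-key travels in a request header.

```python
import json
import urllib.request

# Placeholders assumed for illustration only; substitute your own values.
SERVICE = "my-service"
INDEX = "hotels-sample-index"
API_KEY = "<your-query-api-key>"
API_VERSION = "2020-06-30"  # assumed; use a version your service supports


def build_search_request(body):
    """Build (without sending) a POST request that carries a query body."""
    url = (
        f"https://{SERVICE}.search.windows.net"
        f"/indexes/{INDEX}/docs/search?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="POST",
    )


query = {
    "queryType": "simple",
    "search": "+New York +restaurant",
    "searchFields": "Description, Address/City, Tags",
    "select": "HotelId, HotelName, Description, Rating, Address/City, Tags",
    "top": 10,
    "count": True,
    "orderby": "Rating desc",
}

request = build_search_request(query)
print(request.get_method(), request.full_url)
```

To actually run the query, the request could be passed to urllib.request.urlopen and the JSON response decoded with json.load; that step is left out here so the sketch runs without a live service.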

How to run this query

To execute this query, use Search explorer and the hotels demo index.

You can paste this query string into the explorer's search bar: search=+"New York" +restaurant&searchFields=Description, Address/City, Tags&$select=HotelId, HotelName, Description, Rating, Address/City, Tags&$top=10&$orderby=Rating desc&$count=true
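Search explorer accepts the raw string, but if the same parameters are sent as part of a GET URL from code, the spaces, plus signs, and quotation marks need percent-encoding. A minimal sketch using only the Python standard library follows; it assumes nothing beyond the parameter names and values shown in the query string above.

```python
from urllib.parse import parse_qs, urlencode

# Same parameters as the query string above, as a plain mapping.
params = {
    "search": '+"New York" +restaurant',
    "searchFields": "Description, Address/City, Tags",
    "$select": "HotelId, HotelName, Description, Rating, Address/City, Tags",
    "$top": 10,
    "$orderby": "Rating desc",
    "$count": "true",
}

# urlencode escapes reserved characters, so a literal "+" becomes %2B
# and is not mistaken for an encoded space.
query_string = urlencode(params)
print(query_string)

# Decoding recovers the original values, which confirms the round trip.
decoded = parse_qs(query_string)
print(decoded["search"][0])
```

A REST GET request would also need an api-version parameter appended to this string.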

How query operations are enabled by the index

Index design and query design are tightly coupled in Azure Cognitive Search. An essential fact to know up front is that the index schema, with attributes on each field, determines the kind of query you can build.

Index attributes on a field set the allowed operations: whether a field is searchable in the index, retrievable in results, sortable, filterable, and so forth. In the example query string, $orderby=Rating desc works only because the Rating field is marked as sortable in the index schema.

Index definition for the hotel sample

The above screenshot is a partial list of index attributes for the hotels sample. You can view the entire index schema in the portal. For more information about index attributes, see Create Index REST API.
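The coupling between field attributes and query parameters can be sketched as a small pre-flight check. The schema below is a hypothetical three-field subset with assumed attribute values, not the actual hotels index definition, and the helper name unsupported_uses is invented for this example.

```python
# Hypothetical subset of a hotels-style schema; attribute values are assumed.
FIELDS = {
    "HotelId": {"searchable": False, "retrievable": True, "sortable": False},
    "Description": {"searchable": True, "retrievable": True, "sortable": False},
    "Rating": {"searchable": False, "retrievable": True, "sortable": True},
}

# The field attribute that each query parameter depends on.
REQUIRED_ATTRIBUTE = {
    "searchFields": "searchable",
    "select": "retrievable",
    "orderby": "sortable",
}


def unsupported_uses(query):
    """List parameters that name a field lacking the attribute they need.

    Only plain field names are handled; function calls such as
    geo.distance(...) in orderby are outside the scope of this sketch.
    """
    problems = []
    for param, attribute in REQUIRED_ATTRIBUTE.items():
        for clause in query.get(param, "").split(","):
            clause = clause.strip()
            if not clause:
                continue
            field = clause.split()[0]  # drop a trailing asc/desc
            if not FIELDS.get(field, {}).get(attribute, False):
                problems.append(f"{param}: {field} is not {attribute}")
    return problems


print(unsupported_uses({"select": "HotelId, Rating", "orderby": "Rating desc"}))
print(unsupported_uses({"orderby": "Description asc"}))
```

In this sketch, sorting by Rating passes because the field is marked sortable, while sorting by Description is flagged.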

Note

Some query functionality is enabled index-wide rather than on a per-field basis. These capabilities include synonym maps, custom analyzers, suggester constructs (for autocomplete and suggested queries), and scoring logic for ranking results.

Elements of a query request

Queries are always directed at a single index. You cannot join indexes or create custom or temporary data structures as a query target.

Required elements on a query request include the following components:

  • Service endpoint and index documents collection, expressed as a URL containing fixed and user-defined components: https://<your-service-name>.search.windows.net/indexes/<your-index-name>/docs
  • api-version (REST only) is necessary because more than one version of the API is available at all times.
  • api-key, either a query or admin api-key, authenticates the request to your service.
  • queryType, either simple or full, which can be omitted if you are using the built-in default simple syntax.
  • search or filter provides the match criteria, which can be unspecified if you want to perform an empty search. Both query types are discussed in terms of the simple parser, but even advanced queries require the search parameter for passing complex query expressions.

All other search parameters are optional. For the full list of attributes, see Create Index (REST). For a closer look at how parameters are used during processing, see How full-text search works in Azure Cognitive Search.

Choose APIs and tools

The following table lists the APIs and tool-based approaches for submitting queries.

Methodology | Description
Search explorer (portal) | Provides a search bar and options for index and api-version selections. Results are returned as JSON documents. Recommended for exploration, testing, and validation.
Postman or other REST tools | Web testing tools are an excellent choice for formulating REST calls. The REST API supports every possible operation in Azure Cognitive Search. With these tools, you set up an HTTP request header and body for sending requests to Azure Cognitive Search.
SearchIndexClient (.NET) | Client that can be used to query an Azure Cognitive Search index.
Search Documents (REST API) | GET or POST methods on an index, using query parameters for additional input.

Choose a parser: simple | full

Azure Cognitive Search sits on top of Apache Lucene and gives you a choice between two query parsers for handling typical and specialized queries. Requests using the simple parser are formulated using the simple query syntax, selected as the default for its speed and effectiveness in free form text queries. This syntax supports a number of common search operators, including the AND, OR, NOT, phrase, suffix, and precedence operators.

The full Lucene query syntax, enabled when you add queryType=full to the request, exposes the widely adopted and expressive query language developed as part of Apache Lucene. Full syntax extends the simple syntax. Any query you write for the simple syntax runs under the full Lucene parser.

The following examples illustrate the point: the same query with different queryType settings yields different results. In the first query, the ^3 after historic is treated as part of the search term. The top-ranked result for this query is "Marquis Plaza & Suites", which has ocean in its description.

queryType=simple&search=ocean historic^3&searchFields=Description, Tags&$select=HotelId, HotelName, Tags, Description&$count=true

The same query using the full Lucene parser interprets ^3 as an in-field term booster. Switching parsers changes the rank, with results containing the term historic moving to the top.

queryType=full&search=ocean historic^3&searchFields=Description, Tags&$select=HotelId, HotelName, Tags, Description&$count=true

Types of queries

Azure Cognitive Search supports a broad range of query types.

Query type | Usage | Examples and more information
Free form text search | Search parameter and either parser | Full text search scans for one or more terms in all searchable fields in your index, and works the way you would expect a search engine like Google or Bing to work. The example in the introduction is full text search. By default, full text search undergoes text analysis using the standard Lucene analyzer, which lower-cases all terms and removes stop words like "the". You can override the default with non-English analyzers or specialized language-agnostic analyzers that modify text analysis. An example is keyword, which treats the entire contents of a field as a single token. This is useful for data like zip codes, IDs, and some product names.
Filtered search | OData filter expression and either parser | Filter queries evaluate a boolean expression over all filterable fields in an index. Unlike search, a filter query matches the exact contents of a field, including case-sensitivity on string fields. Another difference is that filter queries are expressed in OData syntax. See: Filter expression example.
Geo-search | Edm.GeographyPoint type on the field, filter expression, and either parser | Coordinates stored in a field of type Edm.GeographyPoint are used for "find near me" or map-based search controls. See: Geo-search example.
Range search | Filter expression and simple parser | In Azure Cognitive Search, range queries are built using the filter parameter. See: Range filter example.
Fielded search | Search parameter and full parser | Build a composite query expression targeting a single field. See: Fielded search example.
Fuzzy search | Search parameter and full parser | Matches on terms having a similar construction or spelling. See: Fuzzy search example.
Proximity search | Search parameter and full parser | Finds terms that are near each other in a document. See: Proximity search example.
Term boosting | Search parameter and full parser | Ranks a document higher if it contains the boosted term, relative to others that don't. See: Term boosting example.
Regular expression search | Search parameter and full parser | Matches based on the contents of a regular expression. See: Regular expression example.
Wildcard or prefix search | Search parameter and full parser | Matches based on a prefix followed by an asterisk (*) for multiple characters, or on a single-character wildcard (?). See: Wildcard search example.
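To make the table more concrete, the sketch below lists one illustrative request body per query type. The field names (Rating, Location, HotelName), the coordinates, and the search terms are assumptions chosen for this example and are not taken from this article. The bodies follow the OData filter and Lucene query conventions as commonly documented, so treat them as starting points to verify against your own index.

```python
import json

# One illustrative request body per query type; field names and values
# are assumed for the sake of the example.
EXAMPLES = {
    "filtered": {"search": "*", "filter": "Rating gt 4"},
    "geo": {
        "search": "*",
        # Distance is compared against a point given as longitude latitude.
        "filter": "geo.distance(Location, geography'POINT(-122.131577 47.678581)') le 5",
    },
    "range": {"search": "*", "filter": "Rating ge 3 and Rating lt 5"},
    "fielded": {"queryType": "full", "search": "HotelName:(beach OR ocean)"},
    "fuzzy": {"queryType": "full", "search": "seatle~1"},
    "proximity": {"queryType": "full", "search": '"hotel airport"~5'},
    "boosting": {"queryType": "full", "search": "ocean historic^3"},
    "regex": {"queryType": "full", "search": "/[mh]otel/"},
    "wildcard": {"queryType": "full", "search": "beach*"},
}

for name, body in EXAMPLES.items():
    print(f"{name}: {json.dumps(body)}")
```

The filter-based bodies omit queryType because, per the table, they work with either parser, while the remaining bodies set queryType to full.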

Manage search results

Query results are streamed as JSON documents in the REST API, although if you use .NET APIs, serialization is built in. You can shape results by setting parameters on the query, selecting specific fields for the response.

Parameters on the query can be used to structure the result set in the following ways:

  • Limiting or batching the number of documents in the results (50 by default)
  • Selecting fields to include in the results
  • Setting a sort order
  • Adding hit highlights to draw attention to matching terms in the body of the search results

Tips for unexpected results

Occasionally, the substance and not the structure of results is unexpected. When query outcomes are not what you expect to see, you can try these query modifications to see if results improve:

  • Change searchMode=any (default) to searchMode=all to require matches on all criteria instead of any of the criteria. This is especially useful when boolean operators are included in the query.

  • Change the query technique if text or lexical analysis is necessary but the query type precludes linguistic processing. In full text search, text or lexical analysis autocorrects for spelling errors, singular-plural word forms, and even irregular verbs or nouns. For some queries, such as fuzzy or wildcard search, text analysis is not part of the query parsing pipeline. For some scenarios, regular expressions have been used as a workaround.
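The effect of switching searchMode is easiest to see with a negated term. The following is a toy model only, not the engine's implementation: it treats each document as a set of terms and a query such as wifi -luxury as one included and one excluded term. The documents and terms are made up for illustration.

```python
def matches(doc_terms, include, exclude, search_mode="any"):
    """Toy matcher: each included or excluded term is one boolean clause."""
    clauses = [term in doc_terms for term in include]
    clauses += [term not in doc_terms for term in exclude]
    # "any" needs one clause to hold; "all" needs every clause to hold.
    return any(clauses) if search_mode == "any" else all(clauses)


# Made-up documents, each reduced to its set of terms.
DOCS = {
    "A": {"wifi", "luxury"},
    "B": {"pool"},
    "C": {"wifi", "pool"},
    "D": {"luxury"},
}


def run(search_mode):
    """Return the ids of documents matching the query: wifi -luxury."""
    return sorted(
        doc_id
        for doc_id, terms in DOCS.items()
        if matches(terms, include=["wifi"], exclude=["luxury"], search_mode=search_mode)
    )


print("any:", run("any"))
print("all:", run("all"))
```

In this model, searchMode=any returns every document that either contains wifi or lacks luxury, which is a much larger set than most people intend, while searchMode=all narrows the result to documents that contain wifi and lack luxury.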

Paging results

Azure Cognitive Search makes it easy to implement paging of search results. By using the top and skip parameters, you can issue search requests that return the total set of search results in manageable, ordered subsets, which supports good search UI practices. When receiving these smaller subsets of results, you can also receive the count of documents in the total set of search results.

You can learn more about paging search results in the article How to page search results in Azure Cognitive Search.
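The arithmetic behind top and skip is simple enough to sketch. The helper names below are invented for this example, and the page size of 50 mirrors the default result size mentioned earlier in this article.

```python
import math


def page_params(page_number, page_size=50):
    """Translate a 1-based page number into top, skip, and count parameters."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    return {
        "top": page_size,                        # documents per page
        "skip": (page_number - 1) * page_size,   # documents to pass over
        "count": True,                           # also request the total match count
    }


def total_pages(total_count, page_size=50):
    """Number of pages needed to show total_count documents."""
    return math.ceil(total_count / page_size)


print(page_params(3, page_size=10))
print(total_pages(95, page_size=10))
```

The returned dictionary can be merged into a query body alongside search and the other parameters; the total count reported in the response then drives the pager in the UI.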

Ordering results

When receiving results for a search query, you can request that Azure Cognitive Search serve the results ordered by values in a specific field. By default, Azure Cognitive Search orders the search results based on the rank of each document's search score, which is derived from TF-IDF.

If you want Azure Cognitive Search to return your results ordered by a value other than the search score, you can use the orderby search parameter. You can specify the value of the orderby parameter to include field names and calls to the geo.distance() function for geospatial values. Each expression can be followed by asc to indicate that results are requested in ascending order, or desc to indicate that results are requested in descending order. The default is ascending order.
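An orderby value is a comma-separated list of expressions, each optionally followed by a direction. The sketch below assembles such a string; the helper names, the Location field, and the coordinates are assumptions made for this example.

```python
def geo_distance(field, longitude, latitude):
    """Build a geo.distance() expression for a geography point field."""
    return f"geo.distance({field}, geography'POINT({longitude} {latitude})')"


def order_by(*clauses):
    """Join (expression, direction) pairs into an orderby string."""
    parts = []
    for expression, direction in clauses:
        if direction not in ("asc", "desc"):
            raise ValueError(f"direction must be asc or desc, got {direction!r}")
        parts.append(f"{expression} {direction}")
    return ", ".join(parts)


# Highest rating first, then nearest to the given point.
value = order_by(
    ("Rating", "desc"),
    (geo_distance("Location", -122.131577, 47.678581), "asc"),
)
print(value)
```

Spelling out asc explicitly is optional because ascending is the default, but doing so keeps multi-clause sort orders easier to read.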

Hit highlighting

In Azure Cognitive Search, the highlight, highlightPreTag, and highlightPostTag parameters make it easy to emphasize the exact portion of search results that matches the search query. You can specify which searchable fields should have their matched text emphasized, and specify the exact string tags to place at the start and end of the matched text that Azure Cognitive Search returns.
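The following sketch shows how the three highlighting parameters might appear in a request body and how highlighted fragments could be read back. The sample response is fabricated for illustration, and the helper name is invented; the sketch assumes that each result carries its fragments under an "@search.highlights" key, which is worth confirming against the REST API reference for your api-version.

```python
# Request body asking for highlights on one searchable field.
body = {
    "search": "restaurant",
    "highlight": "Description",
    "highlightPreTag": "<b>",
    "highlightPostTag": "</b>",
}

# Fabricated response, shaped the way this sketch assumes results arrive.
sample_response = {
    "value": [
        {
            "HotelId": "1",
            "Description": "Close to a famous restaurant district.",
            "@search.highlights": {
                "Description": ["Close to a famous <b>restaurant</b> district."]
            },
        },
        {"HotelId": "2", "Description": "Quiet rooms near the park."},
    ]
}


def highlights_by_hotel(response, field):
    """Map each HotelId to its highlighted fragments for one field."""
    result = {}
    for doc in response.get("value", []):
        fragments = doc.get("@search.highlights", {}).get(field, [])
        if fragments:
            result[doc["HotelId"]] = fragments
    return result


print(highlights_by_hotel(sample_response, "Description"))
```

Documents without a highlighted fragment for the requested field are simply skipped, so the caller can fall back to the plain field value for those.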

See also