Lucene query syntax in Azure Cognitive Search

You can write queries against Azure Cognitive Search based on the rich Lucene Query Parser syntax for specialized query forms: wildcard, fuzzy search, proximity search, and regular expressions are a few examples. Much of the Lucene Query Parser syntax is implemented intact in Azure Cognitive Search, with the exception of range searches, which are constructed in Azure Cognitive Search through $filter expressions.

How to invoke full parsing

Set the queryType search parameter to specify which parser to use. Valid values are simple and full, with simple as the default and full for Lucene.

Example showing full syntax

The following example finds documents in the index using the Lucene query syntax, evident in the queryType=full parameter. This query returns hotels where the category field contains the term "budget" and any searchable field contains the phrase "recently renovated". Documents containing the phrase "recently renovated" are ranked higher as a result of the term boost value (3).

The searchMode=all parameter is relevant in this example. Whenever operators are on the query, you should generally set searchMode=all to ensure that all of the criteria are matched.

GET /indexes/hotels/docs?search=category:budget AND "recently renovated"^3&searchMode=all&api-version=2019-05-06&queryType=full

Alternatively, use POST:

POST /indexes/hotels/docs/search?api-version=2019-05-06
{
  "search": "category:budget AND \"recently renovated\"^3",
  "queryType": "full",
  "searchMode": "all"
}
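
The backslashes in the POST body are JSON string escapes for the inner double quotes; they are not part of the Lucene query itself. As a minimal sketch (it only builds the payload and does not call the service), a JSON serializer can produce that escaping for you:

```python
import json

# Request body for POST /indexes/hotels/docs/search, using the values from the example.
body = {
    "search": 'category:budget AND "recently renovated"^3',
    "queryType": "full",
    "searchMode": "all",
}

# json.dumps escapes the inner double quotes as \" automatically.
payload = json.dumps(body)
print(payload)
```

Sending the payload requires a service URL and an API key, which are omitted here.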

For additional examples, see Lucene query syntax examples for building queries in Azure Cognitive Search. For details about specifying the full set of query parameters, see Search Documents (Azure Cognitive Search REST API).

Note

Azure Cognitive Search also supports Simple Query Syntax, a simple and robust query language that can be used for straightforward keyword search.

Syntax fundamentals

The following syntax fundamentals apply to all queries that use the Lucene syntax.

Operator evaluation in context

Placement determines whether a symbol is interpreted as an operator or as just another character in a string.

For example, in Lucene full syntax, the tilde (~) is used for both fuzzy search and proximity search. When placed after a quoted phrase, ~ invokes proximity search. When placed at the end of a term, ~ invokes fuzzy search.

Within a term, such as "business~analyst", the character is not evaluated as an operator. In this case, assuming the query is a term or phrase query, full text search with lexical analysis strips out the ~ and breaks the term "business~analyst" in two: business OR analyst.

The example above uses the tilde (~), but the same principle applies to every operator.

Escaping special characters

Special characters must be escaped to be used as part of the search text. You can escape them by prefixing them with a backslash (\). Special characters that need to be escaped include the following:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

For example, to escape a wildcard character, use \*.
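
If your application passes user input through as literal search text, a small helper can apply this rule. The following is a minimal sketch; the function name escape_lucene is hypothetical, and it escapes every & and | individually, which is an assumption that goes slightly beyond the two-character && and || operators in the list.

```python
# Characters listed as special in the full Lucene syntax.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(text: str) -> str:
    """Prefix every special character in text with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_lucene("C++ (beginner)"))  # C\+\+ \(beginner\)
```

Apply the helper only to text that should be matched literally; escaping an entire query would also disable the operators you intend to use.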

Encoding unsafe and reserved characters in URLs

Ensure that all unsafe and reserved characters are encoded in a URL. For example, '#' is an unsafe character because it is a fragment/anchor identifier in a URL. The character must be encoded to %23 if used in a URL. '&' and '=' are examples of reserved characters because they delimit parameters and specify values in Azure Cognitive Search. See RFC 1738: Uniform Resource Locators (URL) for more details.

Unsafe characters are " ` < > # % { } | \ ^ ~ [ ]. Reserved characters are ; / ? : @ = + &.
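
Most HTTP libraries can do this encoding for you. The following sketch uses Python's standard urllib.parse module to percent-encode the query from the earlier GET example; the parameter values are copied from that example, and no request is actually sent.

```python
from urllib.parse import quote, urlencode

# '#' is unsafe in a URL and is encoded as %23.
print(quote("#", safe=""))  # %23

params = {
    "search": 'category:budget AND "recently renovated"^3',
    "searchMode": "all",
    "queryType": "full",
    "api-version": "2019-05-06",
}

# quote_via=quote encodes spaces as %20; safe="" leaves no reserved character unencoded.
query_string = urlencode(params, quote_via=quote, safe="")
print("/indexes/hotels/docs?" + query_string)
```

Encoding each value separately, rather than the assembled URL, keeps the '&' and '=' delimiters between parameters intact.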

Precedence operators: grouping and field grouping

You can use parentheses to create subqueries, including operators within the parenthetical statement. For example, motel+(wifi||luxury) will search for documents containing the "motel" term and either "wifi" or "luxury" (or both).

Field grouping is similar but scopes the grouping to a single field. For example, hotelAmenities:(gym+(wifi||pool)) searches the field "hotelAmenities" for "gym" and "wifi", or "gym" and "pool".

SearchMode parameter considerations

The impact of searchMode on queries, as described in Simple query syntax in Azure Cognitive Search, applies equally to the Lucene query syntax. Namely, searchMode in conjunction with NOT operators can produce query outcomes that seem unusual if you aren't clear on the implications of how you set the parameter. If you retain the default, searchMode=any, and use a NOT operator, the operation is computed as an OR action, such that "New York" NOT "Seattle" returns all cities that are not Seattle.

Boolean operators (AND, OR, NOT)

Always specify text boolean operators (AND, OR, NOT) in all caps.

OR operator: OR or ||

The OR operator is written as OR or as two vertical bar (pipe) characters, ||. For example: wifi || luxury will search for documents containing either "wifi" or "luxury" or both. Because OR is the default conjunction operator, you could also leave it out, such that wifi luxury is the equivalent of wifi || luxury.

AND operator: AND, && or +

The AND operator is written as AND, as a double ampersand (&&), or as a plus sign (+). For example: wifi && luxury will search for documents containing both "wifi" and "luxury". The plus character (+) is used for required terms. For example, +wifi +luxury stipulates that both terms must appear somewhere in a field of a single document.

NOT operator: NOT, ! or -

The NOT operator is written as NOT, as an exclamation point (!), or as a minus sign (-). For example: wifi !luxury will search for documents that have the "wifi" term and/or do not have "luxury". The searchMode option controls whether a term with the NOT operator is ANDed or ORed with the other terms in the query in the absence of a + or || operator. Recall that searchMode can be set to either any (default) or all.

Using searchMode=any increases the recall of queries by including more results, and by default the - operator is interpreted as "OR NOT". For example, wifi -luxury will match documents that either contain the term wifi or do not contain the term luxury.

Using searchMode=all increases the precision of queries by including fewer results, and by default the - operator is interpreted as "AND NOT". For example, wifi -luxury will match documents that contain the term wifi and do not contain the term luxury. This is arguably a more intuitive behavior for the - operator. Therefore, you should consider choosing searchMode=all over searchMode=any if you want to optimize searches for precision instead of recall and your users frequently use the - operator in searches.
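
The difference between the two modes is easier to see on a small data set. The following is a toy model, not the service's implementation: each document is represented as a set of terms (the four documents are made up for illustration), and the query wifi -luxury is evaluated under both modes.

```python
def matches(doc_terms: set, mode: str) -> bool:
    """Model the query `wifi -luxury` against one document's set of terms."""
    has_wifi = "wifi" in doc_terms
    lacks_luxury = "luxury" not in doc_terms
    if mode == "any":   # - is read as "OR NOT"
        return has_wifi or lacks_luxury
    if mode == "all":   # - is read as "AND NOT"
        return has_wifi and lacks_luxury
    raise ValueError("mode must be 'any' or 'all'")


# Hypothetical documents, keyed by ID.
docs = {
    "A": {"wifi", "luxury"},
    "B": {"wifi", "pool"},
    "C": {"pool"},
    "D": {"luxury"},
}

any_hits = sorted(k for k, terms in docs.items() if matches(terms, "any"))
all_hits = sorted(k for k, terms in docs.items() if matches(terms, "all"))
print(any_hits)  # ['A', 'B', 'C']
print(all_hits)  # ['B']
```

With searchMode=any, document C matches even though it never mentions wifi, which is the behavior that tends to surprise users.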

Query size limitations

There is a limit to the size of queries that you can send to Azure Cognitive Search. Specifically, you can have at most 1024 clauses (expressions separated by AND, OR, and so on). There is also a limit of approximately 32 KB on the size of any individual term in a query. If your application generates search queries programmatically, we recommend designing it in such a way that it does not generate queries of unbounded size.
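
One way to follow this recommendation is to validate generated queries before sending them. The sketch below rests on two assumptions: each OR-joined term counts as one clause, and "approximately 32 KB" is treated as exactly 32 * 1024 UTF-8 bytes. The function name build_or_query is hypothetical.

```python
MAX_CLAUSES = 1024
MAX_TERM_BYTES = 32 * 1024  # assumption: the documented limit is only approximate


def build_or_query(terms):
    """Join terms with OR, refusing input that would exceed the documented limits."""
    terms = list(terms)
    if len(terms) > MAX_CLAUSES:
        raise ValueError(f"{len(terms)} clauses exceeds the limit of {MAX_CLAUSES}")
    for term in terms:
        if len(term.encode("utf-8")) > MAX_TERM_BYTES:
            raise ValueError("a single term exceeds the size limit")
    return " OR ".join(terms)


print(build_or_query(["wifi", "pool", "gym"]))  # wifi OR pool OR gym
```

Failing early in the application gives a clearer error than a rejected request, and it forces the caller to decide how to split or trim an oversized query.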

Scoring wildcard and regex queries

Azure Cognitive Search uses frequency-based scoring (TF-IDF) for text queries. However, for wildcard and regex queries, where the scope of terms can potentially be broad, the frequency factor is ignored to prevent the ranking from biasing towards matches from rarer terms. All matches are treated equally for wildcard and regex searches.

Fielded search

You can define a fielded search operation with the fieldName:searchExpression syntax, where the search expression can be a single word or a phrase, or a more complex expression in parentheses, optionally with Boolean operators. Some examples include the following:

  • genre:jazz NOT history

  • artists:("Miles Davis" "John Coltrane")

Be sure to put multiple strings within quotation marks if you want both strings to be evaluated as a single entity, in this case searching for two distinct artists in the artists field.

The field specified in fieldName:searchExpression must be a searchable field. See Create Index for details on how index attributes are used in field definitions.

Note

When using fielded search expressions, you do not need to use the searchFields parameter because each fielded search expression has a field name explicitly specified. However, you can still use the searchFields parameter if you want to run a query where some parts are scoped to a specific field, and the rest could apply to several fields. For example, the query search=genre:jazz NOT history&searchFields=description would match jazz only to the genre field, while it would match NOT history with the description field. The field name provided in fieldName:searchExpression always takes precedence over the searchFields parameter, which is why in this example, we do not need to include genre in the searchFields parameter.
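
When queries are assembled in code, small helpers keep the quoting and parentheses consistent. The following is a minimal sketch that reproduces the two fielded expressions shown in the examples; the helper names phrase and fielded are hypothetical.

```python
def phrase(text: str) -> str:
    """Wrap text in double quotes so it is evaluated as a single entity."""
    return '"' + text.replace('"', '\\"') + '"'


def fielded(field: str, *expressions: str) -> str:
    """Build fieldName:searchExpression, grouping multiple expressions in parentheses."""
    if not expressions:
        raise ValueError("at least one search expression is required")
    if len(expressions) == 1:
        return f"{field}:{expressions[0]}"
    return f"{field}:({' '.join(expressions)})"


print(fielded("genre", "jazz") + " NOT history")
print(fielded("artists", phrase("Miles Davis"), phrase("John Coltrane")))
```

The first call prints genre:jazz NOT history, and the second prints artists:("Miles Davis" "John Coltrane").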

Fuzzy search

A fuzzy search finds matches in terms that have a similar construction. Per Lucene documentation, fuzzy searches are based on Damerau-Levenshtein Distance. Fuzzy searches can expand a term up to the maximum of 50 terms that meet the distance criteria.

To do a fuzzy search, use the tilde "~" symbol at the end of a single word with an optional parameter, a number between 0 and 2 (default), that specifies the edit distance. For example, "blue~" or "blue~1" would return "blue", "blues", and "glue".

Fuzzy search can only be applied to terms, not phrases, but you can append the tilde to each term individually in a multi-part name or phrase. For example, Unviersty~ of~ Wshington~ would match on "University of Washington".
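
To get a feel for what an edit distance of 1 or 2 allows, it can help to compute the distance yourself. The sketch below implements the optimal string alignment variant of Damerau-Levenshtein distance, in which an insertion, deletion, substitution, or swap of two adjacent characters each costs 1. It is an illustration only and not Lucene's implementation, which is automaton-based.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, substitute, and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("blue", "blues"))            # 1
print(osa_distance("blue", "glue"))             # 1
print(osa_distance("unviersty", "university"))  # 2
```

The misspelling "unviersty" needs one transposition and one insertion, so it is only reachable with the maximum edit distance of 2.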

Proximity search

Proximity searches are used to find terms that are near each other in a document. Insert a tilde "~" symbol at the end of a phrase followed by the number of words that create the proximity boundary. For example, "hotel airport"~5 will find the terms "hotel" and "airport" within 5 words of each other in a document.
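
The idea can be approximated locally on whitespace-separated text. The sketch below reads "within N words" as a difference in token positions of at most N; that reading is an assumption, and Lucene's actual slop calculation and the service's text analysis differ in detail.

```python
def within(text: str, first: str, second: str, max_distance: int) -> bool:
    """Return True if both terms occur at token positions at most max_distance apart."""
    tokens = text.lower().split()
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


sample = "The hotel is close to the airport"
print(within(sample, "hotel", "airport", 5))  # True: positions 1 and 6
print(within(sample, "hotel", "airport", 4))  # False
```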

Term boosting

Term boosting refers to ranking a document higher if it contains the boosted term, relative to documents that do not contain the term. This differs from scoring profiles in that scoring profiles boost certain fields, rather than specific terms.

The following example helps illustrate the differences. Suppose that there's a scoring profile that boosts matches in a certain field, say genre in the musicstoreindex example. Term boosting could be used to further boost certain search terms higher than others. For example, rock^2 electronic will boost documents that contain the search terms in the genre field higher than other searchable fields in the index. Further, documents that contain the search term rock will be ranked higher than the other search term electronic as a result of the term boost value (2).

To boost a term, use the caret symbol "^" with a boost factor (a number) at the end of the term you are searching. You can also boost phrases. The higher the boost factor, the more relevant the term will be relative to other search terms. By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, 0.20).
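
If boost factors come from configuration or user input, it is worth validating them before building the query. The following is a minimal sketch; the helper name boost is hypothetical, and it only enforces the rule stated here that the factor must be positive.

```python
def boost(expression: str, factor: float) -> str:
    """Append ^factor to a term or quoted phrase; the factor must be positive."""
    if factor <= 0:
        raise ValueError("boost factor must be positive")
    return f"{expression}^{factor:g}"


print(boost("rock", 2) + " electronic")     # rock^2 electronic
print(boost('"recently renovated"', 3))     # "recently renovated"^3
print(boost("electronic", 0.2))             # electronic^0.2
```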

Regular expression search

A regular expression search finds a match based on the contents between forward slashes "/", as documented in the RegExp class.

For example, to find documents containing "motel" or "hotel", specify /[mh]otel/. Regular expression searches are matched against single words.
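
You can try out a pattern locally before putting it in a query. The sketch below uses Python's re module as a stand-in for Lucene's RegExp class; the two syntaxes overlap for simple patterns such as [mh]otel but are not identical. It assumes the pattern must match a whole word, and it splits on whitespace rather than using the index's analyzer.

```python
import re


def regex_hits(pattern: str, text: str) -> list:
    """Return the lowercased words in text that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [word for word in text.lower().split() if compiled.fullmatch(word)]


print(regex_hits("[mh]otel", "Motel or hotel near the hostel"))  # ['motel', 'hotel']
```

The word "hostel" is not returned because the pattern has to cover the entire word, not just part of it.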

Wildcard search

You can use generally recognized syntax for multiple (*) or single (?) character wildcard searches. Note that the Lucene query parser supports the use of these symbols with a single term, and not a phrase.

For example, to find documents containing words with the prefix "note", such as "notebook" or "notepad", specify "note*".

Note

You cannot use a * or ? symbol as the first character of a search.
No text analysis is performed on wildcard search queries. At query time, wildcard query terms are compared against analyzed terms in the search index and expanded.

See also