
Simple query syntax in Azure Cognitive Search

Azure Cognitive Search implements two Lucene-based query languages: the Simple Query Parser and the Lucene Query Parser. In Azure Cognitive Search, the simple query syntax excludes the fuzzy/slop options.

Note

Azure Cognitive Search provides an alternative Lucene Query Syntax for more complex queries. To learn more about the query parsing architecture and the benefits of each syntax, see How full text search works in Azure Cognitive Search.

How to invoke simple parsing

Simple syntax is the default. You only need to invoke it explicitly when resetting the syntax from full to simple. To set the syntax explicitly, use the queryType search parameter. Valid values are simple and full: simple is the default, and full selects the Lucene syntax.
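As a rough illustration of setting the parser explicitly, the sketch below builds a JSON payload that carries queryType and searchMode alongside the query text. The helper name build_search_body is hypothetical, and the snippet assumes a JSON request body with the fields search, queryType, and searchMode; it only constructs the payload and does not contact any service.

```python
import json

VALID_QUERY_TYPES = {"simple", "full"}
VALID_SEARCH_MODES = {"any", "all"}


def build_search_body(search_text, query_type="simple", search_mode="any"):
    """Return a JSON string for a search request (hypothetical helper).

    Assumption: the request body uses the field names "search",
    "queryType", and "searchMode".
    """
    if query_type not in VALID_QUERY_TYPES:
        raise ValueError("queryType must be 'simple' or 'full'")
    if search_mode not in VALID_SEARCH_MODES:
        raise ValueError("searchMode must be 'any' or 'all'")
    body = {
        "search": search_text,
        "queryType": query_type,   # "simple" is the default parser
        "searchMode": search_mode,  # "any" is the default mode
    }
    return json.dumps(body)


# Explicitly reset the parser to simple for one query.
payload = build_search_body("wifi+luxury", query_type="simple")
```

Validating the two parameters up front is a design choice of this sketch, so that a typo such as queryType=lucene fails locally rather than in the request.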

Query behavior anomalies

Any text with one or more terms is considered a valid starting point for query execution. Azure Cognitive Search will match documents containing any or all of the terms, including any variations found during analysis of the text.

As straightforward as this sounds, one aspect of query execution in Azure Cognitive Search can produce unexpected results: adding more terms and operators to the input string can increase rather than decrease the number of results. Whether this expansion actually occurs depends on whether the query includes a NOT operator, combined with the searchMode parameter setting, which determines whether NOT is interpreted with AND or OR behavior. With the default, searchMode=any, a NOT operator is computed as an OR operation, so that "New York" NOT Seattle returns all cities that are not Seattle.

Typically, you're more likely to see these behaviors in applications that search over content, where users are more likely to include an operator in a query, than on e-commerce sites, which have more built-in navigation structures. For more information, see NOT operator.

Boolean operators (AND, OR, NOT)

You can embed operators in a query string to build a rich set of criteria against which matching documents are found.

AND operator +

The AND operator is a plus sign. For example, wifi+luxury will search for documents containing both wifi and luxury.

OR operator |

The OR operator is a vertical bar or pipe character. For example, wifi | luxury will search for documents containing either wifi or luxury or both.

NOT operator -

The NOT operator is a minus sign. For example, wifi -luxury will search for documents that contain the term wifi and/or do not contain luxury (whether it is "and" or "or" is controlled by searchMode).

Note

The searchMode option controls whether a term with the NOT operator is ANDed or ORed with the other terms in the query in the absence of a + or | operator. Recall that searchMode can be set to either any (default) or all. With any, queries gain recall by including more results, and - is interpreted as "OR NOT" by default. For example, wifi -luxury will match documents that either contain the term wifi or do not contain the term luxury. With all, queries gain precision by including fewer results, and - is interpreted as "AND NOT" by default. For example, wifi -luxury will match documents that contain the term wifi and do not contain the term luxury. This is arguably the more intuitive behavior for the - operator. Therefore, consider using searchMode=all instead of searchMode=any if you want to optimize searches for precision rather than recall and your users frequently use the - operator in searches.
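The difference between the two modes can be sketched with a toy model of the query wifi -luxury. This is only an illustration of the "OR NOT" versus "AND NOT" interpretation described above, not how the service evaluates queries; the function matches, the sample documents, and the whitespace tokenization are all assumptions made for the example.

```python
def matches(doc_text, include_term, exclude_term, search_mode="any"):
    """Toy model of `include_term -exclude_term` (not the service's logic)."""
    terms = set(doc_text.lower().split())
    has_included = include_term in terms
    lacks_excluded = exclude_term not in terms
    if search_mode == "any":
        return has_included or lacks_excluded   # "-" read as OR NOT
    if search_mode == "all":
        return has_included and lacks_excluded  # "-" read as AND NOT
    raise ValueError("search_mode must be 'any' or 'all'")


# Hypothetical sample documents.
docs = {
    "a": "budget motel with wifi",
    "b": "luxury resort with wifi",
    "c": "luxury resort and spa",
    "d": "roadside motel",
}

any_hits = sorted(k for k, v in docs.items() if matches(v, "wifi", "luxury", "any"))
all_hits = sorted(k for k, v in docs.items() if matches(v, "wifi", "luxury", "all"))
print(any_hits)  # ['a', 'b', 'd']
print(all_hits)  # ['a']
```

In this model, any returns document d even though it mentions neither term, which is the kind of result expansion that can surprise users who expect - to narrow a search.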

Suffix operator

The suffix operator is an asterisk *. For example, lux* will search for documents that have a term that starts with lux, ignoring case.
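A minimal sketch of what a query like lux* asks for, assuming terms are produced by splitting on whitespace (real text analysis is more involved). The function name suffix_match is hypothetical.

```python
def suffix_match(doc_text, prefix):
    """Return True if any whitespace-separated term starts with `prefix`,
    ignoring case. Toy model only; not the service's implementation."""
    prefix = prefix.lower()
    return any(term.startswith(prefix) for term in doc_text.lower().split())


print(suffix_match("A Luxury suite", "lux"))  # True
print(suffix_match("deluxe room", "lux"))     # False
```

The second call is false because the match is anchored at the start of a term: deluxe contains lux but does not begin with it.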

Phrase search operator

The phrase operator encloses a phrase in quotation marks " ". For example, while Roach Motel (without quotes) would search for documents containing Roach and/or Motel anywhere in any order, "Roach Motel" (with quotes) will only match documents that contain the whole phrase together and in that order (text analysis still applies).
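The contrast between the unquoted and quoted forms can be sketched as follows. The functions tokens, any_term_match, and phrase_match are hypothetical, and the lowercase-and-split tokenization is a stand-in for real text analysis; the unquoted case models the "and/or" behavior as a simple "any term" match.

```python
def tokens(text):
    """Lowercase and split on whitespace (a stand-in for text analysis)."""
    return text.lower().split()


def any_term_match(doc_text, query):
    """Unquoted query: true if the document contains any of the terms."""
    doc = set(tokens(doc_text))
    return any(term in doc for term in tokens(query))


def phrase_match(doc_text, phrase):
    """Quoted query: terms must appear adjacent and in the given order."""
    doc, wanted = tokens(doc_text), tokens(phrase)
    n = len(wanted)
    return n > 0 and any(doc[i:i + n] == wanted for i in range(len(doc) - n + 1))


doc = "motel with a roach problem"
print(any_term_match(doc, "Roach Motel"))  # True
print(phrase_match(doc, "Roach Motel"))    # False
```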

Precedence operator

The precedence operator encloses the string in parentheses ( ). For example, motel+(wifi | luxury) will search for documents containing the term motel and either wifi or luxury (or both).
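Read as a Boolean expression, motel+(wifi | luxury) groups the OR before applying the AND. The sketch below spells that out for whitespace-tokenized text; matches_grouped is a hypothetical name, and the tokenization is an assumption.

```python
def matches_grouped(doc_text):
    """Toy evaluation of motel+(wifi | luxury): motel AND (wifi OR luxury)."""
    terms = set(doc_text.lower().split())
    return "motel" in terms and ("wifi" in terms or "luxury" in terms)


print(matches_grouped("luxury motel"))     # True
print(matches_grouped("hotel with wifi"))  # False
```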

Escaping search operators

To use the above symbols as an actual part of the search text, escape them by prefixing them with a backslash. For example, luxury\+hotel will result in the term luxury+hotel. To keep the more typical cases simple, there are two exceptions to this rule where escaping is not needed:

  • The NOT operator - only needs to be escaped if it's the first character after whitespace, not if it's in the middle of a term. For example, wi-fi is a single term, and GUIDs (such as 3352CDD0-EF30-4A2E-A512-3B30AF40F3FD) are likewise treated as a single token.
  • The suffix operator * only needs to be escaped if it's the last character before whitespace, not if it's in the middle of a term. For example, wi*fi is treated as a single token.
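The escaping rule and its two exceptions could be applied to user input along these lines. This is a sketch, not an official helper: the name escape_simple_query is hypothetical, the start and end of the string are assumed to behave like whitespace boundaries, and backslashes already present in the input are left untouched.

```python
# Operators that this sketch escapes wherever they appear.
ALWAYS_ESCAPE = set('+|"()')


def escape_simple_query(text):
    """Backslash-escape simple-syntax operators in literal search text."""
    out = []
    last = len(text) - 1
    for i, ch in enumerate(text):
        # Assumption: string boundaries count as whitespace boundaries.
        at_term_start = i == 0 or text[i - 1].isspace()
        at_term_end = i == last or text[i + 1].isspace()
        if ch in ALWAYS_ESCAPE:
            out.append("\\" + ch)
        elif ch == "-" and at_term_start:
            out.append("\\-")   # leading "-" would otherwise mean NOT
        elif ch == "*" and at_term_end:
            out.append("\\*")   # trailing "*" would otherwise mean suffix
        else:
            out.append(ch)      # mid-term "-" and "*" need no escaping
    return "".join(out)


print(escape_simple_query("luxury+hotel"))  # luxury\+hotel
print(escape_simple_query("wi-fi"))         # wi-fi
```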

Note

Although escaping keeps tokens together, text analysis may split them up, depending on the analysis mode. See Language support (Azure Cognitive Search REST API) for details.

See also