
Use the "full" Lucene search syntax (advanced queries in Azure Cognitive Search)

When constructing queries for Azure Cognitive Search, you can replace the default simple query parser with the more expansive Lucene Query Parser to formulate specialized and advanced query definitions.

The Lucene parser supports complex query constructs, such as field-scoped queries, fuzzy and prefix wildcard search, proximity search, term boosting, and regular expression search. The additional power comes with additional processing requirements, so you should expect a slightly longer execution time. In this article, you can step through examples demonstrating query operations available when using the full syntax.

Note

Many of the specialized query constructions enabled through the full Lucene query syntax are not text-analyzed, which can be surprising if you expect stemming or lemmatization. Lexical analysis is only performed on complete terms (a term query or phrase query). Query types with incomplete terms (prefix query, wildcard query, regex query, fuzzy query) are added directly to the query tree, bypassing the analysis stage. The only transformation performed on incomplete query terms is lowercasing.

Formulate requests in Postman

The following examples use a NYC Jobs search index consisting of jobs available based on a dataset provided by the City of New York OpenData initiative. This data should not be considered current or complete. The index is on a sandbox service provided by Microsoft, which means you do not need an Azure subscription or Azure Cognitive Search to try these queries.

What you do need is Postman or an equivalent tool for issuing HTTP GET requests. For more information, see Explore with REST clients.

Set the request header

  1. In the request header, set Content-Type to application/json.

  2. Add an api-key, and set it to this string: 252044BE3886FE4A8E3BAA4F595114BB. This is a query key for the sandbox search service hosting the NYC Jobs index.

After you specify the request header, you can reuse it for all of the queries in this article, swapping out only the search= string.
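For readers working from code rather than Postman, the same two headers can be attached with Python's standard library. This is a minimal sketch, not part of the original walkthrough: the helper name make_request is hypothetical, and the request is only built here, not sent.

```python
import urllib.request

# Query key for the sandbox service, as given in this article.
API_KEY = "252044BE3886FE4A8E3BAA4F595114BB"


def make_request(url):
    """Build (but do not send) a GET request carrying the two headers."""
    return urllib.request.Request(
        url,
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="GET",
    )


request = make_request(
    "https://azs-playground.search.windows.net/indexes/nycjobs/docs"
    "?api-version=2019-05-06&search=*"
)
# urllib stores header names via str.capitalize(), hence "Api-key" here.
print(request.get_method(), request.get_header("Api-key"))
```

Passing the request object to urllib.request.urlopen would send it, assuming the sandbox service is reachable. Only the URL changes between queries, so the helper can be reused for every example.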

Postman request header

Set the request URL

The request is a GET command paired with a URL containing the Azure Cognitive Search endpoint and search string.

Postman request header

URL composition has the following elements:

  • https://azs-playground.search.windows.net/ is a sandbox search service maintained by the Azure Cognitive Search development team.
  • indexes/nycjobs/ is the NYC Jobs index in the indexes collection of that service. Both the service name and index are required on the request.
  • docs is the documents collection containing all searchable content. The query api-key provided in the request header only works on read operations targeting the documents collection.
  • api-version=2019-05-06 sets the api-version, which is a required parameter on every request.
  • search=* is the query string, which in the initial query is null, returning the first 50 results (by default).
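The elements listed above can be assembled programmatically. The following is a rough sketch under stated assumptions: build_search_url is a hypothetical helper, and the characters $ and * are left unencoded only so that the output reads like the URLs in this article.

```python
from urllib.parse import urlencode


def build_search_url(service, index, api_version, search, **extra):
    """Compose service, index, docs collection, api-version, and search string."""
    base = f"https://{service}.search.windows.net/indexes/{index}/docs"
    params = {"api-version": api_version, **extra, "search": search}
    # safe="$*" keeps "$count" and "*" readable instead of percent-encoding them.
    return base + "?" + urlencode(params, safe="$*")


url = build_search_url("azs-playground", "nycjobs", "2019-05-06", "*")
print(url)
```

Additional parameters can be passed as keyword arguments, for example build_search_url("azs-playground", "nycjobs", "2019-05-06", "*", **{"$count": "true"}).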

Send your first query

As a verification step, paste the following request into GET and click Send. Results are returned as verbose JSON documents. Entire documents are returned, which allows you to see all fields and all values.

Paste this URL into a REST client as a validation step and to view document structure.

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&$count=true&search=*

The query string, search=*, is an unspecified search equivalent to null or empty search. It's the simplest search you can do.

Optionally, you can add $count=true to the URL to return a count of the documents matching the search criteria. On an empty search string, this is all the documents in the index (about 2800 in the case of NYC Jobs).
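When the response is consumed in code, the count can be read from the JSON body. The sketch below assumes the usual response shape of the REST API, with the count under "@odata.count" and the documents under "value". The sample payload and its titles are invented for illustration and are not real index data.

```python
import json

# Hand-written stand-in for a response body; not actual NYC Jobs data.
sample_response = """
{
  "@odata.count": 2,
  "value": [
    {"business_title": "Example Title A"},
    {"business_title": "Example Title B"}
  ]
}
"""


def summarize(body):
    """Return (total matches, number of documents in this page of results)."""
    payload = json.loads(body)
    total = payload.get("@odata.count")  # None unless $count=true was requested
    return total, len(payload["value"])


total, returned = summarize(sample_response)
print(total, returned)
```

On a real query the two numbers usually differ, because the count covers every match while a single response carries only one page of documents.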

How to invoke full Lucene parsing

Add queryType=full to invoke the full query syntax, overriding the default simple query syntax.

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&search=*

All of the examples in this article specify the queryType=full search parameter, indicating that the full syntax is handled by the Lucene Query Parser.

Example 1: Query scoped to a list of fields

This first example is not Lucene-specific, but we lead with it to introduce the first fundamental query concept: field scope. This example scopes the entire query and the response to just a few specific fields. Knowing how to structure a readable JSON response is important when your tool is Postman or Search explorer.

For brevity, the query targets only the business_title field and specifies that only business titles are returned. The searchFields parameter restricts query execution to just the business_title field, and $select specifies which fields are included in the response.

Partial query string

&search=*&searchFields=business_title&$select=business_title

Here is the same query with multiple fields in a comma-delimited list.

search=*&searchFields=business_title, posting_type&$select=business_title, posting_type

The spaces after the commas are optional.

Tip

When using the REST API from your application code, don't forget to URL-encode parameters like $select and searchFields.
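As an illustration of the tip above, the sketch below percent-encodes a parameter name and value with Python's standard library. The helper name encode_param is hypothetical, and encoding every reserved character (safe='') is a conservative choice rather than a documented requirement of the service.

```python
from urllib.parse import quote


def encode_param(name, value):
    """Percent-encode both the parameter name and its value."""
    return f"{quote(name, safe='')}={quote(value, safe='')}"


print(encode_param("$select", "business_title, posting_type"))
# %24select=business_title%2C%20posting_type
```

The comma becomes %2C and the space becomes %20, which is why encoded query strings are harder to read than the partial query strings shown in this article.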

Full URL

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&$count=true&search=*&searchFields=business_title&$select=business_title

The response for this query should look similar to the following screenshot.

Postman sample response

You might have noticed the search score in the response. Uniform scores of 1 occur when there is no rank, either because the search was not full text search, or because no criteria were applied. For null search with no criteria, rows come back in arbitrary order. When you include actual search criteria, you will see search scores evolve into meaningful values.

Example 2: Fielded search

Full Lucene syntax supports scoping individual search expressions to a specific field. This example searches for business titles with the term senior in them, but not junior.

Partial query string

$select=business_title&search=business_title:(senior NOT junior)

Here is the same query with multiple fields.

$select=business_title, posting_type&search=business_title:(senior NOT junior) AND posting_type:external

Full URL

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&$count=true&$select=business_title&search=business_title:(senior NOT junior)

Postman sample response

You can define a fielded search operation with the fieldName:searchExpression syntax, where the search expression can be a single word or a phrase, or a more complex expression in parentheses, optionally with Boolean operators. Some examples include the following:

  • business_title:(senior NOT junior)
  • state:("New York" OR "New Jersey")
  • business_title:(senior NOT junior) AND posting_type:external

Be sure to put multiple strings within quotation marks if you want them evaluated as a single entity, as in the state example above, which searches for two distinct locations in the state field. Also, ensure the operator is capitalized, as you see with NOT and AND.

The field specified in fieldName:searchExpression must be a searchable field. See Create Index (Azure Cognitive Search REST API) for details on how index attributes are used in field definitions.
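The fieldName:searchExpression pattern is simple enough to generate from code. Here is a minimal sketch: phrase and fielded are hypothetical helper names, and the sketch does no escaping of special characters inside the terms.

```python
def phrase(text):
    """Wrap multi-word text in double quotes so it is evaluated as one unit."""
    return f'"{text}"'


def fielded(field, expression):
    """Scope an expression to one field using fieldName:(expression)."""
    return f"{field}:({expression})"


title = fielded("business_title", "senior NOT junior")
state = fielded("state", f"{phrase('New York')} OR {phrase('New Jersey')}")
# Boolean operators stay uppercase, as the syntax requires.
combined = f"{title} AND posting_type:external"

print(title)
print(state)
print(combined)
```

The three printed strings reproduce the three example expressions listed above. They still need URL encoding before being placed in the search= parameter.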

Note

In the example above, we did not need to use the searchFields parameter because each part of the query has a field name explicitly specified. However, you can still use the searchFields parameter if you want to run a query where some parts are scoped to a specific field, and the rest could apply to several fields. For example, the query search=business_title:(senior NOT junior) AND external&searchFields=posting_type would match senior NOT junior only to the business_title field, while it would match "external" with the posting_type field. The field name provided in fieldName:searchExpression always takes precedence over the searchFields parameter, which is why in this example, we do not need to include business_title in the searchFields parameter.

Example 3: Fuzzy search

Full Lucene syntax also supports fuzzy search, matching on terms that have a similar construction. To do a fuzzy search, append the tilde ~ symbol at the end of a single word with an optional parameter, a value between 0 and 2, that specifies the edit distance. For example, blue~ or blue~1 would return blue, blues, and glue.

Partial query string

searchFields=business_title&$select=business_title&search=business_title:asosiate~

Phrases aren't supported directly, but you can specify a fuzzy match on component parts of a phrase.

searchFields=business_title&$select=business_title&search=business_title:asosiate~ AND comm~ 

Full URL

This query searches for jobs with the term "associate" (deliberately misspelled):

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&$count=true&searchFields=business_title&$select=business_title&search=business_title:asosiate~

Fuzzy search response

Note

Fuzzy queries are not analyzed. Query types with incomplete terms (prefix query, wildcard query, regex query, fuzzy query) are added directly to the query tree, bypassing the analysis stage. The only transformation performed on incomplete query terms is lowercasing.
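To make the edit distance behind the fuzzy operator concrete, the sketch below computes a plain Levenshtein distance between two words. It is an illustration only: the search engine uses its own implementation, and edit_distance is a hypothetical helper rather than anything exposed by the service.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


for word in ("blue", "blues", "glue"):
    print(word, edit_distance("blue", word))
print("asosiate -> associate:", edit_distance("asosiate", "associate"))
```

Under this measure blues and glue are each one edit away from blue, and the misspelling asosiate is two edits away from associate, which is within the maximum distance of 2 that the syntax allows.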

Example 4: Proximity search

Proximity searches are used to find terms that are near each other in a document. Insert a tilde "~" symbol at the end of a phrase followed by the number of words that create the proximity boundary. For example, "hotel airport"~5 will find the terms hotel and airport within 5 words of each other in a document.

Partial query string

searchFields=business_title&$select=business_title&search=business_title:%22senior%20analyst%22~1

Full URL

This query searches for jobs with the term "senior analyst" where the two words are separated by no more than one word:

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&$count=true&searchFields=business_title&$select=business_title&search=business_title:%22senior%20analyst%22~1

Proximity query

Try it again, this time allowing no words between "senior" and "analyst". Notice that 8 documents are returned for this query as opposed to 10 for the previous query.

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&$count=true&searchFields=business_title&$select=business_title&search=business_title:%22senior%20analyst%22~0
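The difference between ~1 and ~0 can be modeled with a small token check. This is a deliberately simplified sketch: it considers only the two terms in their original order and splits on whitespace, whereas the real engine's proximity calculation is more general (for example, it can also match reordered terms). The function name and the sample titles are made up for illustration.

```python
def within_proximity(text, first, second, max_gap):
    """True if `second` follows `first` with at most `max_gap` words between them."""
    tokens = text.lower().split()
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second]
    return any(0 < s - f <= max_gap + 1
               for f in first_positions
               for s in second_positions)


print(within_proximity("Senior Budget Analyst", "senior", "analyst", 1))  # True
print(within_proximity("Senior Budget Analyst", "senior", "analyst", 0))  # False
print(within_proximity("Senior Analyst", "senior", "analyst", 0))         # True
```

A title with one word between the two terms satisfies ~1 but not ~0, which is consistent with the smaller result count reported for the second query.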

Example 5: Term boosting

Term boosting refers to ranking a document higher if it contains the boosted term, relative to documents that do not contain the term. To boost a term, use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching.

Full URLs

In this "before" query, search for jobs with the term computer analyst and notice there are no results with both words computer and analyst, yet computer jobs are at the top of the results.

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&$count=true&searchFields=business_title&$select=business_title&search=business_title:computer%20analyst

Term boosting before

In the "after" query, repeat the search, this time boosting results that contain the term analyst over results that contain the term computer when a result does not contain both words.

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&$count=true&searchFields=business_title&$select=business_title&search=business_title:computer%20analyst%5e2

A more human-readable version of the above query is search=business_title:computer analyst^2. For a workable query, ^2 is encoded as %5E2, which is harder to see.

Term boosting after

Term boosting differs from scoring profiles in that scoring profiles boost certain fields, rather than specific terms. The following example helps illustrate the differences.

Consider a scoring profile that boosts matches in a certain field, such as genre in the musicstoreindex example. Term boosting could be used to further boost certain search terms higher than others. For example, "rock^2 electronic" will boost documents that contain the search terms in the genre field higher than other searchable fields in the index. Furthermore, documents that contain the search term "rock" will be ranked higher than the other search term "electronic" as a result of the term boost value (2).

When setting the factor level, the higher the boost factor, the more relevant the term will be relative to other search terms. By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, 0.2).
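The boost syntax and its encoded form can be produced in code. The following is a small sketch in which boost is a hypothetical helper: it rejects non-positive factors, per the rule above, and the expression is then percent-encoded with the standard library.

```python
from urllib.parse import quote


def boost(term, factor=1):
    """Append ^factor to a term; the factor must be positive but may be below 1."""
    if factor <= 0:
        raise ValueError("boost factor must be positive")
    return f"{term}^{factor:g}"


expression = f"business_title:computer {boost('analyst', 2)}"
encoded = quote(expression, safe=":")

print(expression)  # business_title:computer analyst^2
print(encoded)     # business_title:computer%20analyst%5E2
```

Python emits the hex digits in uppercase (%5E), while the URL in this article uses %5e. Percent-encoded hex digits are case-insensitive, so the two forms are equivalent.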

Example 6: Regex

A regular expression search finds a match based on the contents between forward slashes "/", as documented in the RegExp class.

Partial query string

searchFields=business_title&$select=business_title&search=business_title:/(Sen|Jun)ior/

Full URL

In this query, search for jobs with either the term Senior or Junior: search=business_title:/(Sen|Jun)ior/.

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&$count=true&searchFields=business_title&$select=business_title&search=business_title:/(Sen|Jun)ior/

Regex query

Note

Regex queries are not analyzed. The only transformation performed on incomplete query terms is lowercasing.
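The matching behavior described in this example can be approximated locally. The sketch below is only a rough model, with several assumptions: it uses Python's re module, whose syntax differs from the Lucene RegExp class; it tokenizes on whitespace; and it lowercases the pattern naively, which is safe for this pattern but not for patterns containing escapes such as \S. The function name and sample titles are invented.

```python
import re


def regex_term_match(title, pattern):
    """True if any whole lowercased token of `title` matches the lowercased pattern."""
    compiled = re.compile(pattern.lower())
    return any(compiled.fullmatch(token) for token in title.lower().split())


print(regex_term_match("Senior Analyst", "(Sen|Jun)ior"))    # True
print(regex_term_match("Junior Developer", "(Sen|Jun)ior"))  # True
print(regex_term_match("Seniority Clerk", "(Sen|Jun)ior"))   # False
```

The third title does not match because the pattern is tested against a whole term in this model, not against a substring of it.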

Example 7: Wildcard search

You can use generally recognized syntax for multiple (*) or single (?) character wildcard searches. Note that the Lucene query parser supports the use of these symbols with a single term, and not a phrase.

Partial query string

searchFields=business_title&$select=business_title&search=business_title:prog*

Full URL

In this query, search for jobs that contain the prefix 'prog', which would include business titles with the terms programming and programmer in them. You cannot use a * or ? symbol as the first character of a search.

https://azs-playground.search.windows.net/indexes/nycjobs/docs?api-version=2019-05-06&queryType=full&$count=true&searchFields=business_title&$select=business_title&search=business_title:prog*

Wildcard query

Note

Wildcard queries are not analyzed. The only transformation performed on incomplete query terms is lowercasing.
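A similar local model works for wildcard terms. This sketch uses fnmatch from the Python standard library as a stand-in for the engine's wildcard matching, which is an assumption: fnmatch also gives square brackets a special meaning that the query syntax does not. The helper name and sample titles are hypothetical, and the leading-wildcard check mirrors the restriction stated in this example.

```python
from fnmatch import fnmatchcase


def wildcard_term_match(title, pattern):
    """True if any lowercased token of `title` matches the lowercased wildcard pattern."""
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character of a search")
    lowered = pattern.lower()
    return any(fnmatchcase(token, lowered) for token in title.lower().split())


print(wildcard_term_match("Computer Programmer", "prog*"))  # True
print(wildcard_term_match("Programming Lead", "prog*"))     # True
print(wildcard_term_match("Project Manager", "prog*"))      # False
```

Both the pattern and the tokens are lowercased first, reflecting the note that lowercasing is the only transformation applied to incomplete query terms.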

Next steps

Try specifying the Lucene Query Parser in your code. The following links explain how to set up search queries for both .NET and the REST API. The links use the default simple syntax, so you will need to apply what you learned from this article to specify the queryType.

Additional syntax reference, query architecture, and examples can be found in the following links: