Use the "full" Lucene search syntax (advanced queries in Azure Cognitive Search)

When constructing queries for Azure Cognitive Search, you can replace the default simple query parser with the more powerful Lucene query parser to formulate specialized and advanced query expressions.

The Lucene parser supports complex query formats, such as field-scoped queries, fuzzy search, infix and suffix wildcard search, proximity search, term boosting, and regular expression search. The additional power comes with additional processing requirements so you should expect a slightly longer execution time. In this article, you can step through examples demonstrating query operations based on full syntax.

Note

Many of the specialized query constructions enabled through the full Lucene query syntax are not text-analyzed, which can be surprising if you expect stemming or lemmatization. Lexical analysis is only performed on complete terms (a term query or phrase query). Query types with incomplete terms (prefix query, wildcard query, regex query, fuzzy query) are added directly to the query tree, bypassing the analysis stage. The only transformation performed on partial query terms is lowercasing.

Hotels sample index

The following queries are based on the hotels-sample-index, which you can create by following the instructions in this quickstart.

Example queries are articulated using the REST API and POST requests. You can paste and run them in Postman or in Visual Studio Code with the Cognitive Search extension.

Request headers must have the following values:

Key Value
Content-Type application/json
api-key <your-search-service-api-key>, either query or admin key

URI parameters must include your search service endpoint with the index name, docs collections, search command, and API version, similar to the following example:

https://{{service-name}}.search.windows.net/indexes/hotels-sample-index/docs/search?api-version=2020-06-30

Request body should be formed as valid JSON:

{
    "search": "*",
    "queryType": "full",
    "select": "HotelId, HotelName, Category, Tags, Description",
    "count": true
}
  • "search" set to * is an unspecified query, equivalent to null or empty search. It's not especially useful, but it is the simplest search you can do, and it shows all retrievable fields in the index, with all values.

  • "queryType" set to "full" invokes the full Lucene query parser and it's required for this syntax.

  • "select" set to a comma-delimited list of fields is used for search result composition, including just those fields that are useful in the context of search results.

  • "count" returns the number of documents matching the search criteria. On an empty search string, the count will be all documents in the index (50 in the case of hotels-sample-index).

Fielded search scope individual, embedded search expressions to a specific field. This example searches for hotel names with the term "hotel" in them, but not "motel". You can specify multiple fields using AND.

When you use this query syntax, you can omit the "searchFields" parameter when the fields you want to query are in the search expression itself. If you include "searchFields" with fielded search, the fieldName:searchExpression always takes precedence over "searchFields".

POST /indexes/hotel-samples-index/docs/search?api-version=2020-06-30
{
    "search": "HotelName:(hotel NOT motel) AND Category:'Resort and Spa'",
    "queryType": "full",
    "select": "HotelName, Category",
    "count": true
}

Response for this query should look similar to the following example, filtered on "Resort and Spa", returning hotels that include "hotel" or "motel" in the name.

"@odata.count": 4,
"value": [
    {
        "@search.score": 4.481559,
        "HotelName": "Nova Hotel & Spa",
        "Category": "Resort and Spa"
    },
    {
        "@search.score": 2.4524608,
        "HotelName": "King's Palace Hotel",
        "Category": "Resort and Spa"
    },
    {
        "@search.score": 2.3970203,
        "HotelName": "Triple Landscape Hotel",
        "Category": "Resort and Spa"
    },
    {
        "@search.score": 2.2953436,
        "HotelName": "Peaceful Market Hotel & Spa",
        "Category": "Resort and Spa"
    }
]

The search expression can be a single term or a phrase, or a more complex expression in parentheses, optionally with Boolean operators. Some examples include the following:

  • HotelName:(hotel NOT motel)
  • Address/StateProvince:("WA" OR "CA")
  • Tags:("free wifi" NOT "free parking") AND "coffee in lobby"

Be sure to put a phrase within quotation marks if you want both strings to be evaluated as a single entity, as in this case searching for two distinct locations in the Address/StateProvince field. Depending on the client, you might need to escape (\) the quotation marks.

The field specified in fieldName:searchExpression must be a searchable field. See Create Index (REST API) for details on how field definitions are attributed.

Fuzzy search matches on terms that are similar, including misspelled words. To do a fuzzy search, append the tilde ~ symbol at the end of a single word with an optional parameter, a value between 0 and 2, that specifies the edit distance. For example, blue~ or blue~1 would return blue, blues, and glue.

POST /indexes/hotel-samples-index/docs/search?api-version=2020-06-30
{
    "search": "Tags:conserge~",
    "queryType": "full",
    "select": "HotelName, Category, Tags",
    "searchFields": "HotelName, Category, Tags",
    "count": true
}

Response for this query resolves to "concierge" in the matching documents, trimmed for brevity:

"@odata.count": 12,
"value": [
    {
        "@search.score": 1.1832147,
        "HotelName": "Secret Point Motel",
        "Category": "Boutique",
        "Tags": [
            "pool",
            "air conditioning",
            "concierge"
        ]
    },
    {
        "@search.score": 1.1819803,
        "HotelName": "Twin Dome Motel",
        "Category": "Boutique",
        "Tags": [
            "pool",
            "free wifi",
            "concierge"
        ]
    },
    {
        "@search.score": 1.1773309,
        "HotelName": "Smile Hotel",
        "Category": "Suite",
        "Tags": [
            "view",
            "concierge",
            "laundry service"
        ]
    },

Phrases aren't supported directly but you can specify a fuzzy match on each term of a multi-part phrase, such as search=Tags:landy~ AND sevic~. This query expression finds 15 matches on "laundry service".

Note

Fuzzy queries are not analyzed. Query types with incomplete terms (prefix query, wildcard query, regex query, fuzzy query) are added directly to the query tree, bypassing the analysis stage. The only transformation performed on partial query terms is lower casing.

Proximity search finds terms that are near each other in a document. Insert a tilde "~" symbol at the end of a phrase followed by the number of words that create the proximity boundary.

This query searches for the terms "hotel" and "airport" within 5 words of each other in a document. The quotation marks are escaped (\") to preserve the phrase:

POST /indexes/hotel-samples-index/docs/search?api-version=2020-06-30
{
    "search": "Description: \"hotel airport\"~5",
    "queryType": "full",
    "select": "HotelName, Description",
    "searchFields": "HotelName, Description",
    "count": true
}

Response for this query should look similar to the following example:

"@odata.count": 2,
"value": [
    {
        "@search.score": 0.6331726,
        "HotelName": "Trails End Motel",
        "Description": "Only 8 miles from Downtown.  On-site bar/restaurant, Free hot breakfast buffet, Free wireless internet, All non-smoking hotel. Only 15 miles from airport."
    },
    {
        "@search.score": 0.43032226,
        "HotelName": "Catfish Creek Fishing Cabins",
        "Description": "Brand new mattresses and pillows.  Free airport shuttle. Great hotel for your business needs. Comp WIFI, atrium lounge & restaurant, 1 mile from light rail."
    }
]

Example 4: Term boosting

Term boosting refers to ranking a document higher if it contains the boosted term, relative to documents that do not contain the term. To boost a term, use the caret, ^, symbol with a boost factor (a number) at the end of the term you are searching. The boost factor default is 1, and although it must be positive, it can be less than 1 (for example, 0.2). Term boosting differs from scoring profiles in that scoring profiles boost certain fields, rather than specific terms.

In this "before" query, search for "beach access" and notice that there are seven documents that match on one or both terms.

POST /indexes/hotel-samples-index/docs/search?api-version=2020-06-30
{
    "search": "beach access",
    "queryType": "full",
    "select": "HotelName, Description, Tags",
    "searchFields": "HotelName, Description, Tags",
    "count": true
}

In fact, there is only one document that matches on "access", and because it is the only match, it's placement is high (second position) even though the document is missing the term "beach".

"@odata.count": 7,
"value": [
    {
        "@search.score": 2.2723424,
        "HotelName": "Nova Hotel & Spa",
        "Description": "1 Mile from the airport.  Free WiFi, Outdoor Pool, Complimentary Airport Shuttle, 6 miles from the beach & 10 miles from downtown."
    },
    {
        "@search.score": 1.5507699,
        "HotelName": "Old Carrabelle Hotel",
        "Description": "Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center."
    },
    {
        "@search.score": 1.5358944,
        "HotelName": "Whitefish Lodge & Suites",
        "Description": "Located on in the heart of the forest. Enjoy Warm Weather, Beach Club Services, Natural Hot Springs, Airport Shuttle."
    },
    {
        "@search.score": 1.3433652,
        "HotelName": "Ocean Air Motel",
        "Description": "Oceanfront hotel overlooking the beach features rooms with a private balcony and 2 indoor and outdoor pools. Various shops and art entertainment are on the boardwalk, just steps away."
    },

In the "after" query, repeat the search, this time boosting results with the term "beach" over the term "access". A human readable version of the query is search=Description:beach^2 access. Depending on your client, you might need to express ^2 as %5E2.

After boosting the term "beach", the match on Old Carrabelle Hotel moves down to sixth place.

Example 5: Regex

A regular expression search finds a match based on the contents between forward slashes "/", as documented in the RegExp class.

POST /indexes/hotel-samples-index/docs/search?api-version=2020-06-30
{
    "search": "HotelName:/(Mo|Ho)tel/",
    "queryType": "full",
    "select": "HotelName",
    "count": true
}

Response for this query should look similar to the following example:

    "@odata.count": 22,
    "value": [
        {
            "@search.score": 1.0,
            "HotelName": "Days Hotel"
        },
        {
            "@search.score": 1.0,
            "HotelName": "Triple Landscape Hotel"
        },
        {
            "@search.score": 1.0,
            "HotelName": "Smile Hotel"
        },
        {
            "@search.score": 1.0,
            "HotelName": "Pelham Hotel"
        },
        {
            "@search.score": 1.0,
            "HotelName": "Sublime Cliff Hotel"
        },
        {
            "@search.score": 1.0,
            "HotelName": "Twin Dome Motel"
        },
        {
            "@search.score": 1.0,
            "HotelName": "Nova Hotel & Spa"
        },
        {
            "@search.score": 1.0,
            "HotelName": "Scarlet Harbor Hotel"
        },

Note

Regex queries are not analyzed. The only transformation performed on partial query terms is lower casing.

You can use generally recognized syntax for multiple (*) or single (?) character wildcard searches. Note the Lucene query parser supports the use of these symbols with a single term, and not a phrase.

In this query, search for hotel names that contain the prefix 'sc'. You cannot use a * or ? symbol as the first character of a search.

POST /indexes/hotel-samples-index/docs/search?api-version=2020-06-30
{
    "search": "HotelName:sc*",
    "queryType": "full",
    "select": "HotelName",
    "count": true
}

Response for this query should look similar to the following example:

    "@odata.count": 2,
    "value": [
        {
            "@search.score": 1.0,
            "HotelName": "Scarlet Harbor Hotel"
        },
        {
            "@search.score": 1.0,
            "HotelName": "Scottish Inn"
        }
    ]

Note

Wildcard queries are not analyzed. The only transformation performed on partial query terms is lower casing.

Next steps

Try specifying queries in code. The following links explain how to set up search queries using the Azure SDKs.

Additional syntax reference, query architecture, and examples can be found in the following links: