Search for Words Close to Another Word with NEAR

THIS TOPIC APPLIES TO: yesSQL ServeryesAzure SQL DatabasenoAzure SQL Data Warehouse noParallel Data Warehouse

You can use the proximity term NEAR in a CONTAINS predicate or CONTAINSTABLE function to search for words or phrases near one another.

Overview of NEAR

NEAR has the following features:

  • You can specify the maximum number of non-search terms that separate the first and last search terms.

  • You can search for words or phrases in any order, or you can search for words and phrases in a specific order.

  • You can specify the maximum number of non-search terms, or maximum distance, that separates the first and last search terms in order to constitute a match.

  • If you specify the maximum number of terms, you can also specify that matches must contain the search terms in the specified order.

To qualify as a match, a string of text must:

  • Start with one of the specified search terms and end with the one of the other specified search terms.

  • Contain all of the specified search terms.

  • The number of non-search terms, including stopwords, that occur between the first and last search terms must be less than or equal to the maximum distance, if the maximum distance is specified.

Syntax of NEAR

The basic syntax of NEAR is:

 NEAR (  

 {  

 *search_term* [ ,…*n* ]  

 |  

 (*search_term* [ ,…*n* ] ) [, <maximum_distance> [, <match_order> ] ]  

 }  

 )  

For more info about the syntax, see CONTAINS (Transact-SQL).

Examples

Example 1

For example, you could search for 'John' within two terms of 'Smith', as follows:

... CONTAINS(column_name, 'NEAR((John, Smith), 2)')

Some examples of strings that match are "John Jacob Smith" and "Smith, John". The string "John Jones knows Fred Smith" contains three intervening non-search terms, so it is not a match.

To require that the terms be found in the specified order, you would change the example proximity term to NEAR((John, Smith),2, TRUE). This searches for "John" within two terms of "Smith" but only when "John" precedes "Smith". In a language that reads from left to right, such as English, an example of a string that matches is "John Jacob Smith".

Note that for a language that reads from right to left, such as Arabic or Hebrew, the Full-Text Engine applies the specified terms in reverse order. Also, Object Explorer in SQL Server Management Studio automatically reverses the display order of words specified in right-to-left languages.

Example 2

The following example searches the Production.Document table of the AdventureWorks sample database for all document summaries that contain the word "reflector" in the same document as the word "bracket".

SELECT DocumentNode, Title, DocumentSummary  
FROM Production.Document AS DocTable   
INNER JOIN CONTAINSTABLE(Production.Document, Document,  
  'NEAR(bracket, reflector)' ) AS KEY_TBL  
  ON DocTable.DocumentNode = KEY_TBL.[KEY]  
WHERE KEY_TBL.RANK > 50  
ORDER BY KEY_TBL.RANK DESC;  
GO  

How maximum distance is measured

A specific maximum distance, such as 10 or 25, determines how many non-search terms, including stopwords, can occur between the first and last search terms in a given string. For example, NEAR((dogs, cats, "hunting mice"), 3) would return the following row, in which the total number of non-search terms is three ("enjoy", "but", and "avoid"):

"Cats enjoy hunting mice``, but avoid dogs``."

The same proximity term would not return the following row, because the maximum distance is exceeded by the four non-search terms ("enjoy", "but", "usually", and "avoid"):

"Cats enjoy hunting mice``, but usually avoid dogs``."

Combine NEAR with other terms

You can combine NEAR with some other terms. You can use AND (&), OR (|), or AND NOT (&!) to combine a custom proximity term with another custom proximity term, a simple term, or a prefix term. For example:

  • CONTAINS('NEAR((term1,term2),5) AND term3')

  • CONTAINS('NEAR((term1,term2),5) OR term3')

  • CONTAINS('NEAR((term1,term2),5) AND NOT term3')

  • CONTAINS('NEAR((term1,term2),5) AND NEAR((term3,term4),2)')

  • CONTAINS('NEAR((term1,term2),5) OR NEAR((term3,term4),2, TRUE)')

    For example,

CONTAINS(column_name, 'NEAR((term1, term2), 5, TRUE) AND term3')  

You can't combine NEAR with a generation term (ISABOUT …) or a weighted term (FORMSOF …).

More info about proximity searches

  • Overlapping occurrences of search terms

    All proximity searches always look for only non-overlapping occurrences. Overlapping occurrences of search terms never qualify as matches. For example, consider the following proximity term, which searches "A" and "AA" in this order with a maximum distance of two terms:

    CONTAINS(column_name, 'NEAR((A,AA),2, TRUE')  
    

    The possible matches are as "AAA", "A.AA", and "A..AA". Rows containing just "AA" would not match.

    Note

    You can specify terms that overlap, for example, NEAR("mountain bike", "bike trails") or (NEAR(comfort*, comfortable), 5). Specifying a overlapping terms increases the complexity of the query by increasing the possible match permutations. If you specify a large number of such overlapping terms, the query can run out of resources and fail. If this occurs, simplify the query and try again.

  • NEAR (regardless of whether a maximum distance is specified) indicates the logical distance between terms, rather than the absolute distance between them. For example, terms within different phrases or sentences within a paragraph are treated as farther apart than terms in the same phrase or sentence, regardless of their actual proximity, on the assumption that they are less related. Likewise, terms in different paragraphs are treated as being even farther apart. If a match spans the end of a sentence, paragraph, or chapter, the gap used for ranking a document is increased by 8, 128, or 1024, respectively.

  • Impact of proximity terms on ranking by the CONTAINSTABLE function

    When NEAR is used in the CONTAINSTABLE function, the number of hits in a document relative to its length as well as the distance between the first and last search terms in each of the hits affects the ranking of each document. For a generic proximity term, if the matched search terms are >50 logical terms apart, the rank returned on a document is 0. For a custom proximity term that does not specify an integer as the maximum distance, a document that contains only hits whose gap is >100 logical terms will receive a ranking of 0. For more information about ranking of custom proximity searches, see Limit Search Results with RANK.

  • The transform noise words server option

    The value of transform noise words impacts how SQL Server treats stopwords if they are specified in proximity searches. For more information, see transform noise words Server Configuration Option.

See Also

CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
Query with Full-Text Search