Noise Words and the CONTAINS Predicate

When creating search queries, remember that words that are very common or carry no meaning about the content are removed when the content is indexed. These "noise" words cannot be matched in full-text searches. For example, searching for the phrase "this is a test" is equivalent to searching for the word "test," because "this," "is," and "a" are all discarded when the documents are indexed.

Note: For information on updating noise word files, see KB 837847: How to customize SharePoint Portal Server 2003 by using IFilters, noise word files, and thesaurus files.

When noise words are discarded from the search terms of a CONTAINS predicate, they are treated as placeholders. A matching phrase must have the same number of words as the search phrase, but each noise word matches any single word in its position. This can produce unexpected results when the user intends the noise words as logical operators. For example, a user who wants to search for all documents that contain both "computer" and "software" might type "computer AND software". If the string is inserted into the CONTAINS predicate unchanged, it is submitted as:

CONTAINS('"computer AND software"')

The Microsoft SharePoint Portal Server Search (SharePointPSSearch) engine recognizes "AND" inside the quoted phrase as a noise word, and discards it. It then matches all documents in which "computer" and "software" are separated by exactly one other word. SharePointPSSearch would return documents containing "computer programming software", "computer drawing software", and even "computer running software". However, documents that contain simply "computer software" would not be returned, because no word fills the placeholder.
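The placeholder behavior described above can be illustrated with a small Python model. This is only a sketch of the matching rule as this topic describes it, not the SharePointPSSearch implementation: the NOISE_WORDS set and the phrase_matches function are hypothetical names, and the model covers only noise words that sit between other words in the phrase.

```python
# Hypothetical noise word list for illustration only; the real list is
# defined by the server's noise word files.
NOISE_WORDS = {"a", "and", "is", "this"}


def phrase_matches(phrase: str, document: str) -> bool:
    """Return True if the phrase occurs in the document, with each noise
    word in the phrase acting as a placeholder for any single word."""
    # None marks a placeholder position; other entries must match exactly.
    pattern = [None if w in NOISE_WORDS else w for w in phrase.lower().split()]
    words = document.lower().split()
    # Slide a window of the same length as the phrase across the document.
    for start in range(len(words) - len(pattern) + 1):
        window = words[start:start + len(pattern)]
        if all(p is None or p == w for p, w in zip(pattern, window)):
            return True
    return False
```

In this model, phrase_matches("computer AND software", "computer programming software") is true, while phrase_matches("computer AND software", "computer software") is false, because the document has no word in the placeholder position.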

The following CONTAINS predicate would return documents more closely matching the intent of the user:

CONTAINS('"computer" AND "software"')
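An application that builds the predicate from user input can produce this form by quoting each run of search words separately and leaving the operators outside the quotation marks. The following Python sketch shows one way to do that. The build_contains function and the OPERATORS set are hypothetical names, the sketch assumes that only AND and OR should be treated as operators (in any letter case), and it simply strips quotation marks from the input rather than escaping them.

```python
# Assumed set of operator keywords a user may type between search terms.
OPERATORS = {"AND", "OR"}


def build_contains(user_input: str) -> str:
    """Build a CONTAINS predicate in which operators stay outside the quotes."""
    parts = []   # quoted phrases and operators, in order
    phrase = []  # words of the phrase currently being collected
    for token in user_input.split():
        if token.upper() in OPERATORS:
            if phrase:
                parts.append('"' + " ".join(phrase) + '"')
                phrase = []
            # Ignore an operator with no phrase before it, or a doubled operator.
            if parts and parts[-1] not in OPERATORS:
                parts.append(token.upper())
        else:
            # Crude sanitizing for this sketch: drop quote characters.
            cleaned = token.replace('"', "").replace("'", "")
            if cleaned:
                phrase.append(cleaned)
    if phrase:
        parts.append('"' + " ".join(phrase) + '"')
    # Drop a trailing operator that has no phrase after it.
    if parts and parts[-1] in OPERATORS:
        parts.pop()
    if not parts:
        raise ValueError("no search terms in input")
    return "CONTAINS('" + " ".join(parts) + "')"
```

With this sketch, the input computer AND software yields CONTAINS('"computer" AND "software"'), while the input computer software, which has no operator, is kept as the single phrase CONTAINS('"computer software"').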

FREETEXT Predicate

WHERE Clause