Document and System Locale Settings

When the operating system, or even an application, is set to use a particular language and locale, many settings are affected. These settings include numeric format, date format, currency format, uppercase and lowercase mapping, dictionary sort ordering, tokenization, and others. Although these settings help Microsoft Windows and Microsoft SharePoint Portal Server Search (SharePointPSSearch) provide excellent localized support, unexpected results can occur when documents from one locale are searched by using a system set to another locale.

For example, the list of "noise" words (words discarded during indexing and from queries because they carry little meaning or context) differs considerably from one language to another. In German, the word "die" is equivalent to the English word "the". If you index a German document and then search for "die" by using an English query system, documents might be returned even though the word should be ignored, because "die" is not in the English noise words list. A German system given the same query would return an error stating that the search query contained only noise words.
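
The difference can be illustrated with a minimal sketch. The word lists, the filter_query function, and the OnlyNoiseWordsError exception are hypothetical names chosen for this illustration; they are not part of SharePointPSSearch, and real noise words lists are much longer.

```python
# Hypothetical, heavily abbreviated noise words lists (assumption for illustration).
NOISE_WORDS = {
    "en": {"the", "a", "an", "and", "of"},
    "de": {"der", "die", "das", "und", "ein"},
}


class OnlyNoiseWordsError(ValueError):
    """Raised when every term in a query is a noise word."""


def filter_query(query, locale):
    """Drop the noise words of the given locale; fail if no terms remain."""
    terms = query.lower().split()
    kept = [term for term in terms if term not in NOISE_WORDS[locale]]
    if not kept:
        raise OnlyNoiseWordsError(f"query contains only noise words: {query!r}")
    return kept
```

In this sketch, filter_query("die", "en") keeps the term because "die" is not in the English list, whereas filter_query("die", "de") raises OnlyNoiseWordsError, mirroring the behavior of the English and German query systems described above.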

When the IFilter object processes a document's text properties and content, it reports the language of that document to the content indexer. By using this information, SharePointPSSearch can apply the appropriate word breaker and noise words list.
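
A rough sketch of that selection step follows. It does not use the actual IFilter or word breaker interfaces; the dictionaries, the simple regular-expression word breaker, and the index_terms function are all assumptions made for illustration.

```python
import re

# Hypothetical, abbreviated noise words lists keyed by language (assumption).
NOISE_WORDS = {
    "en": {"the", "a", "an", "and", "of"},
    "de": {"der", "die", "das", "und", "ein"},
}


def break_words_default(text):
    """Very simple stand-in for a word breaker: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


# Hypothetical registry of word breakers; a real system would supply
# language-specific implementations instead of one shared default.
WORD_BREAKERS = {"en": break_words_default, "de": break_words_default}


def index_terms(text, language):
    """Pick the word breaker and noise words list for the reported language."""
    breaker = WORD_BREAKERS.get(language, break_words_default)
    noise = NOISE_WORDS.get(language, set())
    return [word for word in breaker(text) if word not in noise]
```

With the language reported as "de", index_terms("Die Katze und der Hund", "de") keeps only "katze" and "hund". If the same text were reported as "en", all five words would be kept, which is why an accurate language report matters.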