Windows SharePoint Services Search Architecture

05/05/2014

This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This page may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

Windows SharePoint Services 3.0 uses the same SharePoint search technology used by Enterprise Search in Microsoft Office SharePoint Server 2007 rather than relying on Microsoft SQL Server full-text searching, as previous versions of Microsoft Windows SharePoint Services did.

Search in Windows SharePoint Services addresses a single site collection and is automatically scoped to the current context: the current site and its subsites, a list or library, or a folder. If you are looking at a subsite, you cannot search over the entire site collection, but you can search over all subsites of the current site.
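The site-level scoping rule can be illustrated with a small sketch. This is not how the Search service is implemented; it only models "current site plus its subsites" as URL prefix matching. The function name and the example URLs are hypothetical.

```python
def sites_in_scope(current_site, all_sites):
    """Return the current site and every site nested beneath it."""
    base = current_site.rstrip("/")
    in_scope = []
    for site in all_sites:
        url = site.rstrip("/")
        # A subsite's URL starts with the parent's URL followed by "/".
        if url == base or url.startswith(base + "/"):
            in_scope.append(site)
    return in_scope


# Hypothetical site collection rooted at http://server/sites/team
collection = [
    "http://server/sites/team",
    "http://server/sites/team/projects",
    "http://server/sites/team/projects/alpha",
    "http://server/sites/team/hr",
]

in_scope = sites_in_scope("http://server/sites/team/projects", collection)
```

In this sketch, a search started from the projects subsite covers that site and its alpha subsite, but not the parent site or the sibling hr site.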

Only SharePoint content in the site collection can be crawled. You cannot configure Search to crawl databases, mail servers, application servers, or Web sites and file shares outside of the site collection. In a deployment with more than one site collection, each site collection provides Search only for content on that site collection, and there is no aggregation of search results across site collections.

The Windows SharePoint Services 3.0 Search service is also used to enable searching of the Help system that is built into Microsoft Office SharePoint Server 2007.

Content Crawling

Most of the capabilities for Search are configured automatically during installation.

One content source is automatically created for all user content Web applications. No administration details are exposed to site administrators. When a new site is created, the site's URL is added to the start addresses for the content source.

One content source is automatically created for the Central Administration Web application.

Full crawls occur according to the administrator-controlled crawl schedule set on the Central Administration configuration page.

The index engine uses a shared-memory pipe to request that the Filter Daemon begin filtering the content source. The Filter Daemon uses the Windows SharePoint Services 3.0 protocol handler to retrieve individual items from the site and applies the appropriate IFilter to each document to extract its text and metadata. The Filter Daemon then passes the extracted text and metadata back to the index engine through the pipe.

At this point in the content crawling process, the index engine saves document properties to a property store that is separate from the content index. The property store consists of a table of properties and their values. Properties in this store can be retrieved and sorted. In addition, simple queries against properties are supported by the store. Each row in the table corresponds to a separate document in the full-text index. The actual text of a content item is stored in the content index, so it can be used for content queries. The property store also maintains and enforces document-level security that is gathered when a document is crawled.
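As a rough mental model, the property store behaves like a table with one row per crawled document that supports simple lookups and sorting. The following sketch assumes an in-memory dictionary; the property names (Title, Author, Size), the values, and both helper functions are illustrative and do not reflect the actual storage format.

```python
# Hypothetical property store: one row per crawled document, keyed by document ID.
PROPERTY_STORE = {
    1: {"Title": "Budget 2007", "Author": "Ana", "Size": 120},
    2: {"Title": "Team roster", "Author": "Ben", "Size": 45},
    3: {"Title": "Audit notes", "Author": "Ana", "Size": 80},
}


def find_by_property(store, name, value):
    """Simple property query: IDs of documents whose property equals value."""
    return sorted(doc_id for doc_id, row in store.items() if row.get(name) == value)


def sort_by_property(store, doc_ids, name):
    """Order the given documents by one of their stored properties."""
    return sorted(doc_ids, key=lambda doc_id: store[doc_id][name])


ana_docs = find_by_property(PROPERTY_STORE, "Author", "Ana")
by_size = sort_by_property(PROPERTY_STORE, ana_docs, "Size")
```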

Next, the index engine uses word-breakers to further process the text and properties picked up during the crawl. A word-breaker breaks the text into words and phrases. The index engine then removes noise words and creates an inverted index for full-text searching.
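The sequence of word-breaking, noise-word removal, and inverted-index construction can be sketched in a few lines. This is a simplified illustration, not the index engine's algorithm: the regular-expression word-breaker, the noise-word list, and the sample documents are all assumptions.

```python
import re
from collections import defaultdict

# Illustrative noise words; the real list is language-specific.
NOISE_WORDS = {"a", "an", "and", "the", "of", "to", "is"}


def break_words(text):
    """Toy word-breaker: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each non-noise word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in break_words(text):
            if word not in NOISE_WORDS:
                index[word].add(doc_id)
    return index


documents = {
    1: "The quarterly budget report",
    2: "Budget planning and the team site",
}
index = build_inverted_index(documents)
```

Looking up a word in the index then returns every document that contains it, which is what makes full-text queries fast.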

Search Query Execution

When a search query is executed, the Query engine passes the query through a language-specific word-breaker. If there is no word-breaker for the query language, the neutral word-breaker is used, which simply breaks words and phrases at white space.
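The fallback to the neutral word-breaker can be pictured as a lookup with a default. In this sketch the registry, the language codes, and the "English" word-breaker are hypothetical stand-ins; only the white-space behavior of the neutral word-breaker comes from the description above.

```python
import re


def neutral_word_breaker(text):
    """Break only where there is white space."""
    return text.split()


def english_word_breaker(text):
    """Hypothetical language-specific breaker that also splits hyphenated words."""
    return re.findall(r"[A-Za-z0-9']+", text)


# Hypothetical registry of language-specific word-breakers.
WORD_BREAKERS = {"en": english_word_breaker}


def break_query(text, language):
    """Use the word-breaker for the language, or the neutral one if none exists."""
    breaker = WORD_BREAKERS.get(language, neutral_word_breaker)
    return breaker(text)


with_specific = break_query("full-text search", "en")
with_neutral = break_query("full-text search", "xx")
```

Here the language-specific breaker yields three words, while the neutral breaker keeps "full-text" as a single token because it contains no white space.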

After word-breaking, the resulting words are passed through a stemmer to generate language-specific inflected forms of each word. Using a word-breaker and a stemmer in both the crawling and query processes makes search more effective because more relevant alternatives to the user's query phrasing are generated.

When the Query engine executes a property value query, it checks the index first to get a list of possible matches. It then loads the properties for the matching documents from the property store and checks the properties in the query again to confirm each match. The result of the query is a list of all matching results, ordered according to their relevance to the query words. If the user does not have permission to view a matching document, the Query engine filters that document out of the list that is returned.
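The query steps described in this section can be sketched end to end: expand each word with a stemmer, look up candidates in the index, recheck properties, rank, and trim results the user cannot view. Everything in this example is an assumption made for illustration, including the sample data, the stemmer table, and the scoring rule (a count of matched query words), which stands in for the real relevance ranking.

```python
# Hypothetical sample data standing in for the index, property store, and permissions.
INDEX = {"report": {1, 2}, "reports": {3}, "budget": {1, 3}}
PROPERTIES = {1: {"Author": "Ana"}, 2: {"Author": "Ben"}, 3: {"Author": "Ana"}}
READERS = {1: {"ana", "ben"}, 2: {"ben"}, 3: {"ana"}}
STEMS = {"report": ["report", "reports"]}  # toy stemmer table


def expand(word):
    """Return the inflected forms of a word, or the word itself if none are known."""
    return STEMS.get(word, [word])


def search(words, user, author=None):
    scores = {}
    for word in words:
        matched = set()
        for form in expand(word):
            matched |= INDEX.get(form, set())
        for doc_id in matched:
            # Toy relevance: one point per query word the document matches.
            scores[doc_id] = scores.get(doc_id, 0) + 1
    results = []
    for doc_id, score in scores.items():
        # Recheck the property restriction against the property store.
        if author is not None and PROPERTIES[doc_id].get("Author") != author:
            continue
        # Security trimming: drop documents the user cannot view.
        if user not in READERS[doc_id]:
            continue
        results.append((score, doc_id))
    results.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in results]


ana_results = search(["budget", "report"], "ana")
ben_results = search(["budget", "report"], "ben")
```

The same query returns different lists for the two users because each user can view a different set of documents.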

Customizing Windows SharePoint Services Search

Query Object Model

Windows SharePoint Services 3.0 includes a Microsoft.SharePoint.Search.Query object model that you can use in custom search Web Parts and search applications to execute queries against the Windows SharePoint Services Search service. For more information, see Windows SharePoint Services Search Query Object Model.

Query Web Service

Windows SharePoint Services 3.0 exposes its search functionality through a Web service. This allows you to access search results from client applications and Web applications outside of the context of the SharePoint site.

To access the SearchQuery Web service and its methods, set a Web reference to the following:

http://Server_Name/[Site_Name/]_vti_bin/spsearch.asmx

For more information, see Windows SharePoint Services Query Web Service.
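As a small illustration of the URL pattern above, the following sketch assembles the service address from a server name and an optional site name. The helper function is hypothetical and only performs string formatting; it does not call the Web service.

```python
def search_service_url(server_name, site_name=None):
    """Build the spsearch.asmx address for a server and an optional site."""
    parts = [server_name.strip("/")]
    if site_name:
        # The site segment is optional, as the brackets in the pattern indicate.
        parts.append(site_name.strip("/"))
    parts.append("_vti_bin/spsearch.asmx")
    return "http://" + "/".join(parts)


root_url = search_service_url("Server_Name")
site_url = search_service_url("Server_Name", "Site_Name")
```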

Query Syntax

Search in Windows SharePoint Services supports three types of search syntax for building search queries:

Windows SharePoint Services Search Keyword Syntax (search terms are passed directly to the Search service)
Windows SharePoint Services Search SQL Syntax (extension of SQL syntax for querying databases)
Windows SharePoint Services Search URL Syntax (search parameters are encoded in the URL, and posted directly to the search page)