Microsoft Full-Text Search Technologies

Published: June 1, 2001

On This Page

Introduction
Microsoft SharePoint Portal Server
Microsoft Indexing Service
Microsoft SQL Server 2000
Microsoft Site Server
Microsoft Exchange 2000 Server
Microsoft Office XP Search
Conclusion
Appendix A – Comparison Tables

Introduction

This white paper introduces the basic concept of full-text search and explains how different Microsoft® products implement full-text search. This information can help you to determine which Microsoft products are best for your information retrieval needs.

Microsoft full-text search technology has contributed to several server and client products. Offerings vary depending on the requirements of each product. Differences also reflect the evolution of the technology. However, all products benefit from the common advantage of efficient retrieval of unstructured, textual data by means of a full-text index.

The Microsoft products listed below use variants of Microsoft full-text search technology:

  • Index Server, Indexing Service for Microsoft Windows®

  • Microsoft SharePoint™ Portal Server 2001

  • Microsoft SQL Server™ 7.0 and SQL Server 2000

  • Microsoft Exchange 2000 Server

  • Microsoft Site Server 3.0

  • Microsoft Office XP

The product you choose depends on what you want to do. For example, you may want to search intranet sites or Internet sites, Exchange public folders, or you may want to search over structured or unstructured data. You may need to cater to an internal team, or you may need to serve the needs of customers over your extranet site. These and other considerations will help you determine which product is best for you.

The task of full-text search is to provide relevant information from a collection of sources in response to a user's need. This need is typically expressed as a textual query that looks for each (or any) of the query terms in each of the documents in the collection. A simple approach opens and scans each document when a query is processed, looking for each of the query terms. However, opening every document at query processing time and searching for the query terms can be very time consuming. This approach does not scale beyond the individual user searching over a small number of documents.

The simple solution is to do much of the work ahead of time: extract information about the terms in each document and store it in a structure that is easy to search, known as an inverted index. When a query is processed, there is no need to scan each document. The search engine only needs to look up the query terms in the inverted index, compare the candidate documents to each other, and choose the documents that are most relevant to the query.
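
The difference between scanning at query time and looking terms up in a prebuilt index can be illustrated with a minimal Python sketch. The document set and the function names below are hypothetical and are not part of any Microsoft product; real indexes also store positions and statistics, which this toy version omits.

```python
from collections import defaultdict

# Hypothetical sample collection used only for illustration.
DOCS = {
    "doc1": "full-text search finds relevant documents",
    "doc2": "an index stores terms ahead of query time",
    "doc3": "search an index instead of scanning documents",
}

def scan_search(docs, term):
    """Query-time scan: every document is read for every query."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())

def build_index(docs):
    """Index-time work: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def index_search(index, term):
    """Query-time lookup: one dictionary access, no document is opened."""
    return sorted(index.get(term, set()))

INDEX = build_index(DOCS)
```

Both functions return the same documents for a given term, but `index_search` does no per-document work at query time, which is why the indexed approach scales to large collections.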

The principle of doing much of the work ahead of query time is the foundation of all full-text search technologies, including Microsoft full-text search. To be effective, a search technology must:

  • Get documents from various document stores.

  • Extract text from various document formats.

  • Update the index with the document terms.

  • Rank the documents, bringing the most relevant documents to the top of a list.

Good search technology performs these tasks over documents in various languages, over many different types of formats, and across documents stored in a variety of document repositories. Good search technology returns those documents that are truly relevant to a user's need. At its best, full-text search technology fits into a complete knowledge solution, where direct textual query is the user's last resort. The information the user needs is interpreted using advanced mechanisms and is answered with a combination of structured and unstructured information.

The following components in the Microsoft full-text search technology provide for an excellent full-text search solution:

  • Protocol handlers. A protocol handler can access data over a particular protocol or from a particular store. Common protocol handlers include the file protocol, Hypertext Transfer Protocol (HTTP), Messaging Application Programming Interface (MAPI), and HTTP Distributed Authoring and Versioning (HTTPDAV). The protocol handler processes URLs passed to it by the gatherer.

  • Gatherer. The gatherer maintains the queue of URLs to be accessed across protocols and manages the combined crawl. For example, a Web site crawl may include hundreds of pages and create network traffic by accessing each Web page one at a time. To be more efficient, the gatherer interleaves URLs from a remote Web location with URLs from other Web locations, or with access to file system documents or other stores. The gatherer also balances the load that the gathering process imposes on crawled servers, and it may use additional logic to improve crawl efficiency, such as SharePoint Portal Server adaptive crawling. For each document accessed, the gatherer fetches the stream of content from the protocol handler and passes it on to the appropriate filter.

  • Filters. Filters (also known as IFilters) extract textual information from a specific document format, such as Microsoft Word documents or text files. For example, Microsoft provides the Microsoft Office filter, which can extract terms from Word, Microsoft Excel, and Microsoft PowerPoint® files. Other filters work with HTML or e-mail messages. There are also third-party filters, such as the PDF filter provided by Adobe.

    The filter's task is to extract a stream of textual information from a document, discarding all non-textual and formatting information. The filter produces strings of text and property/value pairs, which are passed in turn to the indexing engine. All filters are written to an application programming interface (API), which is documented as part of the Microsoft Platform Software Development Kit (SDK). For more information, see "Using Custom Filters with Indexing Service" at https://www.microsoft.com/msdownload/platformsdk/sdkupdate/.

  • Word breakers and stemmers. A word breaker is a component that determines where the word boundaries are in the stream of characters in the query or in the document being crawled. A stemmer extracts the root form of a given word. For example, "running," "ran," and "runner" are variants of the word "run." In some languages, a stemmer expands the root form of a word to alternate forms.

    SharePoint Portal Server provides word breakers for English, French, Spanish, Japanese, Thai, Korean, Chinese Traditional, and Chinese Simplified. The Windows 2000 Server Indexing Service word breakers are used for Dutch, Italian, Swedish, and German. When SharePoint Portal Server crawls documents that are in multiple languages, the customized word breaker for each language enables the resulting terms to be more accurate for that language. In the case where there is a word breaker for the language family, but not for the specific sub-language, the major language is used. For example, the French word breaker is used to handle text that is French Canadian. If no word breaker is available for a particular language, the neutral word breaker is used. Words are broken at neutral characters such as spaces and punctuation marks. The code for determining where words are broken is built into the Microsoft Search (MSSearch) service and cannot be changed.

  • Indexing engine. The function of the indexing engine is to prepare an inverted index of content. An inverted index is a data structure with a row for each term. Each row records the documents in which the term appears, along with the number of occurrences and the relative position of the term within each document. The inverted index makes it possible to apply statistical and probabilistic formulas to quickly compute the relevance of documents.

    Applications that do not have full-text search enabled, such as Windows or Microsoft Outlook®, must access each document at query time. These applications traverse each document and use a filter or other, older technology to find the query terms. This process is very slow compared to using an inverted index, which lets the search engine feed term statistics directly into a ranking formula instead of going back to the sources.

  • Ranking. Ultimately, the task of evaluating a query results in a set of relevant documents. In relational databases, each row is either in the result set or not. For example, when a user queries for "all accounts whose balance is lower than or equal to $30,000," it is straightforward to tell which rows in the accounts table should be returned. The task of full-text search, by contrast, is subtler. The queries are imperfect representations of an information need, and the documents retrieved vary in their relevance. The most relevant documents are ranked at the top of the result set, but less relevant documents are still valuable to the user and are ranked further below.

    Microsoft full-text search products differ in the algorithm used for this ranking. Index Server and Site Server 3.0 use vector-based ranking algorithms, while later products employ an advanced probabilistic algorithm.
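
As a rough illustration of how the word breaker and the indexing engine described above fit together, the following Python sketch breaks text at neutral characters and builds an inverted index with one row per term. The function names and sample documents are hypothetical; the actual MSSearch components are not documented in this paper, and this sketch performs no stemming or language-specific word breaking.

```python
import re
from collections import defaultdict

def neutral_word_break(text):
    """Split text into lowercase terms at spaces and punctuation.

    The pattern keeps runs of word characters (letters, digits, underscore).
    """
    return [term.lower() for term in re.findall(r"\w+", text)]

def build_positional_index(docs):
    """Return {term: {doc_id: [positions]}}, one 'row' per term."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(neutral_word_break(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index

# Hypothetical documents, standing in for text produced by a filter.
docs = {
    "a.txt": "Run, run fast.",
    "b.txt": "A fast index.",
}
index = build_positional_index(docs)
```

Here `index["run"]` holds both the occurrence count (the length of the position list) and the relative positions for each document, which is the information a ranking formula needs without reopening the source files.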

Query Languages

To express the information request to the system, the user depends on a language that describes the restrictions and conditions over the terms. For example, a user may be interested in all documents published last week. To query for this, the user must express both the concept of "publishing" a document and the precise time range (starting on the previous Monday and ending on the previous Sunday, for example).

Microsoft full-text products evolved through three different query languages:

  • Query Dialect 1

  • Structured Query Language (SQL) full-text extensions

  • Query Dialect 2

For detailed information about the query dialects, see the Platform SDK at https://www.microsoft.com/msdownload/platformsdk/sdkupdate/.

The following sections discuss Microsoft products that incorporate Microsoft full-text search technology. Each section includes an overview of the product, its target user, and the way in which full-text search is integrated with the product.

Microsoft SharePoint Portal Server

Overview

SharePoint Portal Server is the flexible portal solution that lets you find, share, and publish information easily. With SharePoint Portal Server, you can use existing information effectively and capture information in new ways that are appropriate for your business. In addition, you can rapidly deploy an out-of-the-box dashboard site and easily use Web Parts technology to customize a Web-based view of your organization.

For more information about SharePoint Portal Server, see https://www.microsoft.com/technet/prodtechnol/sppt/sharepoint/default.mspx.

Target

SharePoint Portal Server is targeted at intranet portal solutions, starting with the team portal and ending at the enterprise portal.

Search Features

SharePoint Portal Server presents the most current and richest set of search and information discovery features.

  • Data access. SharePoint Portal Server uses protocol handlers and the gatherer to crawl and provide search over data from diverse content sources. Out of the box, SharePoint Portal Server can crawl documents from:

    • File systems

    • Web sites

    • Exchange 2000 Server and Exchange Server 5.5 computers

    • Lotus Notes servers

    • Other SharePoint Portal Server workspaces

    Although it does not provide direct access to OLE DB, Open Database Connectivity (ODBC), or other relational data access standards, SharePoint Portal Server can crawl information from databases by using HTTP. To do this, you must create an Active Server Pages (ASP) page that renders information from each row in the database.

    The Microsoft SharePoint Portal Server SDK documents the protocol handler interface. This interface enables developers to write a protocol handler for document repositories with other, proprietary, data access methods, such as document management systems or archiving solutions. The Resource Kit for SharePoint Portal Server includes protocol handlers that can be used to crawl File Transfer Protocol (FTP) sites and SharePoint Team Services sites.

  • Filters. SharePoint Portal Server includes filters for Microsoft Office documents, HTML files, Tagged Image File Format (TIFF) files, and text files. The TIFF filter enables SharePoint Portal Server to crawl the textual content of saved fax data based on Optical Character Recognition (OCR) technology. SharePoint Portal Server uses the Multipurpose Internet Mail Extensions (MIME) filter that ships with Windows 2000 when filtering messages from Exchange public folders. SharePoint Portal Server also supports third-party and custom file types, such as the Adobe PDF filter. For more information about the PDF filter, see the Adobe Web site.

  • Ranking. SharePoint Portal Server offers an advanced probabilistic ranking algorithm, based on information retrieval achievements in Microsoft Research. This algorithm guarantees that the documents most relevant to a user's query are returned at the top of the list of search results, providing increased user efficiency and satisfaction.

    The ranking formula was developed by Microsoft Researcher and City University Professor Stephen Robertson, winner of the prestigious Association for Computing Machinery Special Interest Group on Information Retrieval (ACM SIGIR) 2000 Salton Award. The ranking formula adopted and used by Microsoft full-text search is a direct result of this research. In computing the likely relevance of a document, the formula uses the following factors:

    • The length of the document

    • The frequency of the query term in the entire collection of documents

    • The number of documents containing the query term

    • The number of documents in the entire collection of documents

  • Best Bets. This feature enables users with appropriate permissions to tag individual documents as most appropriate for specific queries or categories. Even in the most advanced probabilistic ranking environment, certain documents lack the textual information to be prominent in search results for particular terms. The Best Bets feature addresses this problem most effectively, either by advancing the specially tagged documents to the top of the results list or by displaying them prominently when browsing categories. The SharePoint Portal Server out-of-the-box dashboard site also nominates Best Bet documents when the rank of the document is very high.

  • Automatic categorization. In addition to simple search, SharePoint Portal Server provides automatic categorization. This feature enables the user to define a category hierarchy and then use a sample set of documents within the hierarchy as a training sample. After training, documents stored on the server and crawled documents are automatically tagged and appear in the category hierarchy.

  • Schema support. SharePoint Portal Server provides simplified schema management facilities that are compatible with Office through the use of promotion and demotion. Users define document profiles and associated properties. During promotion, property values in the Office document are copied to the properties of a SharePoint Portal Server document profile. During demotion, property values found in a SharePoint Portal Server document profile are copied to the Office document. Full-text search in SharePoint Portal Server is tightly integrated with that schema. Advanced search uses properties and document profiles.

  • Extensibility and programmability. The SharePoint Portal Server dashboard site is based on Microsoft Digital Dashboard technology. Microsoft Digital Dashboard technology enables easy integration of business applications and custom content with the built-in search features of SharePoint Portal Server. Query submission and search results are both provided as Web Parts, which can easily coexist on the dashboard site with custom Web Parts. However, the Web Parts for query submission and search results rely on each other for functionality and therefore must reside on a SharePoint Portal Server computer. The SharePoint Portal Server SDK supports the development of custom search solutions by documenting the search APIs. You can manipulate search by using ActiveX® Data Objects (ADO), OLE DB, or the Web-based Distributed Authoring and Versioning (WebDAV) protocol. SharePoint Portal Server does not provide automation interfaces for management of its search, document management, or dashboard site features.

  • Query languages. SharePoint Portal Server uses SQL full-text extensions. Queries are submitted using Distributed Authoring and Versioning Searching and Locating (DASL) requests, part of HTTPDAV. For more information, see the SharePoint Portal Server SDK.

  • Subscriptions. The SharePoint Portal Server subscriptions feature enables users to subscribe to changes in documents, folders, categories, and search results. Subscriptions are maintained as persistent queries. Notifications are sent to the subscriber whenever a change occurs. To add subscriptions programmatically, see the SharePoint Portal Server SDK. Subscriptions are implemented by using Persistent Query Service (PQS) rules. PQS is a reverse query processor. It evaluates a large set of queries against a single document to determine which queries match the document. This allows matching subscriptions to be identified as each new document arrives in the SharePoint Portal Server store. Subscriptions provide the "push" model to match the "pull" model of full-text search.

  • Adaptive crawling. Site Server 3.0 introduced incremental crawling, which uses time stamp comparisons to include only documents that have changed since the previous update of the index. Incremental updates reduce the amount of indexing work involved in repeated crawls. However, incremental updates do not eliminate the need to inspect the time stamp of each document previously crawled each time a crawl occurs. Adaptive crawling goes one step further. During crawls, the algorithm for adaptive crawling gathers statistics about the rate of change for each document. In subsequent adaptive crawls, the algorithm targets only documents that are likely to have changed.
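
The idea behind adaptive crawling can be sketched in Python as follows. This paper does not describe the actual algorithm, so the statistics kept, the threshold, and every name below are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class CrawlStats:
    checks: int = 0    # how many times the document was inspected
    changes: int = 0   # how many of those inspections found a change

    def change_rate(self):
        # Documents never seen before are treated as changed so they get crawled.
        return 1.0 if self.checks == 0 else self.changes / self.checks

def record_check(stats, url, changed):
    """Update the per-document statistics after one crawl inspects a URL."""
    entry = stats.setdefault(url, CrawlStats())
    entry.checks += 1
    entry.changes += int(changed)

def select_for_adaptive_crawl(stats, urls, threshold=0.5):
    """Pick only the URLs whose observed change rate meets the threshold."""
    return [u for u in urls if stats.get(u, CrawlStats()).change_rate() >= threshold]

# Hypothetical crawl history: one page changes often, one never changes.
stats = {}
for changed in (True, True, False, True):
    record_check(stats, "http://example.com/news", changed)
for changed in (False, False, False, False):
    record_check(stats, "http://example.com/archive", changed)

urls = [
    "http://example.com/news",
    "http://example.com/archive",
    "http://example.com/new-page",
]
to_crawl = select_for_adaptive_crawl(stats, urls)
```

In this sketch the archive page is skipped because it has never been observed to change. A practical crawler would presumably still revisit such pages occasionally so that a rare change is not missed indefinitely.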

SharePoint Portal Server does not replace all of the functionality of Site Server, but the search technology used in SharePoint Portal Server is more recent than the search technology used in Site Server. In addition, SharePoint Portal Server uses an advanced ranking algorithm and has advanced features that allow you to search from the dashboard site out of the box. These advanced features include Best Bets, categories, and Office schema integration.
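
This paper lists the factors that the probabilistic ranking formula uses but does not give the formula itself. As a stand-in, the following Python sketch uses the publicly known Okapi BM25 form associated with Stephen Robertson's research; the parameter values `k1` and `b`, the function name, and the sample collection are assumptions, and the formula actually shipped in the product may differ.

```python
import math

def bm25_score(query_terms, doc_terms, collection, k1=1.2, b=0.75):
    """Score one document (a list of terms) against a query, BM25 style."""
    n_docs = len(collection)                        # documents in the collection
    avg_len = sum(len(d) for d in collection) / n_docs
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in collection if term in d)  # documents containing the term
        if df == 0:
            continue  # a term that appears nowhere cannot affect the ranking
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = doc_terms.count(term)                  # occurrences in this document
        length_norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score

# Hypothetical collection: each document is already broken into terms.
collection = [
    ["portal", "search", "search", "ranking"],
    ["portal", "server", "setup"],
    ["index", "service", "catalog", "search", "windows", "files", "shares"],
]
scores = [bm25_score(["search"], doc, collection) for doc in collection]
```

The short document that mentions the query term twice scores highest, the longer document that mentions it once scores lower, and the document without the term scores zero, which shows how document length and term statistics combine into a graded ranking rather than a yes-or-no match.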

SharePoint Portal Server offers significantly enhanced indexing performance over Site Server 3.0 by providing a multi-threaded indexing engine. The introduction of adaptive crawling also significantly reduces the amount of time it takes to perform incremental indexing.
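
The reverse query processing that the Subscriptions item above attributes to PQS can be sketched in Python. Instead of running one query against many documents, the sketch runs many stored queries against one arriving document. The subscription format (a set of required terms per subscriber) and all names are hypothetical simplifications, not the PQS rule format.

```python
def matching_subscriptions(subscriptions, document_text):
    """Return the subscribers whose stored query terms all occur in the document."""
    doc_terms = set(document_text.lower().split())
    return sorted(
        subscriber
        for subscriber, required_terms in subscriptions.items()
        if required_terms <= doc_terms  # every required term is present
    )

# Hypothetical persistent queries, one per subscriber.
subscriptions = {
    "alice": {"budget", "2001"},
    "bob": {"portal"},
    "carol": {"budget", "portal"},
}
notified = matching_subscriptions(subscriptions, "Draft budget for fiscal 2001")
```

Only the subscriber whose whole query is satisfied by the new document is selected for notification, which is the "push" counterpart to an ordinary "pull" search.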

Microsoft Indexing Service

Overview

Indexing Service is a Microsoft Windows 2000 base service for file systems and Web servers. Formerly known as Index Server, its original function was to crawl and create a catalog of the content of Internet Information Services (IIS) Web servers. Indexing Service now creates catalogs for the contents and properties of both file systems and virtual Web sites.

Target

As an operating system component, Indexing Service targets the same wide range of customer scenarios that Windows targets. Indexing Service targets the user's desktop and provides an enhanced search experience for individual users over information stored on local disks. Indexing Service is exposed in Windows when you click the Search button in the Start menu, when you press CTRL+F, when you click the Search button in Windows Explorer, and when you click the search task pane in Office XP. Indexing Service also exposes management and query objects that allow rapid development of custom search applications, and its catalogs can be expanded to contain information from remote file shares. Such custom applications can serve vertical applications or groups of users and can crawl information from multiple locations.

Indexing Service also offers full-text search from Internet sites. Indexing Service can be used to drive custom search Web applications. In addition to query language support, Indexing Service offers a full range of programmability features targeting the custom application developer: scripting objects for query and administration, an OLE DB provider, and ADO compatibility.

Search Features

  • Data access. Indexing Service does not include a cross-protocol gathering component. It can access any data that is available from the file system, including local file systems and shared file systems on remote computers. Indexing Service facilitates indexing of Web site content by using the IIS metabase to understand which files map to Web site content. Indexing Service then follows the information from the IIS metabase to crawl the local Web sites. Indexing Service does not use the HTTP protocol to crawl Web sites. Therefore, Indexing Service cannot crawl content that is rendered dynamically, such as ASP pages referencing a database or personalized content that changes for each user.

  • Filters. Indexing Service uses filters installed on the operating system, including the MIME filter for news and e-mail, the Office filter for Office documents, and the HTML filter.

  • Ranking. Indexing Service uses ranking algorithms based on the vector space model. Information about the specific algorithms is included in the Platform SDK. The default algorithm used is the Jaccard formula. For more information about Indexing Service's ranking formulae, see https://msdn.microsoft.com/library/en-us/indexsrv/html/ixqlang_92xx.asp.

  • Schema support. Indexing Service provides rich, broad schema support. Using Microsoft Management Console (MMC), users can view all properties indexed from documents and can indicate which properties to store in the property cache for fast retrieval.

  • Extensibility and programmability. Indexing Service provides a platform for full-text search applications. It includes a full set of programming interfaces: scripting interfaces for administration and query, and an OLE DB provider for search. More information about Indexing Service programming interfaces is available in the Platform SDK.

  • Query languages. Indexing Service provides rapid access to files through a flexible query language. Indexing Service supports Query Dialect 1, Query Dialect 2, and SQL full-text extensions.
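
The Ranking item above names the Jaccard formula as the default. In its basic set-based form, the Jaccard coefficient is the size of the intersection of two term sets divided by the size of their union. The Python sketch below shows only that basic form; how Indexing Service weights terms or combines the coefficient with other factors is not described in this paper, and the function name and sample terms are hypothetical.

```python
def jaccard(query_terms, doc_terms):
    """Jaccard coefficient of two term collections: |A & B| / |A | B|."""
    q, d = set(query_terms), set(doc_terms)
    union = q | d
    return len(q & d) / len(union) if union else 0.0

# Hypothetical query and documents, already broken into terms.
query = ["index", "service"]
doc_a = ["index", "service", "catalog", "windows"]
doc_b = ["portal", "service", "dashboard", "web", "parts"]

score_a = jaccard(query, doc_a)  # 2 shared terms out of 4 distinct terms
score_b = jaccard(query, doc_b)  # 1 shared term out of 6 distinct terms
```

A document that shares more of its vocabulary with the query receives a higher score, so `doc_a` would be ranked above `doc_b` here.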

A list of features new to Indexing Service 3.0 (provided with Windows 2000) is available in the Platform SDK. For more information, see https://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/hh/indexsrv/ixintro_24og.asp.

Indexing Service is the high-performance choice for custom applications that provide full-text search over the content of an Internet site. It is less appropriate for applications where the data is primarily structured. Such applications should consider Microsoft SQL Server 2000. For out-of-box ease of use, or for applications that require aggregation of content from various sources and source types, SharePoint Portal Server is the appropriate choice.

Indexing Service is an optional operating system component. Initial indexing of file system contents can be resource intensive and can affect desktop application performance. Therefore, Indexing Service is not enabled by default.

Microsoft SQL Server 2000

Overview

SQL Server 2000 is a family of products that meets the data storage and analysis requirements of the largest data processing systems and commercial Web sites. SQL Server 2000 can provide easy-to-use data storage and analysis services to an individual or a small business.

For more information about SQL Server 2000, see https://www.microsoft.com/sql.

Target

Full-text search in SQL Server 2000 is aimed at search over data that is primarily structured, but also includes textual, unstructured information.

Search Features

SQL Server 2000 uses the same search engine technology used by SharePoint Portal Server, benefits from the same advanced ranking algorithm, and uses a subset of the full-text extensions to SQL used by SharePoint Portal Server.

  • Data access. Full-text search in SQL Server can be used only over content stored in SQL columns.

  • Filters. SQL Server 2000 uses filters installed on the server to handle documents stored in database columns. Users use IMAGE type columns to store documents, and then specify a second column to indicate the document type. Full-text search then applies the appropriate filter, such as HTML, Office, or third-party filters, based on the document type. In addition, full-text search can be applied to the contents of columns of type [N]CHAR, [N]VARCHAR, and [N]TEXT.

  • Extensibility and programmability. Full-text search SQL extensions are integrated into the T-SQL language. Users can specify SQL queries that span structured data from SQL tables, unstructured data from SQL columns, documents embedded in the columns, and files in the file system.
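
The Filters item above describes a document column paired with a second column that indicates the document type, which determines the filter to apply. The following Python sketch mimics that dispatch outside of SQL Server. The registry, the filter functions, and the use of a file extension as the type value are assumptions for illustration only; they are not the IFilter API.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect character data and discard tags (script/style are not special-cased)."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_filter(raw):
    """Extract text from HTML bytes and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(raw.decode("utf-8"))
    parser.close()
    return " ".join(" ".join(parser.chunks).split())

def text_filter(raw):
    """Plain text needs no markup stripping."""
    return raw.decode("utf-8")

# Hypothetical registry keyed by the value of the type column.
FILTERS = {".htm": html_filter, ".html": html_filter, ".txt": text_filter}

def extract_text(row):
    """row is (document_bytes, doc_type), mirroring the two columns."""
    raw, doc_type = row
    handler = FILTERS.get(doc_type.lower())
    if handler is None:
        return ""  # no filter registered for this type, so nothing is indexed
    return handler(raw)
```

For example, `extract_text((b"<h1>Annual</h1><p>report</p>", ".html"))` yields the text with the markup removed, while a row whose type has no registered filter contributes no terms to the index.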

For more information about the SQL Server full-text search feature, see https://msdn2.microsoft.com/en-us/library/ms345119.aspx.

Full-text search was introduced as a feature of SQL Server with SQL Server 7.0. For further information about full-text search in SQL Server 7.0, see the white paper titled "Textual Searches on Database Data Using SQL Server 7.0" at https://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql7/html/textsearch.asp. For information about combining file system and SQL table searches using SQL Server full-text search, see the white paper titled "Textual Searches on File Data Using Microsoft SQL Server 7.0" at https://msdn.microsoft.com/library/en-us/dnsql7/html/filedatats.asp.

Microsoft Site Server

Overview

Site Server is designed to help you get the most out of your corporate intranet. Site Server enables users to publish, find, and share information quickly and easily. Features include extensive search capabilities and tools to perform thorough analyses of your intranet's usage and effectiveness.

Site Server Commerce Edition is a comprehensive Internet commerce server that enables you to engage customers, transact business, and analyze commerce Web sites. Highly scalable and secure, Site Server Commerce Edition streamlines and integrates your online dealings with distributors and suppliers.

For more information, see https://www.microsoft.com/siteserver.

Target

Since the introduction of Site Server 3.0 Standard Edition and Site Server 3.0 Commerce Edition in May 1998, the Web marketplace has evolved rapidly. Site Server 3.0 Standard Edition was targeted to the intranet space, allowing users to find, share and publish information to their corporate intranets. In comparison, Site Server 3.0 Commerce Edition was targeted to the Internet space, with the ability to conduct a financial transaction online, analyze transactions, and conduct a personalized interaction with the consumer.

Since then, the intranet market needs have changed substantially and have evolved into the portal market, with greater need for core services and application integration as well as a continued requirement for robust enterprise-wide search. As a result, product focuses have shifted accordingly. The search technology of Site Server 3.0 Standard Edition is continued in SharePoint Portal Server. Site Server 3.0 Commerce Edition's e-commerce and Internet functionality is now best served by using Microsoft E-Commerce Business Solutions. For further information, see https://www.microsoft.com/business/default.mspx.

Search Features

  • Data access. Site Server introduced the concept of gathering and the concept of protocol handlers. Site Server can crawl Exchange Server 5.5 computers and Web sites. The gatherer can process both hierarchical (file system) and Web spaces (HTTP). Site Server does not support custom protocol handlers. The interface is not extensible to support new document stores.

    Site Server can crawl information from databases by using an ASP page that renders the information from rows in a database.

  • Filters. Site Server uses the same filters as Indexing Service. Site Server uses filters installed on the operating system, including the MIME filter for news and e-mail, the Office filter for Office documents, and the HTML filter.

  • Ranking. Site Server uses the same ranking as Indexing Service. Site Server uses ranking algorithms based on the vector space model. Information about the specific algorithms is included in the Platform SDK. The default algorithm used is the Jaccard formula.

  • Schema support. Site Server provides rich, broad schema support. Users can define properties over OLE DB data types by using a proprietary management interface.

  • Extensibility and programmability. Site Server has its own object model. For more information, see https://www.microsoft.com/siteserver/site/DeployAdmin/SearchDatabase.htm.

  • Query language. Site Server uses Query Dialect 1 and SQL full-text extensions.

Microsoft Exchange 2000 Server

Overview

Exchange 2000 Server is seamlessly integrated with the Windows 2000 operating system and is designed to meet the messaging and collaboration needs for businesses of all sizes. Together with its client software, Outlook 2000, Exchange provides a highly reliable, scalable, and easy-to-manage messaging and collaboration infrastructure.

For more information, see https://www.microsoft.com/exchange.

Target

If you primarily want to search e-mail messages, use Exchange 2000 Server. With Exchange 2000 full-text search, servers can provide all users with search over messaging items in personal mailboxes and public folders.

If you want to aggregate search from e-mail and other sources, use SharePoint Portal Server. However, SharePoint Portal Server does not support crawling private mailboxes.

Search Features

Exchange 2000 Server uses the same search technology that SharePoint Portal Server uses. It uses a version with proven clustering capability.

  • Data access. Data access is restricted to information stored in Exchange public folders and mailboxes.

  • Filters. Exchange full-text search uses the MIME filter to crawl messaging items. Attachments are processed using available filters according to their content type.

  • Ranking. Exchange 2000 Server uses the same advanced probabilistic ranking algorithm that SharePoint Portal Server uses. This algorithm guarantees that the documents most relevant to a query are returned at the top of the list of search results, providing increased user efficiency and satisfaction.

  • Extensibility and programmability. Exchange 2000 Server uses the HTTPDAV protocol, specifically DASL, for search. For more information, see https://msdn.microsoft.com/library/backgrnd/html/webstorewp.htm.

  • Query language. Full-text search in Exchange 2000 uses and supports SQL full-text extensions through the Distributed Authoring and Versioning (DAV) protocol. When using Exchange 2000, Outlook Advanced Search takes advantage of Exchange full-text search. The natural language queries are then submitted directly to the server. There is no client-side support for the SQL query language.

For more information, see the white paper titled "Best Practices for Deploying Full-Text Indexing" at https://www.microsoft.com/technet/prodtechnol/exchange/2000/library/BPDFTI.mspx.

Microsoft Office XP Search

Overview

The world's leading suite of productivity software, Microsoft Office helps you complete common business tasks, including word processing, e-mail, presentations, data management and analysis, and much more.

Target

If you are an Office user and you want to work from your desktop, use Office XP search. Office XP enables you to search not only the local hard drive but also file shares and SharePoint Portal Server computers.

Search Features

  • Data access. On Windows 2000 computers, Indexing Service creates an index of local disks, if Indexing Service is enabled. On computers running Microsoft Windows NT® version 4.0, Windows 98, or Windows Millennium Edition, Microsoft Office XP provides a version of the search engine used in SharePoint Portal Server for local disk crawling. The activation of Indexing Service or the Office search indexing engine is left to the user. If indexing is not enabled, Office XP provides a slower, non-indexed form of search.

  • User Interface. Office XP provides a search task pane accessible from Word, Excel, and PowerPoint.

  • Advanced Features. The task pane provides federated search of the user's local hard drives, remote servers through Indexing Service, SharePoint Portal Server computers, SharePoint Team Services sites (which use Indexing Service for their full-text search feature), and Outlook mail (PST files or Exchange mailboxes). A query broker component dispatches search commands to the search providers for each of these stores.

  • Extensibility and programmability. Office applications can program to the search query broker through an API that is similar to the FindFast API. For more information, see https://www.microsoft.com/office/ork/xp/five/wgtd01.htm.
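
The query broker described under Advanced Features above can be pictured as a component that forwards one query to several search providers and merges the results. The Python sketch below is a toy model of that pattern; the class, the provider names, and the substring matching are hypothetical and do not reflect the Office XP API.

```python
def make_provider(name, items):
    """Build a toy search provider that does a case-insensitive substring match."""
    def search(query):
        return [(name, item) for item in items if query.lower() in item.lower()]
    return search

class QueryBroker:
    """Dispatch one query to every registered provider and merge the hits."""
    def __init__(self, providers):
        self.providers = providers

    def search(self, query):
        results = []
        for provider in self.providers:
            results.extend(provider(query))
        return results

# Hypothetical stores standing in for local disks, a portal, and mail.
broker = QueryBroker([
    make_provider("local disk", ["Budget 2001.xls", "Notes.doc"]),
    make_provider("portal", ["Team budget review.doc"]),
    make_provider("mail", ["RE: lunch"]),
])
hits = broker.search("budget")
```

Each hit is tagged with the store it came from, so a user interface such as a task pane could group results by source. A real broker would presumably also have to reconcile rankings from providers that score documents differently.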

Conclusion

This white paper introduced the basic concept of full-text search and explained how different Microsoft products implement full-text search. This information can help you to determine which Microsoft products are best for your information retrieval needs.

Appendix A provides a technology comparison and a feature comparison of:

  • SharePoint Portal Server

  • Indexing Service

  • Site Server

  • SQL Server 2000

  • Exchange 2000 Server

  • Office XP

Appendix A – Comparison Tables

The following tables show a technology comparison and a feature comparison of:

  • SharePoint Portal Server

  • Indexing Service

  • Site Server

  • SQL Server 2000

  • Exchange 2000 Server

  • Office XP

Technology Comparison

|  | SharePoint Portal Server | Indexing Service | Site Server | SQL Server 2000 | Exchange 2000 Server | Office XP on Windows 2000; Office XP on Windows 98 or Millennium Edition |
| --- | --- | --- | --- | --- | --- | --- |
| Full-text search using proprietary query language |  | Yes | Yes |  |  | Yes |
| Full-text search using SQL full-text extensions | Yes | Yes | Yes | Yes | Yes |  |
| Boolean ranking algorithm | Yes | Yes | Yes |  |  |  |
| Advanced probabilistic ranking algorithm | Yes |  |  | Yes | Yes |  |
| Uses multiple data access protocols | Yes |  | Yes |  |  |  |

Feature Comparison

|  | SharePoint Portal Server | Indexing Service | Site Server | SQL Server 2000 | Exchange 2000 Server | Office XP on Windows 2000; Office XP on Windows 98 or Millennium Edition |
| --- | --- | --- | --- | --- | --- | --- |
| Crawls: |  |  |  |  |  |  |
| File system | Yes | Yes | Yes |  |  | Yes (local only) |
| Web sites | Yes | Yes (local only, through file system) | Yes |  |  |  |
| Lotus Notes | Yes |  |  |  |  |  |
| Exchange 5.5 | Yes (public folders) |  | Yes |  |  |  |
| Exchange 2000 | Yes (public folders) |  |  |  | Yes (public folders and private mailboxes) |  |
| SQL tables | Yes (through ASP) |  | Yes (through ASP) | Yes |  |  |
| SharePoint Portal Server workspaces | Yes |  |  |  |  | Yes |
| Third-party protocols | Yes |  |  |  |  |  |
| Best Bets | Yes |  |  |  |  |  |
| Categories | Yes |  |  |  |  |  |
| End user UI | Dashboard site | Windows Explorer on Windows 2000 and custom | Custom | Custom | Outlook through Advanced Find, custom | Office search task pane |