Microsoft SharePoint Portal Server 2001 Resource Kit

On This Page

Microsoft SharePoint Portal Server
Microsoft Indexing Service
Microsoft SQL Server 2000
Microsoft Site Server
Microsoft Exchange 2000 Server
Microsoft Office XP Search
Full-Text Search Comparison Tables
Summary

This chapter reviews the concept of full-text search and explains how different Microsoft® products implement full-text search. This information can help you to determine which Microsoft products are best for your information retrieval needs.

Microsoft full-text search technology contributes to a number of server and client products. Search functionality varies, depending on the requirements of each product. However, all products benefit from the common advantage of efficient retrieval of unstructured, textual data by means of a full-text index.

The following Microsoft products use variants of Microsoft full-text search technology:

  • Index Server, Indexing Service for Microsoft Windows® 

  • Microsoft SharePoint™ Portal Server 2001 

  • Microsoft SQL Server™ 7 and SQL Server 2000 

  • Microsoft Exchange 2000 Server 

  • Microsoft Site Server 3 

  • Microsoft Office XP 

The product you choose depends on your needs. For example, you might want to search intranet sites, Internet sites, or Exchange public folders, or you might want to search over structured or unstructured data. You might need to cater to an internal team, or you might need to serve the needs of customers over your extranet site. These and other considerations help you determine which product is best for you.

For more information about full-text search technology or these products, see Appendix B, "For More Information."

Full-Text Search 

Full-text search provides relevant information from a collection of sources in response to a user's need. This need is typically expressed as a textual query that looks for each, or any, of the query terms in each of the documents in the collection. A simple approach opens and scans each document when a query is processed, looking for each of the query terms. However, opening every document at query processing time and searching for the query terms can be very time consuming. This approach is impractical beyond the individual user searching a small number of documents.

The simple solution is to do much of the work ahead of time. This is done by extracting information about the terms in each document and storing the information in a way that is easy to retrieve. This stored structure is known as an inverted index. When the search engine processes a query, there is no need to scan each document. The search engine only needs to look up the query terms in the inverted index. The search engine then chooses the documents that are most relevant to the query.
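The idea of doing the work ahead of time can be sketched in a few lines of Python. This is a minimal illustration, not the structure used by any Microsoft product: the function names `build_index` and `search`, the whitespace tokenization, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Ahead of query time: map each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query, match_all=True):
    """At query time: look up terms in the index instead of scanning documents."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    # "each" of the query terms -> intersection; "any" of them -> union
    combine = set.intersection if match_all else set.union
    return combine(*postings)

# Hypothetical sample collection
docs = {
    "a.txt": "full text search uses an index",
    "b.txt": "the index is built ahead of query time",
    "c.txt": "scanning every document is slow",
}
index = build_index(docs)
print(sorted(search(index, "index query")))                   # documents with both terms
print(sorted(search(index, "index slow", match_all=False)))   # documents with either term
```

The query step touches only the index entries for the query terms, which is why the cost no longer grows with the size of every document in the collection.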

The principle of doing much of the work ahead of query time serves as the foundation of all full-text search technologies, including Microsoft full-text search. To be effective, a search technology must:

  • Get documents from various document stores. 

  • Extract text from various document formats. 

  • Update the index with the document terms. 

  • Rank the documents, bringing the most relevant documents to the top of a list. 

Good search technology performs these tasks for documents in various languages, over many different types of formats, and across documents stored in a variety of document repositories. Good search technology returns those documents that are truly relevant to a user's need. At its best, full-text search technology fits into a complete knowledge solution, where direct textual query is the user's last resort. Full-text search technology should interpret the information the user needs by using advanced mechanisms, and it should answer the query with a combination of structured and unstructured information.

The following components of Microsoft full-text search technology provide an excellent full-text search solution:

  • Protocol handlers. A protocol handler accesses data over a particular protocol or from a particular store. Common protocol handlers include the file protocol, Hypertext Transfer Protocol (HTTP), Messaging Application Programming Interface (MAPI), and HTTP Distributed Authoring and Versioning (HTTPDAV). The protocol handler processes URLs passed to it by the Gatherer. 

  • Gatherer. The Gatherer maintains the queue of URLs to be processed across protocols and manages the combined crawl. For example, a Web site crawl may include hundreds of pages and create network traffic by accessing each Web page one at a time. To increase efficiency, the Gatherer interleaves URLs from a remote Web location with URLs from other Web locations or with access to file system documents or other stores. The Gatherer may use additional logic to improve crawl efficiency, such as SharePoint Portal Server adaptive crawling. The Gatherer also balances the load that the gathering process imposes on crawled servers. For each document accessed, the Gatherer fetches the stream of content from the protocol handler and passes it on to the appropriate filter. 

  • Filters. Filters (also known as IFilters) extract textual information from a specific document format, such as Microsoft Word documents or text files. For example, Microsoft provides the Microsoft Office filter, which can extract terms from Word, Microsoft Excel, and Microsoft PowerPoint® files. Other filters work with HTML or e-mail messages. There are also third-party filters, such as the PDF filter provided by Adobe. 

    The filter extracts a stream of textual information from a document, discarding all non-textual and formatting information. The filter produces strings of text and property/value pairs to pass in turn to the index engine. All filters are written to an application programming interface (API). For more information about filters, see Appendix B. 

  • Word breakers and stemmers. A word breaker is a component that determines where the word boundaries are in the stream of characters in the query or in the document being crawled. A stemmer extracts the root form of a given word. For example, "running," "ran," and "runner" are variants of the word "run." In some languages, a stemmer expands the root form of a word to include alternate forms. 

    SharePoint Portal Server provides word breakers for English, French, Spanish, Japanese, Thai, Korean, Traditional Chinese, and Simplified Chinese. SharePoint Portal Server uses the Windows 2000 Server Indexing Service word breakers for Dutch, Italian, Swedish, and German. When SharePoint Portal Server crawls documents that are in multiple languages, the customized word breaker for each language enables the resulting terms to be more accurate for that language. When there is a word breaker for the language family, but not for the specific sub-language, the major language is used. For example, SharePoint Portal Server uses the French word breaker to handle text that is French Canadian. If no word breaker is available for a particular language, SharePoint Portal Server uses the neutral word breaker. Words are broken at neutral characters, such as spaces and punctuation marks. The code for determining where words are broken is built into the Microsoft Search (MSSearch) service and cannot be changed. You cannot create custom word breakers. 

  • Index engine. The function of the index engine is to prepare an inverse index of content. An inverse index is a data structure with a row for each term. In this row, there is information about the documents in which the term appears and the number of occurrences and relative position of the term within each document. The inverse index provides the ability to apply statistical and probabilistic formulas to compute the relevance of documents quickly. 

    Applications that do not have full-text search enabled, such as Windows or Microsoft Outlook®, access each document at query time. These applications traverse each document and use a filter or other outdated technology to find query terms. This process is very slow when compared to an inverse index. The inverse index provides the ability to go directly into a ranking formula instead of going to sources. 

  • Ranking. Ultimately, the task of evaluating a query results in a set of relevant documents. In relational databases, each row either is in the result set or is not. For example, when a user queries for "all accounts with a balance lower than or equal to $30,000", it is easy to tell which rows in the accounts table to return. The task of full-text search, by contrast, is subtler. The queries are imperfect representations of an information need, and the documents retrieved vary in their relevance. Full-text search ranks the most relevant documents at the top of the result set. Less relevant documents are still valuable to the user, however. Full-text search ranks these documents further below. 

Microsoft full-text search products differ in the algorithm used for this ranking. Index Server and Site Server 3 use vector-based ranking algorithms, while later products employ an advanced probabilistic algorithm.
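The word breaker and index engine descriptions above can be made concrete with a small sketch. It assumes a neutral word breaker that splits at spaces and punctuation (approximated here with a regular expression) and an index row of the form term, then document, then list of positions. The function names and sample documents are hypothetical, and the MSSearch data structures are not described in this chapter.

```python
import re
from collections import defaultdict

def neutral_word_break(text):
    """Approximate a neutral word breaker: split at spaces and punctuation."""
    return [word.lower() for word in re.findall(r"\w+", text)]

def build_positional_index(docs):
    """One row per term: term -> {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(neutral_word_break(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index

def phrase_match(index, phrase):
    """Use relative positions to find documents containing the words consecutively."""
    words = neutral_word_break(phrase)
    if not words:
        return set()
    hits = set()
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            if all(start + offset in index.get(word, {}).get(doc_id, [])
                   for offset, word in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

# Hypothetical sample documents
docs = {
    "memo": "Portal search, portal crawl.",
    "note": "Search the portal.",
}
index = build_positional_index(docs)
print(index["portal"])                      # occurrences and positions per document
print(phrase_match(index, "portal search"))
```

Storing positions as well as occurrence counts is what lets the same row answer both "how often does the term appear" and "do these terms appear next to each other".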

Query Languages 

To express the information request to the system, the user depends on a language that describes the restrictions and conditions over the terms. For example, a user may be interested in all documents published in the previous week. To query for this, the user must express both the concept of "publishing" a document and the precise time range. For example, the time range might start on the previous Monday and end on the previous Sunday.
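As an illustration of how the "previous week" in this example becomes a precise time range, the following sketch computes the previous Monday and Sunday for a given date. The function name `previous_week` is an assumption for the example; how the resulting dates are written into a query depends on the query language in use.

```python
from datetime import date, timedelta

def previous_week(today):
    """Return (monday, sunday) of the calendar week before the week containing today."""
    this_monday = today - timedelta(days=today.weekday())  # weekday(): Monday == 0
    start = this_monday - timedelta(days=7)
    end = this_monday - timedelta(days=1)
    return start, end

start, end = previous_week(date(2001, 6, 13))
print(start.isoformat(), end.isoformat())
```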

Microsoft full-text search products evolved through three different query languages:

  • Query Dialect 1 

  • Query Dialect 2 

  • Structured Query Language (SQL) full-text extensions 

The following sections discuss Microsoft products that incorporate Microsoft full-text search technology. Each section includes an overview of the product, its target user, and the way in which full-text search integrates with the product. For more information about these products and the related technologies, see Appendix B.

Microsoft SharePoint Portal Server


SharePoint Portal Server is the flexible portal solution with which you can find, share, and publish information easily. With SharePoint Portal Server, you can use existing information effectively and capture information in new ways that are appropriate for your business. In addition, you can rapidly deploy a prepackaged dashboard site and easily use Web Parts technology to customize a Web-based view of your organization.

SharePoint Portal Server targets dashboard site solutions, starting with the team portal and ending at the enterprise portal.

SharePoint Portal Server presents the most current and the richest set of search and information discovery features. Figure 5.1 illustrates components of the SharePoint Portal Server Search architecture.


Figure 5.1 SharePoint Portal Server content crawling and search architecture 

The following list describes the components of the SharePoint Portal Server Search architecture.

  • Search Engine. Component of MSSearch that runs queries written in the SQL full-text extension syntax against the full-text index. 

  • Index Engine. Component of MSSearch that processes chunks of text and properties filtered from content sources, and determines which properties are written to the full-text index. 

  • Gatherer. Component of MSSearch that manages the content crawling process and that has rules that determine what content is crawled. 

  • Word breakers. Components shared by the Search and Index engines that break up compound words and phrases. 

  • Stemmers. Components shared by the Search and Index engines that generate inflected forms of a word. 

  • Filter Daemon. Component that handles requests from the Gatherer. Uses protocol handlers to access content sources, and IFilters to filter files. Provides the Gatherer with a stream of data containing filtered chunks and properties. 

  • Protocol Handlers. Open content sources in their native protocol and expose documents and other items to be filtered. 

  • IFilters. Open documents and other content source items in their native format and filter into chunks of text and properties. 

  • Content sources. Collection of data MSSearch must crawl, and specific rules for crawling items in that content source. Items in content sources are identified by URLs. The protocol portion of the URL is what distinguishes different types of content sources. 

  • Data Access. SharePoint Portal Server uses protocol handlers and the Gatherer to crawl and provide search results over data from diverse content sources. Without modification, SharePoint Portal Server can crawl documents from file systems, Web sites, Exchange 2000 Server and Exchange Server 5.5 computers, Lotus Notes servers, and other SharePoint Portal Server workspaces. 

Although it does not provide direct access to OLE DB, Open Database Connectivity (ODBC), or other relational data access standards, SharePoint Portal Server can crawl information from databases by using HTTP. To do this, you must create an Active Server Pages (ASP) page that renders information from each row in the database.

The Microsoft SharePoint Portal Server SDK describes the protocol handler interface. The protocol handler interface enables developers to write a protocol handler for document repositories with other, proprietary, data access methods, such as document management systems or archiving solutions. For more information about this interface, see Appendix B. The Microsoft SharePoint Portal Server Resource Kit CD-ROM includes protocol handlers that you can use to crawl File Transfer Protocol (FTP) sites and SharePoint Team Services sites. You can access these protocol handlers in the \Tools directory of the CD. For a complete listing of tools and Web Parts available on the CD, see Appendix A, "Tools, Samples, eBooks, and More."

  • Filters. SharePoint Portal Server includes filters for Microsoft Office documents, HTML files, Tagged Image File Format (TIFF) files, and text files. The TIFF filter enables SharePoint Portal Server to crawl the textual content of saved fax data based on Optical Character Recognition (OCR) technology. When filtering messages from Exchange public folders, SharePoint Portal Server uses the Multipurpose Internet Mail Extensions (MIME) filter that is included with Windows 2000. SharePoint Portal Server also supports third-party and custom file types, such as the Adobe PDF filter. For more information about the Adobe PDF filter, see Appendix B. 

  • Ranking. SharePoint Portal Server offers an advanced probabilistic ranking algorithm, which is based on achievements in information retrieval accomplished by Microsoft Research. This algorithm guarantees that SharePoint Portal Server returns the documents that are most relevant to a user's query at the top of the list of search results, providing increased user efficiency and satisfaction. 

    Stephen Robertson, Microsoft researcher, City University professor, and winner of the prestigious Association for Computing Machinery Special Interest Group on Information Retrieval (ACM SIGIR) 2000 Salton Award, developed the formula for ranking. The ranking formula adopted and used by Microsoft full-text search is a direct result of this research. In computing the likely relevance of a document, the formula uses the following factors: the length of the document, the frequency of the query term in the entire collection of documents, the number of documents containing the query term, and the number of documents in the entire collection of documents. 

  • Best Bets. This feature enables users with appropriate permissions to tag individual documents as most appropriate for specific queries or categories. Even in the most advanced probabilistic ranking environment, certain documents lack the textual information to be prominent in search results for particular terms. The Best Bets feature addresses this problem most effectively, either by advancing the specially tagged documents to the top of the results list or by displaying them prominently when browsing categories. The default query included with SharePoint Portal Server also nominates Best Bet documents when the rank of the document is very high. For more information about the default query, see Chapter 24, "Analyzing the Default Query for the Dashboard Site." 

  • Automatic Categorization. In addition to simple search, SharePoint Portal Server provides automatic categorization. This feature enables the user to define a category hierarchy and then use a sample set of documents within the hierarchy as a training sample. After training, SharePoint Portal Server automatically tags documents stored on the server and crawled documents. After they are tagged, these documents appear in the category hierarchy. 

  • Schema Support. SharePoint Portal Server provides simplified schema management facilities that are compatible with Office by using promotion and demotion. Users define document profiles and associated properties. During promotion, SharePoint Portal Server copies property values in the Office document to the properties of a document profile. During demotion, SharePoint Portal Server copies property values found in a document profile to the Office document. SharePoint Portal Server tightly integrates full-text search with that schema. Advanced search uses properties and document profiles. 

  • Extensibility and Programmability. The SharePoint Portal Server dashboard site uses Microsoft Digital Dashboard technology. Microsoft Digital Dashboard technology enables easy integration of business applications and custom content with the built-in search features of SharePoint Portal Server. SharePoint Portal Server provides query submission and search results as Web Parts, which can easily coexist on the dashboard site with custom Web Parts. However, the Web Parts for query submission and search results rely on each other for functionality and therefore must reside on a SharePoint Portal Server computer. You can manipulate search by using Microsoft ActiveX® Data Objects (ADO), OLE DB, or the Web-based Distributed Authoring and Versioning (WebDAV) protocol. SharePoint Portal Server does not provide automation interfaces for management of its search, document management, or dashboard site features. For more information about developing customized search solutions for SharePoint Portal Server, see Appendix B. 

  • Query Languages. SharePoint Portal Server uses SQL full-text extensions. Queries are submitted using Distributed Authoring and Versioning Searching and Locating (DASL) requests, part of WebDAV, also called HTTPDAV. 

  • Subscriptions. The SharePoint Portal Server subscriptions feature enables users to subscribe to changes in documents, folders, categories, and search results. SharePoint Portal Server maintains subscriptions as persistent queries. SharePoint Portal Server sends notifications to the subscriber whenever a change occurs. SharePoint Portal Server implements subscriptions by using Persistent Query Service (PQS) rules. PQS is a reverse-query processor. It evaluates a large set of queries against a single document to determine which queries match the document. This allows SharePoint Portal Server to identify matching subscriptions as each new document arrives in the document store. Subscriptions provide this "push" model to match the "pull" model of full-text search. 

  • Adaptive Crawling. Site Server 3 introduced incremental crawling, which uses time stamp comparisons to include only documents that have changed since the previous update of the index. Incremental updates reduce the amount of time involved in repeated crawls. However, incremental updates do not eliminate the need to inspect the time stamp of each document previously crawled each time a crawl occurs. Adaptive crawling reduces the time required for crawling even further. During crawls, the algorithm for adaptive crawling compiles statistics about the rate of change for each document. In subsequent adaptive crawls, the algorithm targets only documents likely to have changed. 
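The reverse-query idea described under Subscriptions can be sketched as follows: instead of running one query against many documents, a stored set of queries is evaluated against each arriving document. This is a simplified illustration and not the PQS implementation. The class name `SubscriptionMatcher`, the rule that every query term must appear in the document, and the subscriber names are assumptions.

```python
import re

def terms_of(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"\w+", text.lower()))

class SubscriptionMatcher:
    """Holds persistent queries and evaluates all of them against one document."""

    def __init__(self):
        self._subscriptions = {}  # subscriber -> frozenset of required terms

    def subscribe(self, subscriber, query):
        self._subscriptions[subscriber] = frozenset(terms_of(query))

    def match(self, document_text):
        """Return the subscribers whose stored query matches the new document."""
        doc_terms = terms_of(document_text)
        return sorted(
            subscriber
            for subscriber, required in self._subscriptions.items()
            if required and required <= doc_terms
        )

matcher = SubscriptionMatcher()
matcher.subscribe("ana", "budget forecast")    # hypothetical subscribers
matcher.subscribe("raj", "crawl schedule")
print(matcher.match("Q3 budget and forecast review"))  # who should be notified
```

Each new document triggers one pass over the stored queries, which corresponds to the "push" model: the subscriber is notified without having to repeat the search.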

SharePoint Portal Server does not replace all of the functionality of Site Server, but the search technology used in SharePoint Portal Server is more recent than the search technology used in Site Server. In addition, SharePoint Portal Server uses an advanced ranking algorithm. You can use the advanced features of the algorithm to conduct a search from the dashboard site. These advanced features include Best Bets, categories, and Office schema integration.
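This chapter does not give the ranking formula itself. As a hedged illustration of a probabilistic ranking function that uses the factors listed earlier (document length, term frequency, the number of documents containing the term, and the collection size), the sketch below implements the widely published Okapi BM25 formula that came out of Stephen Robertson's line of research. Whether the product uses exactly this formula is an assumption, and the parameter values `k1=1.2` and `b=0.75` are conventional defaults rather than values from the source.

```python
import math
from collections import Counter

def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each document against the query with the BM25 formula."""
    if not docs:
        return {}
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Number of documents containing each term
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms (low document frequency) contribute more
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalized
            length_norm = 1 - b + b * len(tokens) / max(avg_len, 1e-9)
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[doc_id] = score
    return scores

# Hypothetical sample collection
docs = {
    "short": "portal search",
    "long": "portal search tips and many other unrelated words here",
    "other": "exchange mail",
}
scores = bm25_scores(docs, "portal search")
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))
```

In this sample, the short document outranks the long one even though both contain each query term once, because the length factor discounts matches in longer documents.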

When creating indexes, SharePoint Portal Server offers significantly better performance than Site Server 3 by providing a multi-threaded index engine. The introduction of adaptive crawling also reduces the amount of time it takes to perform incremental crawling when updating indexes.
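The adaptive crawling idea of compiling change statistics and then targeting documents likely to have changed can be sketched as follows. The actual algorithm is not described in this chapter, so the class name `ChangeStats`, the simple changes-per-check rate, the 0.5 threshold, and the example URLs are all assumptions. A production crawler would presumably still revisit low-rate documents occasionally.

```python
class ChangeStats:
    """Per-document history of how often a crawl found the document changed."""

    def __init__(self):
        self.checks = {}
        self.changes = {}

    def record(self, url, changed):
        """Record the outcome of one crawl visit to url."""
        self.checks[url] = self.checks.get(url, 0) + 1
        self.changes[url] = self.changes.get(url, 0) + (1 if changed else 0)

    def change_rate(self, url):
        checks = self.checks.get(url, 0)
        if checks == 0:
            return 1.0  # never crawled: treat as likely changed
        return self.changes[url] / checks

    def select_for_crawl(self, urls, threshold=0.5):
        """Pick only the documents whose observed change rate meets the threshold."""
        return [url for url in urls if self.change_rate(url) >= threshold]

stats = ChangeStats()
for changed in (True, True, False, True):
    stats.record("http://example.com/news", changed)
for changed in (False, False, False, False):
    stats.record("http://example.com/archive", changed)

urls = ["http://example.com/news", "http://example.com/archive", "http://example.com/new-page"]
print(stats.select_for_crawl(urls))
```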

Microsoft Indexing Service


Indexing Service is a Microsoft Windows 2000 base service for file systems and Web servers. Formerly known as Index Server, its original function was to crawl and create a catalog—similar to the index created by SharePoint Portal Server—of the content of Internet Information Services (IIS) Web servers. Indexing Service now creates catalogs for the contents and properties of both file systems and virtual Web sites.

As an operating system component, Indexing Service targets the same wide range of customer scenarios that Windows targets. Indexing Service targets the desktop experience. It provides an enhanced search experience for individual users covering information stored on local disks. You access Indexing Service in Windows when you click the Search button in the Start menu, when you press CTRL+F, when you click the Search button in Windows Explorer, and when you click the search task pane in Office XP. Indexing Service exposes management and query objects that allow rapid development of custom search applications. You can expand Indexing Service catalogs to contain information from remote file shares. Such custom applications can serve vertical applications or groups of users. These custom applications can crawl information from multiple locations.

Indexing Service also offers full-text search from Internet sites. You can use Indexing Service to drive custom search Web applications. In addition to query language support, Indexing Service offers a full range of programmability features targeted for the custom-application developer: scripting objects for query and administration, an OLE DB provider, and ADO compatibility.

The following list describes the components of Indexing Service.

  • Data access. Indexing Service does not include a cross-protocol gathering component. It can access any data that is available from the file system, including local file systems and shared file systems on remote computers. Indexing Service facilitates crawling of Web site content to create an index by using the IIS metabase to understand which files map to Web site content. Indexing Service then follows the information from the IIS metabase to crawl the local Web sites. Indexing Service does not use the HTTP protocol to crawl Web sites. Therefore, Indexing Service cannot crawl content that is rendered dynamically, such as ASP pages referencing a database or personalized content that changes for each user. 

  • Filters. Indexing Service uses filters installed on the operating system, including the MIME filter for news and e-mail, the Office filter for Office documents, and the HTML filter. 

  • Ranking. Indexing Service uses ranking algorithms based on the vector space model. The default algorithm used is the Jaccard formula. For more information about the specific algorithms, see Appendix B. 

  • Schema support. Indexing Service provides rich, broad schema support. By using the Indexing Service snap-in in Microsoft Management Console (MMC), users can view all properties indexed from documents and can indicate which properties to store in the property cache for fast retrieval. 

  • Extensibility and programmability. Indexing Service provides a platform for full-text search applications. It includes a full set of programming interfaces, including scripting interfaces for administration and query and an OLE DB provider for searches. For information about Indexing Service programming interfaces, see Appendix B. 

  • Query languages. Indexing Service provides rapid access to files through flexible querying language. Indexing Service supports Query Dialect 1, Query Dialect 2, and SQL full-text extensions. 
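The Ranking item above names the Jaccard formula. In its textbook set-based form, the Jaccard coefficient is the size of the intersection of two term sets divided by the size of their union. The sketch below shows only that textbook form. How Indexing Service applies the formula to its term vectors is not described here, so the function name and the sample data are assumptions.

```python
def jaccard(query_terms, doc_terms):
    """Jaccard coefficient: |intersection| / |union| of the two term sets."""
    query_set, doc_set = set(query_terms), set(doc_terms)
    union = query_set | doc_set
    if not union:
        return 0.0
    return len(query_set & doc_set) / len(union)

query = "index service".split()
# Hypothetical documents
docs = {
    "d1": "index service catalog".split(),
    "d2": "index of file systems".split(),
}
ranked = sorted(docs, key=lambda doc_id: jaccard(query, docs[doc_id]), reverse=True)
print(ranked)
```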

For a list of features new to Indexing Service 3 included with Windows 2000, see Appendix B.

Indexing Service is the high-performance choice when a custom application must provide full-text search over the content of an Internet site. It is less appropriate for applications where the data is primarily structured. Developers of such applications should consider Microsoft SQL Server 2000. For ease of use without need for customization, or for applications that require aggregation of content from various sources and source types, SharePoint Portal Server is the appropriate choice.

Indexing Service is an optional operating system component. The initial creation of indexes of file system contents can be resource-intensive and can affect desktop application performance. Therefore, Windows does not enable Indexing Service by default.

Microsoft SQL Server 2000


SQL Server 2000 is a family of products that meets the data storage and analysis requirements of the largest data processing systems and commercial Web sites. SQL Server 2000 can provide easy-to-use data storage and analysis services to an individual or a small business.

Full-text search in SQL Server 2000 focuses on searching data that is primarily structured, but also includes textual, unstructured information.

SQL Server 2000 uses the same search engine technology used by SharePoint Portal Server, benefits from the same advanced ranking algorithm, and uses a subset of the full-text extensions to SQL used by SharePoint Portal Server. The following list describes the components of SQL Server.

  • Data access. You can use full-text search in SQL Server only over content stored in SQL columns. 

  • Filters. SQL Server 2000 uses filters installed on the server to handle documents stored in database columns. Users use IMAGE-type columns to store documents, and then specify a second column to indicate the document type. Full-text search then applies the appropriate filter, such as HTML, Office, or third-party filters, based on the document type. In addition, you can apply full-text search to the contents of columns of type [N]CHAR, [N]VARCHAR, and [N]TEXT. 

  • Extensibility and programmability. Full-text search SQL extensions are integrated into the Transact-SQL (T-SQL) language. Users can specify SQL queries that span structured data from SQL tables, unstructured data from SQL columns, documents embedded in the columns, and the file system. 
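To illustrate a query that spans structured and unstructured data, the sketch below builds a T-SQL statement that combines an ordinary comparison with the full-text `CONTAINS` predicate. The table `Products`, its columns, the function name, and the `?` parameter markers are hypothetical. The binding style depends on the data access layer in use, and the table is assumed to have a full-text index on `Description`.

```python
def build_product_query(max_price, phrase):
    """Return (sql, params) for a mixed structured and full-text query."""
    # A phrase in a CONTAINS search condition is wrapped in double quotes;
    # embedded double quotes are dropped here to keep the condition well formed.
    search_condition = '"' + phrase.replace('"', "") + '"'
    sql = (
        "SELECT ProductID, Name, Price "
        "FROM Products "                    # hypothetical table
        "WHERE Price <= ? "                 # structured condition
        "AND CONTAINS(Description, ?)"      # full-text condition
    )
    return sql, (max_price, search_condition)

sql, params = build_product_query(30000, "stainless steel")
print(sql)
print(params)
```

Keeping the search condition as a parameter rather than concatenating it into the statement avoids quoting problems in the SQL text itself.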

SQL Server 7 first introduced full-text search as a feature of SQL Server.

Microsoft Site Server


Site Server is designed to help you get the most usability from your corporate intranet. Site Server enables users to publish, find, and share information quickly and easily. Features include extensive search capabilities and tools to perform thorough analyses of your intranet's usage and effectiveness.

Site Server Commerce Edition is a comprehensive Internet commerce server that enables you to engage customers, transact business, and analyze commerce Web sites. Highly scalable and secure, Site Server Commerce Edition streamlines and integrates your online dealings with distributors and suppliers.

Since the introduction of Site Server 3 Standard Edition and Site Server 3 Commerce Edition in May 1998, the Web marketplace has evolved rapidly. Site Server 3 Standard Edition targets the intranet space, allowing users to find, share, and publish information to their corporate intranets. In comparison, Site Server 3 Commerce Edition targets the Internet space, with the ability to conduct a financial transaction online, analyze transactions, and conduct a personalized interaction with the consumer.

Since then, the needs of the intranet market have changed substantially and have evolved into the portal market, with greater need for core services and application integration as well as a continued requirement for robust enterprise-wide search. As a result, product focuses have shifted accordingly. The search technology of Site Server 3 Standard Edition is expanded in SharePoint Portal Server.

The following list describes the components of Site Server.

  • Data access. Site Server introduced the concept of gathering and the concept of protocol handlers. Site Server can crawl Exchange Server 5.5 computers and Web sites. The Gatherer can process both hierarchical (file system) and Web spaces (HTTP). Site Server does not support custom protocol handlers. The interface is not extensible to support new document stores. Site Server can crawl information from databases by using an ASP page that renders the information from rows in a database. 

  • Filters. Site Server uses the same filters as Indexing Service. Site Server uses filters installed on the operating system, including the MIME filter for news and e-mail, the Office filter for Office documents, and the HTML filter. 

  • Ranking. Site Server uses the same ranking as Indexing Service. Site Server uses ranking algorithms based on the vector space model. The default algorithm used is the Jaccard formula. For more information about the specific algorithms, see Appendix B. 

  • Schema support. Site Server provides rich, broad schema support. Users can define properties over OLE DB data types by using a proprietary management interface. 

  • Extensibility and programmability. Site Server has its own object model. 

  • Query language. Site Server uses Query Dialect 1 and SQL full-text extensions. 

Microsoft Exchange 2000 Server


Exchange 2000 Server integrates seamlessly with the Windows 2000 operating system. It is designed to meet the messaging and collaboration needs for businesses of all sizes. Together with its client software, Outlook 2000, Exchange provides a highly reliable, scalable, and easy-to-manage messaging and collaboration infrastructure.

If your primary need is to crawl e-mail messages, use Exchange 2000 Server. By using Exchange 2000 Server full-text search, all users can search messaging items in personal mailboxes and public folders.

To aggregate searches from e-mail and other sources, use SharePoint Portal Server. However, SharePoint Portal Server does not support crawling private mailboxes.

Exchange 2000 Server uses the same search technology that SharePoint Portal Server uses. Exchange 2000 Server uses a version with proven clustering capability. The following list describes the components of Exchange 2000 Server.

  • Data access. Data access is restricted to information stored in Exchange public folders and mailboxes. 

  • Filters. Exchange 2000 Server full-text search uses the MIME filter to crawl messaging items. Attachments are processed by using available filters according to their content type. 

  • Ranking. Exchange 2000 Server uses the same advanced probabilistic ranking algorithm that SharePoint Portal Server uses. This algorithm guarantees that Exchange 2000 Server returns the documents most relevant to a query at the top of the list of search results, providing increased user efficiency and satisfaction. 

  • Extensibility and programmability. Exchange 2000 Server uses the HTTP-DAV (WebDAV) protocol, specifically DAV Searching and Locating (DASL), for searching. 

  • Query language. Full-text search in Exchange 2000 Server supports SQL full-text extensions through the Distributed Authoring and Versioning (DAV) protocol. When connected to Exchange 2000 Server, Outlook Advanced Search takes advantage of Exchange 2000 Server full-text search. Outlook submits natural language queries directly to the server. There is no client-side support for the SQL query language. 
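
To make the DASL and SQL full-text items above more concrete, the following Python sketch builds the XML body that a client might send with an HTTP SEARCH request. Treat it as an assumption-laden illustration: the folder URL is hypothetical, the selected properties are examples, and the exact SCOPE and CONTAINS syntax that a given server accepts should be checked against the Exchange documentation. The sketch only builds the request body and does not contact a server.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"


def build_search_body(folder_url, phrase):
    """Build a DASL-style search request body carrying a SQL full-text query.

    `folder_url` and `phrase` are caller-supplied. The query shape
    (SCOPE plus CONTAINS) is an assumed example, not a verified grammar.
    """
    # Double single quotes so the phrase cannot close the SQL literal early,
    # and drop double quotes because the phrase is wrapped in them below.
    safe_phrase = phrase.replace("'", "''").replace('"', "")
    sql = (
        'SELECT "DAV:href", "DAV:displayname" '
        f"FROM SCOPE('shallow traversal of \"{folder_url}\"') "
        f"WHERE CONTAINS('\"{safe_phrase}\"')"
    )
    ET.register_namespace("D", DAV_NS)
    root = ET.Element(f"{{{DAV_NS}}}searchrequest")
    ET.SubElement(root, f"{{{DAV_NS}}}sql").text = sql
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical public folder URL, for illustration only.
    print(build_search_body("http://example.invalid/public/Projects/",
                            "budget review"))
```

A client would typically send this body with the SEARCH method and an XML content type to the folder URL, then read the multistatus response.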

Microsoft Office XP Search

The world's leading suite of productivity software, Microsoft Office helps you complete common business tasks, including word processing, e-mail, presentations, data management and analysis, and much more.

If you are an Office user and you want to search from your desktop, use Office XP search. Office XP enables you to search not only the local hard disk but also file shares and SharePoint Portal Server computers. The following list describes the components of Office XP search.

  • Data access. If you enable Indexing Service on a computer running Windows 2000, Indexing Service creates an index of local disks. On computers running Microsoft Windows NT® version 4, Windows 98, or Windows Millennium Edition, Microsoft Office XP provides a version of the search engine used in SharePoint Portal Server for local disk crawling. You must choose to activate Indexing Service or the Office search index engine. If you do not enable indexing, Office XP provides a slower, non-indexed form of search. 

  • User Interface. Office XP provides a search task pane accessible from Word, Excel, and PowerPoint. 

  • Advanced Features. The task pane provides federated search of the user's local hard drives, remote servers through Indexing Service, SharePoint Portal Server computers, SharePoint Team Services sites (which use Indexing Service for their full-text search feature), and Outlook mail (PST files or Exchange mailboxes). A query broker component dispatches search commands to the search providers for each of these stores. 

  • Extensibility and programmability. Office applications can program to the search query broker through an API that is similar to the FindFast API. 
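
The query broker described above can be pictured as a small dispatcher. The following Python sketch is not the Office XP API, which this chapter does not document. The class, the method names, and the in-memory providers are hypothetical, and the sketch only shows the general pattern of sending one query to several search providers and merging the results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Hit:
    source: str  # name of the store that returned the result
    title: str


class QueryBroker:
    """Hypothetical broker that fans one query out to registered providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[str], List[str]]] = {}

    def register(self, name: str, search: Callable[[str], List[str]]) -> None:
        self._providers[name] = search

    def search(self, query: str, stores: Optional[List[str]] = None) -> List[Hit]:
        names = stores if stores is not None else list(self._providers)
        hits: List[Hit] = []
        for name in names:
            provider = self._providers.get(name)
            if provider is None:
                continue  # unknown store name: skip it
            try:
                titles = provider(query)
            except Exception:
                continue  # a failing store should not hide the other results
            hits.extend(Hit(name, title) for title in titles)
        return hits


def make_list_provider(titles: List[str]) -> Callable[[str], List[str]]:
    """Stand-in provider: case-insensitive substring match over a list."""
    return lambda query: [t for t in titles if query.lower() in t.lower()]


if __name__ == "__main__":
    broker = QueryBroker()
    broker.register("local disk", make_list_provider(["Budget.xls", "Notes.doc"]))
    broker.register("portal", make_list_provider(["Budget plan"]))
    for hit in broker.search("budget"):
        print(hit.source, "->", hit.title)
```

Keeping each store behind the same callable signature lets new stores be added without changing the broker itself.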

Full-Text Search Comparison Tables

The following comparison covers the technologies and features of the Microsoft products that implement full-text search:

  • SharePoint Portal Server 

  • Indexing Service 

  • Site Server 

  • SQL Server 2000 

  • Exchange 2000 Server 

  • Office XP 

Technology Comparison 

| Feature | SharePoint Portal Server | Indexing Service | Site Server | SQL Server 2000 | Exchange 2000 Server | Office XP on Windows 2000 | Office XP on Windows 98 or Millennium Edition |
|---|---|---|---|---|---|---|---|
| Full-text search using proprietary query language | | Yes | Yes | | | Yes | Yes |
| Full-text search using SQL full-text extensions | Yes | Yes | Yes | Yes | Yes | | |
| Boolean ranking algorithm | Yes | Yes | Yes | | | | |
| Advanced probabilistic ranking algorithm | Yes | | | Yes | Yes | | Yes |
| Uses multiple data access protocols | Yes | | Yes | | | | |
| Crawls | | | | | | | |
| File system | Yes | Yes | Yes | | | Yes (local only) | Yes (local only) |
| Web sites | Yes | Yes (local only, through file system) | Yes | | | | |
| Lotus Notes | Yes | | | | | | |
| Exchange 5.5 | Yes (public folders) | | Yes | | | | |
| Exchange 2000 | Yes (public folders) | | | | Yes (public folders and private mailboxes) | | |
| SQL tables | Yes (through ASP) | | Yes (through ASP) | Yes | | | |
| SharePoint Portal Server workspaces | Yes | | | | | Yes | Yes |
| Third-party protocols | Yes | | | | | | |
| Best Bets | Yes | | | | | | |
| Categories | Yes | | | | | | |
| End user UI | Dashboard | Windows Explorer on Windows 2000 and custom | Custom | Custom | Outlook through Advanced Find, custom | Office search task pane | Office search task pane |

Summary

This chapter describes full-text search technology that is used in a variety of Microsoft products. This chapter can help you to choose the Microsoft products that are best suited for your information retrieval needs.
