Find Anything with Windows Desktop Search
At a Glance:
- Types of files indexed by WDS
- How the WDS index is maintained
- Deployment tips
- Management and Group Policy
Have you ever wondered why it takes 3.2 seconds to search the Internet but 2.7 days to search your hard drive? Don’t you wish you could search your hard drive and e-mail as quickly as the Internet? You can, with the newly improved Windows Desktop Search (WDS).
Think about how much information you have stored on your computer, and where that information resides. Now think about all the information your users have stored on their laptops, desktops, or on the network. I have all my rarely used information on my d:\ drive or a network share, my somewhat important information in My Documents, and my really vital information on the desktop and in the tons of e-mail that has built up over the years. (Not to mention the millions of other Outlook® items stored in contacts, calendar, and tasks.)
Finding anything in that morass is quite a chore. And not just for me. IDC conducted a survey that showed the average active computer user spends nine and a half hours a week looking for documents, which illustrates the serious need for a better solution. So how can WDS help?
Figure 1 Broad Range of Search Results
Windows® Desktop Search is an extremely powerful search engine that lets you search through all of those pockets of information: your hard drives, your network drives (via UNC names), and even your e-mail. WDS can index over 200 unique file types (over 350 if you count different software versions); for a full list, see "List of searchable file types". And, with an extensible API set, WDS can index anything, anywhere. WDS is available to all users from the taskbar, and it runs automatically whenever the user logs on to the machine. Figure 1 shows some typical search results. Built-in right-click capabilities allow you to copy, move, delete, print, open, reply, forward, and even preview your search results (see Figure 2).
Figure 2 Previewing Files
Windows Desktop Search also provides great flexibility to multinational organizations. The optional Multilanguage User Interface (MUI) pack download offers support for 30 languages in addition to English (see Figure 3).
Figure 3 Supported Languages
|● Bulgarian||● Korean|
|● Chinese (Simplified)||● Latvian|
|● Chinese (Traditional)||● Lithuanian|
|● Croatian||● Norwegian|
|● Czech||● Polish|
|● Danish||● Portuguese (Brazilian)|
|● Dutch||● Portuguese (Iberian)|
|● English||● Romanian|
|● Estonian||● Russian|
|● Finnish||● Slovakian|
|● French||● Slovenian|
|● German||● Spanish|
|● Greek||● Swedish|
|● Hungarian||● Thai|
|● Italian||● Turkish|
If you tried the initial versions of WDS, you may have been deterred by the performance hit you took during indexing. The new version of WDS has significant improvements, though. WDS 2.6.5 reduces disk access by 50 percent compared with previous versions, and it eliminates the delay during system shutdown. WDS 2.6.5 offers three new policies (including policies for inclusions/exclusions of specific paths), considerable improvements in stability and results, and full integration into the Windows servicing model. There are also add-ins for Lotus Notes and for the MSG file format, so WDS can index Outlook MSG files that have been saved outside of Outlook. Content within e-mail attachments is indexed and searched as well.
If you have multiple or dual-core processors, look for enhanced performance and even less noticeable system impact as the thread-optimized search engine code takes advantage of the additional processing threads.
To take advantage of WDS you must be running Windows XP SP1 or higher, Windows Server™ 2003, or Windows 2000 SP4. As of now only the 32-bit versions are supported; support for 64-bit systems is expected in the near future. To index and search your e-mail, you need Outlook 2000 or later, or Outlook Express 6.0 or later. For full preview of Office documents you need Office XP or later.
From a hardware perspective, you need at least a 500MHz Pentium processor (1GHz recommended) and 128MB of RAM (256MB recommended, 512MB optimal); 500MB of hard disk space is recommended for the index, although the actual size of the index depends on how much content you have and how much of it you index. For best viewing, a 1024 x 768 screen resolution is recommended.
WDS is easy to deploy into any environment. It comes as a self-extracting executable of only 4.5MB, so you can deploy it with Systems Management Server (SMS) or with Group Policy; its small size makes it ideal for Group Policy package deployment. Note, however, that to deploy it with Group Policy you first need to wrap the WDS executable in an .msi file.
WDS is distributed like other Windows components through the package installer (formerly update.exe), which means it can also be distributed very easily using a script. WDS supports both attended and unattended installation modes. Attended mode, which requires end user interaction, is more common.
For unattended installations, you can create custom batch files using the command-line options shown in Figure 4, or you can deploy WDS with SMS, Group Policy, or Windows Update Services.
Figure 4 Installation Command-Line Options
|Option||Description|
|/quiet or /q||Displays no status dialog box during extraction. Can be combined with /extract or /extract:path_name; used without them, runs the installation in quiet mode.|
|/passive or /U||Displays a progress bar during extraction but does not prompt for the destination folder name. Can be combined with /extract or /extract:path_name; used without them, runs the installation in passive mode.|
|/extract or /X||Extracts the package files without starting the installation and prompts for the path of the destination folder. When combined with /q or /U, extracts the package files to a randomly named folder in the root folder.|
|/extract:path_name or /X:path_name||Extracts the package files to the specified folder without starting the package installer or prompting for a destination folder. When combined with /q or /U, extracts the package files to the specified folder.|
The command line offers many options for automated installation. With the /q option, for example, the WDS installation is completely silent, and if a restart is required, it occurs automatically. Alternatively, you can use the /passive or /U option, which displays a progress bar, warns that a restart will occur if necessary, and displays any errors that are encountered. Figure 4 details the command-line options and their functions.
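To make the switches in Figure 4 concrete, here is a minimal Python sketch of a deployment helper that assembles an unattended installation command. The package name wds-x86-xx.exe is the placeholder used in Figure 5, and the helper names build_install_command and run_install are hypothetical; only the /quiet, /passive, and /extract:path_name switches come from the table above.

```python
import subprocess
from typing import List, Optional

# Placeholder package name from Figure 5; substitute the real file name.
PACKAGE = "wds-x86-xx.exe"


def build_install_command(mode: str = "quiet",
                          extract_to: Optional[str] = None) -> List[str]:
    """Build the argument list for an unattended WDS installation (hypothetical helper).

    mode: "quiet"   -> /quiet: no UI, restarts automatically if required
          "passive" -> /passive: progress bar, restart warning, errors shown
    extract_to: if given, adds /extract:<path> so the package files are
          extracted to that folder instead of being installed (see Figure 4).
    """
    switches = {"quiet": "/quiet", "passive": "/passive"}
    if mode not in switches:
        raise ValueError(f"unsupported mode: {mode!r}")
    args = [PACKAGE, switches[mode]]
    if extract_to is not None:
        args.append(f"/extract:{extract_to}")
    return args


def run_install(args: List[str]) -> int:
    """Run the package on the target machine and return its exit code (Windows only)."""
    completed = subprocess.run(args, check=False)
    return completed.returncode
```

For example, build_install_command("passive") yields the argument list for a passive installation, which a deployment script could pass to run_install on each target machine or write out as a line in a batch file.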
After extraction, the files are placed in the specified folder. If no folder is specified and the /extract option was used with /passive or /quiet, a randomly named folder (like 1ed6b742f546f) is generated and the setup files are placed there. When installation is finished, the folder and installation files are removed.
Figure 5 provides examples of commands you can use to extract the contents of a software update package.
Figure 5 Extraction Commands
|Command||Result|
|wds-x86-xx.exe /q /x:C:\WDS||Silently extracts the contents of the package to a newly created folder called WDS on drive C.|
|wds-x86-xx.exe /extract /passive||Extracts the contents of the package to a randomly generated folder in the root folder of the current drive.|
Once WDS is deployed, it will create and maintain an individual index of all the information on a user’s PC as well as any calendar appointments, contacts, tasks, and e-mails in Outlook. Each user maintains his own index and finds items only within his own scope on the system. The index also contains metadata for an item including the time a file was created, location, owner, file type, and so on. By default, WDS will index just the contents of the My Documents folder and the default e-mail location (stored either locally in a .pst or .ost file or on the Exchange Server). Indexing can be expanded to look into other drives and directories on your system including network drives via UNC paths.
WDS consists of four components: WindowsSearch.exe, WindowsSafeFilter.exe, WindowsSearchIndexer.exe, and WindowsSearchFilter.exe. WindowsSearch.exe is the primary component. It is launched from the Startup program folder, so it runs automatically when the user logs on. It manages the indexer process and provides the management interface via the system tray. By default, it also replaces the standard Windows search GUI (Search Companion) with the enhanced search interface.
WindowsSafeFilter.exe processes information before it enters the index and maintains the integrity of the index. As the guardian of the WDS index, it is responsible for a number of security tasks, including preventing attacks against the index.
WindowsSearchFilter.exe is critical to the extensibility of WDS. It hosts IFilter add-ins, which enable the WDS indexer to open, read, and index the contents of file types it would not otherwise be able to fully index. Many software programs on your PC already install IFilters. For example, if you use Microsoft Office Visio®, you automatically have the Visio IFilter plug-ins installed. Several free IFilters (StarOffice/OpenOffice, PDF, ZIP, and so on) are available from Search Even More File Types and Content Sources. What’s great about IFilters is the extensibility they provide: WDS provides the architecture that lets you or your developers write customized IFilters. Windows Desktop Search recognizes new IFilter plug-ins and will index the contents of the associated file types the next time the index is rebuilt. Moreover, WindowsSearchFilter.exe was specifically designed to survive poorly written third-party IFilters, in order to maintain overall reliability.
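The extensibility model can be pictured as a registry that maps file extensions to filters, plus a host that contains failures. This is a simplified Python illustration, not the real interface: actual IFilters are COM components implementing the IFilter interface, and WindowsSearchFilter.exe runs them in a process separate from the indexer. The hypothetical FilterHost class below merely catches exceptions, so that one broken filter does not stop the indexing of other files.

```python
import os
from typing import Callable, Dict, Optional

# Hypothetical filter signature: raw file bytes in, extracted text out.
Filter = Callable[[bytes], str]


class FilterHost:
    """Toy model of an IFilter host (illustrative only, not the WDS API)."""

    def __init__(self) -> None:
        self._filters: Dict[str, Filter] = {}
        self.failures: Dict[str, int] = {}  # extension -> number of crashes seen

    def register(self, extension: str, flt: Filter) -> None:
        # A newly registered filter applies to files processed from now on.
        self._filters[extension.lower()] = flt

    def extract_text(self, path: str, data: bytes) -> Optional[str]:
        ext = os.path.splitext(path)[1].lower()
        flt = self._filters.get(ext)
        if flt is None:
            return None  # no filter: in this sketch, content is not indexed
        try:
            return flt(data)
        except Exception:
            # A poorly written filter fails here instead of taking down the host.
            self.failures[ext] = self.failures.get(ext, 0) + 1
            return None
```

Registering a text filter for ".txt" and a faulty one for another extension shows the design choice: the faulty filter's failure is counted, and the host keeps serving other file types.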
The final component, WindowsSearchIndexer.exe, is the chief process for creating and querying the WDS index. This process runs per user, which ensures that it indexes documents in the user’s security context. In future releases, the indexer is expected to become a system service while maintaining the high level of security WDS currently offers. By default, WindowsSearchIndexer.exe stores the index at %UserProfile%\Local Settings\Application Data\Microsoft\Desktop Search.
The index is stored in the \Local Settings\Application Data\Microsoft\Desktop Search directory structure of the user’s profile, which looks rather complex (see Figure 6). Two main components make up the WDS index: an inverted index and a properties cache database.
Figure 6 Desktop Search Directory
The inverted index is actually an index of words in your documents and it includes up to 120 additional properties, though the typical document (any file that’s indexed) will only have about 10 properties. The inverted index portion of WDS is made up of several files located in the desktop search directory. The .ci and .dir files are located in %UserProfile%\Local Settings\Application Data\Microsoft\Desktop Search\Applications\RSApp\Projects\MyIndex\Build\Indexer\CiFiles. The main index file will be the largest file in the directory; the other files are referred to as shadow indexes.
Keeping the Index in Sync
One of the challenges of maintaining a searchable index is making sure it produces accurate results even when documents change (are moved, deleted, altered, and so forth). In this structure, the database must be updated to reflect every change, and this is where the multiple shadow indexes come into play.
The properties cache database (RSApp.edb) is crucial in processing the search results. RSApp.edb holds all the extra properties concerning the documents—typically path, owner, type and the like—and will work with the inverted index to return a full resultset. RSApp.edb provides the necessary properties to open the document, preview it, group the documents, sort the results, and display them.
Here is a simplified example that illustrates how the two databases work together to produce a resultset. Let’s say you’re looking for the word "phaser." The inverted index might return a result something like Figure 7. (If the query contains multiple words or wildcards, WDS cross-references the inverted index entries for each term to return accurate results.) The WDS engine then references the properties database to display results like those in Figure 8.
Figure 7 Inverted Index
|Word||Document||Locations in Document|
Figure 8 Properties Cache
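The interplay between the two databases can be modeled in a few lines of Python. This is a minimal in-memory sketch under simplifying assumptions (tokenizing on word characters, no wildcard support, hypothetical class and field names); the real index lives in the .ci/.dir files and RSApp.edb described earlier. The inverted index maps each word to the documents and word positions where it occurs, as in Figure 7, and the properties cache supplies the path, owner, and type needed to display each hit, as in Figure 8.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class DesktopIndex:
    """Toy inverted index plus properties cache (illustrative names only)."""

    def __init__(self) -> None:
        # word -> {doc_id: [word positions in that document]}
        self.inverted: Dict[str, Dict[int, List[int]]] = defaultdict(dict)
        # doc_id -> properties such as path, owner, and type
        self.properties: Dict[int, Dict[str, str]] = {}

    def add(self, doc_id: int, text: str, **props: str) -> None:
        words = re.findall(r"\w+", text.lower())
        for pos, word in enumerate(words):
            self.inverted[word].setdefault(doc_id, []).append(pos)
        self.properties[doc_id] = props

    def search(self, query: str) -> List[Dict[str, str]]:
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        # Cross-reference: keep only documents that contain every term.
        hits: Set[int] = set(self.inverted.get(terms[0], {}))
        for term in terms[1:]:
            hits &= set(self.inverted.get(term, {}))
        # Join with the properties cache to build displayable results.
        return [self.properties[d] for d in sorted(hits)]
```

A multiword query such as "phaser stun" keeps only the documents that contain every term, which is the cross-referencing step mentioned above; the properties lookup happens only for the surviving documents.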
Together, the shadow indexes and the main inverted index keep the index current. For example, if the word phaser is changed in a document, the shadow index keeps the resultset up to date (see Figure 9). Over time, the shadow indexes are merged into the main inverted index so that it reflects all changes.
Figure 9 Shadow Index
|Word||Document||Locations in Document|
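The shadow-index scheme can be sketched as follows. This Python model is an assumption about the general technique (small, newer delta indexes consulted alongside a large main index and periodically merged into it); the article does not describe the on-disk format WDS uses, and the tombstone used here to record a removed word is a hypothetical choice. The names ShadowedIndex, record_change, lookup, and merge are illustrative.

```python
from typing import Dict, List, Set

Postings = Dict[str, Set[str]]  # word -> names of documents containing it


class ShadowedIndex:
    """Toy main inverted index plus newer shadow indexes (hypothetical model)."""

    def __init__(self, main: Postings) -> None:
        self.main: Postings = {w: set(docs) for w, docs in main.items()}
        # Each shadow maps word -> {doc: True (added) / False (tombstone)}.
        self.shadows: List[Dict[str, Dict[str, bool]]] = []

    def record_change(self, doc: str, added: Set[str], removed: Set[str]) -> None:
        # Each batch of changes lands in a new, small shadow index.
        shadow: Dict[str, Dict[str, bool]] = {}
        for word in added:
            shadow.setdefault(word, {})[doc] = True
        for word in removed:
            shadow.setdefault(word, {})[doc] = False
        self.shadows.append(shadow)

    def lookup(self, word: str) -> Set[str]:
        docs = set(self.main.get(word, set()))
        # Apply shadows oldest to newest so the latest change wins.
        for shadow in self.shadows:
            for doc, present in shadow.get(word, {}).items():
                if present:
                    docs.add(doc)
                else:
                    docs.discard(doc)
        return docs

    def merge(self) -> None:
        # Fold every shadow into the main index, then discard the shadows.
        words = set(self.main)
        for shadow in self.shadows:
            words.update(shadow)
        merged: Postings = {}
        for word in words:
            docs = self.lookup(word)
            if docs:
                merged[word] = docs
        self.main = merged
        self.shadows = []
```

The design choice this illustrates is that edits never rewrite the large main index directly: queries stay accurate by consulting the shadows, and the comparatively expensive merge can be deferred to a convenient time.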
What happens under the covers when the initial indexing process begins and WDS needs to create the first index? During deployment, the process will determine when the documents were last modified and when (if ever) WDS last crawled them. If the process determines an index doesn’t exist, then the indexer begins to crawl the document directory structures (directory tree, e-mail folder tree, and so on).
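The check made during the initial crawl, comparing each document's last-modified time with the time it was last crawled, can be sketched like this. The function name files_to_index and the last_crawled map are hypothetical, and the sketch walks only a file system tree; WDS also crawls the e-mail folder tree, which is not modeled here.

```python
import os
from typing import Dict, Iterator


def files_to_index(root: str, last_crawled: Dict[str, float]) -> Iterator[str]:
    """Yield files under root that were never crawled or changed since the last crawl.

    last_crawled maps a file path to the time (seconds since the epoch) it was
    last indexed. An empty map models the case where no index exists yet,
    so every file is yielded.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                modified = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            crawled = last_crawled.get(path)
            if crawled is None or modified > crawled:
                yield path
```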
After deployment, the behavior is somewhat different. When a document on the system changes, a file system notification sends the changed document straight into the indexing process. These updates are prioritized: if WDS is busy with other work, the changes are sent to the front of the line to be processed.
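This prioritization amounts to a priority queue in which change notifications outrank crawl work. Here is a minimal Python sketch using the standard heapq module; the class name, method names, and two-level priority scheme are assumptions made for illustration, not WDS internals.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

# Hypothetical priority levels: a lower number is processed first.
NOTIFICATION, CRAWL = 0, 1


class IndexingQueue:
    """Work queue in which file-change notifications jump ahead of crawl work."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # keeps FIFO order within one priority

    def add_crawl_item(self, path: str) -> None:
        heapq.heappush(self._heap, (CRAWL, next(self._seq), path))

    def on_file_changed(self, path: str) -> None:
        heapq.heappush(self._heap, (NOTIFICATION, next(self._seq), path))

    def next_item(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

The sequence counter breaks ties so that items with the same priority come out in the order they arrived, which keeps a long crawl backlog from being reordered arbitrarily.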
The way WDS indexes the documents is similar in both of these processes. The document to be indexed is sent to a protocol handler which turns it into a stream of data. The stream is then sent to the appropriate IFilter for parsing. When the document has been parsed, the appropriate information is then distributed to the inverted index or properties cache.
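The three stages can be sketched as a small pipeline in Python. The protocol handler and filter below are hypothetical callables standing in for the real components, and the split (words to the inverted index, properties to the properties cache) follows the description of the two databases earlier in this article.

```python
import re
from typing import BinaryIO, Callable, Dict, List, Tuple

# Hypothetical stand-ins for the stages described in the text.
ProtocolHandler = Callable[[str], BinaryIO]                 # item URL -> byte stream
Filter = Callable[[BinaryIO], Tuple[str, Dict[str, str]]]   # stream -> (text, properties)


def index_item(url: str,
               handler: ProtocolHandler,
               flt: Filter,
               inverted: Dict[str, List[str]],
               properties: Dict[str, Dict[str, str]]) -> None:
    """Run one item through handler -> filter -> index (illustrative only)."""
    stream = handler(url)        # 1. protocol handler turns the item into a stream
    text, props = flt(stream)    # 2. filter parses the stream
    for word in set(re.findall(r"\w+", text.lower())):
        inverted.setdefault(word, []).append(url)   # 3a. words go to the inverted index
    properties[url] = props                         # 3b. properties go to the cache
```

Because the same function serves both the initial crawl and change notifications, only the source of the URLs differs between the two processes, which matches the point that indexing works the same way in both.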
What all this updating means is that you should never have to rebuild the index. There is a specific policy you can implement to prevent your users from rebuilding the index on their own.
In addition to updating the indexes as you make changes on your PC, WDS preserves performance during busy periods using a smart back-off mechanism that makes WDS wait until system activity drops before it resumes indexing. This back-off is triggered automatically when WDS detects increased I/O, keyboard or mouse, or CPU activity. In other words, WDS indexes efficiently without consuming excess system resources while other, higher-priority tasks are running. On laptops, WDS can be configured not to index while the computer is running on battery power.
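A back-off check of this kind might look like the following Python sketch. The Activity fields, the threshold values, and the should_index function are all hypothetical; the article names the signals WDS watches (I/O, keyboard or mouse, CPU, and battery power) but not the thresholds it uses.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    cpu_percent: float          # overall CPU utilization
    io_ops_per_sec: float       # disk I/O rate from other processes
    seconds_since_input: float  # time since the last keyboard or mouse event
    on_battery: bool


# Hypothetical thresholds; WDS's real values are not published here.
CPU_LIMIT = 30.0
IO_LIMIT = 100.0
IDLE_INPUT_SECONDS = 5.0


def should_index(a: Activity, pause_on_battery: bool = True) -> bool:
    """Return False (back off) while the user or other work keeps the machine busy."""
    if pause_on_battery and a.on_battery:
        return False
    if a.cpu_percent > CPU_LIMIT or a.io_ops_per_sec > IO_LIMIT:
        return False
    if a.seconds_since_input < IDLE_INPUT_SECONDS:
        return False
    return True
```

An indexer loop would call should_index before each batch of work and sleep briefly whenever it returns False.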
WDS is designed with security in mind. It adheres to the Microsoft security and privacy model, so WDS does not index sensitive locations such as Web pages, temporary Internet files, the cache, and password-protected Office files. It is also designed to index e-mail attachments in a sandbox and, by default, to prevent the previewing of complex attachment formats (such as ZIP or macro files). This is accomplished with the WindowsSafeFilter.exe component mentioned earlier. You can prevent the indexing of attachments altogether using a Group Policy setting called Prevent Indexing of E-Mail Attachments. WDS stores the index in a simplified encryption format. You can achieve even more security by placing the index on an NTFS volume and using the Encrypting File System (EFS).
WDS can be managed via Group Policy, so it’s easily customized to match your organization’s business needs. There are over two dozen different settings that provide the flexibility to control your WDS installation. The three main areas you can manage with Group Policy are setup, index, and search. These settings not only allow you to manage WDS, but also enhance its functionality. In the search section, Add Primary Intranet Search Location and Add Secondary Intranet Search Location allow your users to push their search queries to your intranet search providers, such as SharePoint® Portal Server or SharePoint Services.
If you use either the /quiet or the /passive switch for an unattended installation, you should enable the "prevent first run customization wizard" setting, which keeps users from seeing the first-run wizard. Most of the settings are located in the index section of the Group Policy Object (GPO). These settings control what will be in the index and where WDS will look for documents (see Figure 10).
Figure 10 Group Policy Settings
WDS also offers a completely extensible API set that provides virtually limitless capability to customize the search experience to meet your needs. In addition to adding your own IFilters, you can extend WDS so it can index new data stores, such as the database of an e-mail application. And WDS doesn’t stop at your local environment. With one click, it can take your users’ search to the Internet directly from the interface, further integrating the silos of information your users and your organization need to access effectively.
The WDS interface can also be made more visible in your users’ environment. If you load the MSN® Search Toolbar suite in combination with WDS, the search interface will show up in Microsoft Internet Explorer® and Outlook, which makes it even easier to use.
Why not give Windows Desktop Search a try, so you can finally unlock all those silos of information in your organization and truly integrate your data? To get started, grab a copy.
WDS Online Resources
For lots more on Windows Desktop Search, see the following sources:
- Matt’s Blog
- Searching Tips
- Windows Desktop Search Administration Guide
- Desktop Search Add-ins
- MSN Toolbar
- Windows Desktop Search SDK
- Desktop IFilters on Channel 9
Matt Hester is a seasoned TechNet presenter, an Exchange Server insider who worked as an MCT for over eight years before joining Microsoft. Matt loves reaching out to users and customers in the local community and gets a thrill from installing a server that can send e-mail or provide other services.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.