Creating Content Sources to Crawl Business Data in SharePoint Server 2007 Enterprise Search

Summary: Learn how to create content sources to crawl business data in Microsoft Office SharePoint Server 2007 Enterprise Search.

Office Visual How To

Applies to: 2007 Microsoft Office System, Microsoft Office SharePoint Server 2007

Patrick Tisseghem, U2U

April 2007


Think of a content source as a location containing resources that you want to crawl or index. In Microsoft Office SharePoint Server 2007, many types of locations are accessible by default: SharePoint sites, Web sites, network folders, Microsoft Exchange Server public folders, and data exposed by using the Business Data Catalog. In this how-to we'll focus on data exposed by using the Business Data Catalog, and discuss the steps to take when creating and configuring a content source of type Business Data. We'll also review a sample of how to accomplish the steps programmatically, using some of the classes exposed in the new search administration API.

Code It

IdEnumerator in the Application Definition File

Indexing entity instances requires an additional method on the entity in the application definition file. The method instance must be of type IdEnumerator (often written IDEnumerator) and must return the primary keys of the instances.

The application definition file is imported into the Business Data Catalog metadata repository using the administration site of the Shared Services Provider. A content source of type Business Data can then be created that points to the available business data application.

Required Search Administration and Business Data Catalog References

Developers can create applications that programmatically perform all of the steps administrators take in the browser. Note that the sample code must run on the computer running Office SharePoint Server, and requires references to Microsoft.SharePoint.dll, Microsoft.Office.Server.dll, Microsoft.Office.Server.Search.dll, Microsoft.SharePoint.Portal.dll, and System.Web.dll. The following code example shows the namespaces that are used.
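As a sketch, the using directives for such a program might look like the following. The exact list is an assumption and depends on which classes your code touches.

```csharp
// Shared Services Provider context (Microsoft.Office.Server.dll).
using Microsoft.Office.Server;

// Search administration object model (Microsoft.Office.Server.Search.dll).
using Microsoft.Office.Server.Search.Administration;

// Business Data Catalog object model (Microsoft.SharePoint.Portal.dll).
using Microsoft.Office.Server.ApplicationRegistry.Infrastructure;
using Microsoft.Office.Server.ApplicationRegistry.MetadataModel;
```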

Connecting to the Shared Services Provider and the Search Context

Before you can work with the classes that manipulate content sources, you must obtain a reference to the context of the Shared Services Provider and, from it, the context of the Search Service. The following code example shows this, and assumes that the name of the Shared Services Provider is SharedServices1.
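A minimal sketch of this step, assuming a console application that runs on the SharePoint server under an account with rights on the Shared Services Provider. The class name SearchContextSample is illustrative.

```csharp
using System;
using Microsoft.Office.Server;
using Microsoft.Office.Server.Search.Administration;

class SearchContextSample
{
    static void Main()
    {
        // "SharedServices1" is the Shared Services Provider name assumed in this how-to.
        ServerContext serverContext = ServerContext.GetContext("SharedServices1");

        // The search context is derived from the Shared Services Provider context.
        SearchContext searchContext = SearchContext.GetContext(serverContext);

        Console.WriteLine("Search context obtained: " + (searchContext != null));
    }
}
```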

Listing Existing Content Sources

You can retrieve the list of content sources by creating an instance of the Content class, passing the SearchContext reference to the constructor. Next, loop over all of the content sources to display them. Every content source has one or more start addresses.
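The loop could be sketched as follows, assuming a SearchContext has already been obtained. The helper class and method names are illustrative, and the start addresses are enumerated as plain objects to avoid assuming their element type.

```csharp
using System;
using Microsoft.Office.Server.Search.Administration;

static class ContentSourceLister
{
    // Writes the name and start addresses of every content source to the console.
    public static void ListContentSources(SearchContext searchContext)
    {
        Content content = new Content(searchContext);

        foreach (ContentSource source in content.ContentSources)
        {
            Console.WriteLine(source.Name);

            foreach (object startAddress in source.StartAddresses)
            {
                Console.WriteLine("  " + startAddress);
            }
        }
    }
}
```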

Retrieving Crawling Status

Several properties that are exposed at the level of the ContentSource class store information about the crawling status and timing.
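For illustration, a sketch that prints three of these properties for a single content source: CrawlStatus, CrawlStarted, and CrawlCompleted. The helper class and method names are illustrative.

```csharp
using System;
using Microsoft.Office.Server.Search.Administration;

static class CrawlStatusReporter
{
    // Prints the current crawl state and the last crawl timestamps of a content source.
    public static void Report(ContentSource source)
    {
        Console.WriteLine("Content source: " + source.Name);
        Console.WriteLine("  Status:    " + source.CrawlStatus);
        Console.WriteLine("  Started:   " + source.CrawlStarted);
        Console.WriteLine("  Completed: " + source.CrawlCompleted);
    }
}
```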

Starting a Crawl

Two crawl methods are available: full crawl and incremental crawl. Both are exposed as methods of the ContentSource class.
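A small sketch of both calls, wrapped in an illustrative helper that chooses between them. Either call only requests the crawl; the crawl itself runs in the background afterward.

```csharp
using Microsoft.Office.Server.Search.Administration;

static class CrawlStarter
{
    // Requests a full crawl when 'full' is true, otherwise an incremental crawl.
    public static void Start(ContentSource source, bool full)
    {
        if (full)
        {
            source.StartFullCrawl();
        }
        else
        {
            source.StartIncrementalCrawl();
        }
    }
}
```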

Retrieving the Business Data Applications

All of the functionality of the Business Data Catalog is exposed by a rich object model (the Microsoft.Office.Server.ApplicationRegistry namespace in Microsoft.SharePoint.Portal.dll). If the code is not running in the context of SharePoint Server, obtain the SqlSessionProvider instance and call its SetSharedResourceProviderToUse method; this internally connects the application to the Shared Services Provider context. The ApplicationRegistry class exposes a GetLobSystemInstances method that returns all of the business data applications available in the Business Data Catalog.
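A minimal sketch of these calls from a console application, again assuming the Shared Services Provider is named SharedServices1. It lists only the names of the registered application instances.

```csharp
using System;
using Microsoft.Office.Server.ApplicationRegistry.Infrastructure;
using Microsoft.Office.Server.ApplicationRegistry.MetadataModel;

class BdcApplicationLister
{
    static void Main()
    {
        // Needed because a console application has no SharePoint context of its own.
        SqlSessionProvider.Instance().SetSharedResourceProviderToUse("SharedServices1");

        // The dictionary is keyed by the name of each business data application instance.
        NamedLobSystemInstanceDictionary instances = ApplicationRegistry.GetLobSystemInstances();

        foreach (string instanceName in instances.Keys)
        {
            Console.WriteLine(instanceName);
        }
    }
}
```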

Creating a Content Source

To create a content source, you call the Create method of the ContentSourceCollection object. You can create a Business Data Catalog-specific URI to the business data application by calling the static method ConstructStartAddress exposed by the type BusinessDataContentSource. This URI is added to the StartAddressCollection object of the ContentSource instance. A call to the Update method saves everything in the database.
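These steps could be sketched as follows. The content source name and the application names are hypothetical placeholders, and the argument list passed to ConstructStartAddress (Shared Services Provider name, application name, application instance name) is an assumption that should be verified against the SDK reference for your version.

```csharp
using Microsoft.Office.Server.Search.Administration;

static class BusinessDataContentSourceCreator
{
    // Creates a Business Data content source that points to one business data application.
    public static void Create(SearchContext searchContext)
    {
        Content content = new Content(searchContext);

        // "Products (BDC)" is a hypothetical content source name.
        BusinessDataContentSource source = (BusinessDataContentSource)
            content.ContentSources.Create(typeof(BusinessDataContentSource), "Products (BDC)");

        // Assumed overload and placeholder names; check both before use.
        source.StartAddresses.Add(
            BusinessDataContentSource.ConstructStartAddress(
                "SharedServices1",
                "AdventureWorksSample",
                "AdventureWorksSampleInstance"));

        // Persist the new content source and its start address.
        source.Update();
    }
}
```

After Update returns, the new content source should appear alongside the existing ones and can be crawled like any other.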

Read It

The crawler that is provided in SharePoint Server can be directed to a location and ordered to index content available in that location by creating and configuring a content source at the level of the Shared Services Provider. Following are the different types of content sources that can be created:

  • SharePoint sites

  • Web sites

  • Network folders

  • Exchange Server public folders

  • Business data

  • Lotus Notes databases (only after a post-installation step)

Business data stored in a structured way in a relational database such as Microsoft SQL Server, or stored in line-of-business (LOB) systems such as SAP or Microsoft Dynamics CRM, can be exposed in a unified and consistent way through the Business Data Catalog middle layer. Developers model the business data declaratively in an XML file: the application definition file. A full discussion of all of the elements in this file is out of scope here, but one method is very important and must be included in the application definition file for the crawler to index the data. This method must be of type IdEnumerator and must return all of the primary keys of the records to index. The crawler generates a profile page for each record and indexes the content on that page.

The search administration object model exposes different classes you can use to programmatically create, configure, and manage content sources. Figure 1 shows all of the classes used within the sample code.

Figure 1. The search-related classes discussed in this how-to

See It

Watch the Video

Length: 14:27 | Size: 14.9 MB | Type: WMV file

Explore It