Configuring SharePoint Server 2010 Search for External Content Types (Part 2 of 2)

Summary:  Microsoft Business Connectivity Services (BCS) enable users to easily include external data as results for the SharePoint Microsoft SharePoint Server 2010 search capabilities. Learn to configure external content types for SharePoint Server search indexing and options for configuring SharePoint Server search results.

Applies to: Business Connectivity Services | Office 2010 | Open XML | SharePoint Designer 2010 | SharePoint Foundation 2010 | SharePoint Online | SharePoint Server 2010 | Visual Studio

Provided by:  Bob McClellan

Contents

  • Introduction

  • Explanation of the BDC Model XML

  • Modifying the External Content Type to Support Incremental Crawls

  • What Happened to IDEnumerator?

  • Using the Same External Content Type for External Lists

  • Additional Resources

  • About the Author

This article is the second in a two-part series of articles about Configuring SharePoint Server search for external content types.

Introduction

Microsoft Business Connectivity Services (BCS) enable you to connect SharePoint solutions to external data sources and to define external content types that are based on that external data. SharePoint 2010 search services can index the external data by using external content types to show meaningful search results. Business Connectivity Services, formerly known as the Business Data Catalog, provide additional improvements that include Microsoft SharePoint Designer 2010 support.

In the previous article of this series, Configuring SharePoint Server 2010 Search for External Content Types (Part 1 of 2), I described how to use SharePoint Designer 2010 to define external content types and how to set up a Microsoft SharePoint Server 2010 search crawl that includes that external data. This approach reduces crawl times for large amounts of data by using incremental crawls. We describe this by using SharePoint Designer, SharePoint Central Administration, and XML code examples.

Before beginning, you should have the following:

  • Access to SharePoint 2010 Central Administration. This includes the search service and the business data connectivity service.

  • Access to SharePoint Designer.

  • SharePoint 2010 web application that has a site collection. For more information, see Create a site collection (SharePoint Server 2010. Note that you can use Sharepoint - 80.

  • Access to Microsoft SQL Server. Note that you can use the same instance of SQL Server that the computer that is running SharePoint Server 2010 uses.

  • Sample database for Microsoft SQL Server (For example, this example uses AdventureWorks. To install this sample database, see AdventureWorks 2008 R2.)

Explanation of the BDC Model XML

You made a modification to the XML version of the BDC model in the last section. This section will explain most of the elements of that XML file. Feel free to skip down to the next section about incremental crawls, if you are not that interested in these details. This information is not necessary to complete the examples. It will help if you are looking at an actual model XML file while you read through this.

The top-level element is the Model. It has a Name attribute that will appear as the name of the model when you import it. LobSystems is a child of Model. It may contain one or more LobSystem child elements. The LobSystem element has a Name attribute that is used to easily determine the external data source. It also has a Type attribute for the kind of data source. For these examples, the Type is Database.

The actual connection information for that data source is defined as Property elements in the LobSystemInstance element, which is a child of LobSystemInstances. LobSystemInstances is a child of LobSystem. Property elements are very common in the BDC model. Many of the elements allow a Properties element that contains one or more Property elements. A Property element has a Name attribute that identifies the property and a Type attribute that identifies the .NET kind of the value for that property. The value then appears in the body of that Property element. Table 1. shows some of the property names for LobSystemInstance.

Table 1. Properties for LobSystemInstance

Name

Type

Example Value

AuthenticationMode

System.String

PassThrough

DatabaseAccessProvider

System.String

SqlServer

RdbConnection Data Source

System.String

localhost

RdbConnection Initial Catalog

System.String

AdventureWorks

RdbConnection Integrated Security

System.String

SSPI

RdbConnection Pooling

System.String

True

ShowInSearchUI

System.String

<anything>

You might have noticed that all the properties that begin with RdbConnection are regular connection string values for an SQL database. Any value that appears in a connection string can defined in this list of properties by prefixing it with RdbConnection. The last property listed, ShowInSearchUI, does not have value. Even though it is defined as a string, having the property is all that is needed for the model to know that the LobSystemInstance must appear in the user interface.

Another child of LobSystem is Entities. Although it may contain one or more Entity children, you will generally see models with one entity. The Entity is equal to a single external content type. There are several attributes for an Entity. These include the Name and the DefaultDisplayName. Table 2 shows the properties for an Entity element.

Table 2. Properties for Entity

Name

Type

Example Value

Title

System.String

Name

DefaultAction

System.String

View Profile

The Title property defines the field that will be used as the title in the search results. The DefaultAction property refers to an Action element by name. That action will be used as the link from the search results. In these examples, that action is always set to open a URL for a profile page.

Some child elements for an Entity are AccessControlList, Identifiers, AssociationGroups, Actions and Methods.

The AccessControlList element may contain one or more AccessControlEntry elements that have a Principal attribute that indicates the user name to give access to. The child elements are Right elements that define the access rights for the principal. Each Right element has a BdcRight attribute that can be Edit, Execute, SetPermissions, or SelectableInClients.

The Identifiers element contains an Identifier element that has a Name attribute for the field name that is an identifier for the entity and a TypeName attribute that contains the .NET data kind of the identifier.

The AssociationGroups element is used to define associations between external content types. The Actions element contains an Action element, usually to open a profile page. These can easily be defined by SharePoint Designer. Therefore, I do not say much about them here.

The Methods element contains one or more Method elements. Each Method element has a Name attribute that uniquely identifies the method for this entity. It also has an AccessControlList element that is identical to the one in the Entity element. Table 3 shows the properties for a Method element.

Table 3. Properties for Model

Name

Type

Example Value

RdbCommandType

System.String

System.Data.CommandType ...

RdbCommandText

System.String

SELECT [ProductID], [ProductModelID] FROM [Production].[Product]

BackEndObjectType

System.String

SqlServerTable

BackEndObject

System.String

Product

Schema

System.String

Production

The RdbCommandText is very important because it enables you to greatly customize the SQL query used for your data. For example, if you cannot create a view on a particular database, you could still configure that text to do joins within the command text as long as the return parameters matched. The BackEndObjectType can be SqlServerView if the RdbCommandText refers to a view instead of table.

FilterDescriptors is a child of the Method element. It may contain one or more FilterDescriptor elements. SharePoint Designer supports creating filters in its user interface. But they are not generally that useful for search methods where you control the SQL query. There is one filter, SynchronizationCookie, which will be shown in the next section.

Parameters is a child of the Method element. It may contain one or more Parameter child elements. The Parameter element has a Name attribute that uniquely identifies the parameter and Type attribute that can be In, InOut or Return. There can only be a single parameter for each Method that is a Return type. The number and kinds of parameters are determined by the operation that the method implements. You can define many of these operations by using SharePoint Designer. Therefore, it is easiest to create the operation there and then edit as necessary. Two of the operations that cannot be defined by SharePoint Designer, ChangedIdEnumerator and DeletedIdEnumerator, will be defined in the next section.

The parameter is actually defined by the child TypeDescriptor element. This element has attributes for the Name, TypeName, and IdentifierName. Some elements have simple types and are defined by a single TypeDescriptor element. Others can be very complex with nested TypeDescriptors and TypeDescriptor elements and additional properties. Table 4 shows some of the properties for the TypeDescriptor element.

Table 4. Properties for TypeDescriptor

Name

Type

Example Value

Size

System.Int32

25

RequiredInForms

System.Boolean

true

Decimal Digits

System.Int32

4

Mostly, you should be able to copy and edit the parameters that are defined by SharePoint Designer.

The last important child of the Method element is MethodInstances. It contains a MethodInstance element that defines the kind of operation. The Type attribute may be Finder, SpecificFinder, AssociationNavigator, IdEnumerator, and others. The ReturnParameterName attribute must refer to the name of the attribute that defines the return value for the operation. The Default attribute will usually be true because you will only define one method for each operation. If there were more than one Finder, for example, by default, this would specify which one to use. The Name attribute identifies the MethodInstance and the DefaultDisplayName will be used when that operation name is displayed. The MethodInstance element has some important properties as shown in Table 5.

Table 5. Properties for MethodInstance

Name

Type

Example Value

RootFinder

System.String

<anything>

AttachmentAccessor

System.String

<anything>

LastModifiedTimeStampField

System.String

ModifiedDate

The RootFinder marks the Finder method that the search crawler will use to enumerate all of the top level rows that should be crawled. There should be only one Finder method that is marked as the RootFinder. The AttachmentAccessor property was discussed in the previous section and the LastModifiedTimeStampField property is discussed in the next section.

For more information about the BDC model, see Additional Resources.

Modifying the External Content Type to Support Incremental Crawls

It's easy enough to run full crawls constantly for the numbers of records in these examples. However, it does not work as well if you want to index a million records. After the external content type and SharePoint Server search is set up for that kind of volume, it is much better to run incremental crawls to find any changes and then update only those in the search index. SharePoint Server search supports two approaches for handling incremental crawls on external content types. There are advantages and disadvantages to each, but first you must understand how they function.

Time stamp-Based Incremental Crawls

In this approach, you define a field that contains the last modified time stamp for each record. If that time stamp is newer than the last crawl, the indexes for that record are updated. For this example, you will be modifying the external content type based on a view, but first you must add the last modified date to the view in the AdventureWorks database.

To configure the Product view and Description view to add the modified date

  1. Start SQL Server Management Studio.

  2. In the Connect to Server dialog box, select Database Engine as the Server type.

  3. Type localhost in the Server name box.

  4. Click Connect.

  5. Expand the Databases item in the Object Explorer group.

  6. Right-click the AdventureWorks item, and then click New Query.

  7. Execute the ALTER VIEW statement as shown in the following example.

    ALTER VIEW [Production].[vProductAndDescription] 
    WITH SCHEMABINDING 
    AS 
    SELECT 
    p.[ProductID] 
    ,p.[Name] 
    ,pm.[Name] AS [ProductModel] 
    ,pmx.[CultureID] 
    ,pd.[Description] 
    ,CASE WHEN p.ModifiedDate > pm.ModifiedDate THEN p.ModifiedDate ELSE pm.ModifiedDate END AS ModifiedDate
    FROM [Production].[Product] p 
    INNER JOIN [Production].[ProductModel] pm 
    ON p.[ProductModelID] = pm.[ProductModelID] 
    INNER JOIN [Production].[ProductModelProductDescriptionCulture] pmx 
    ON pm.[ProductModelID] = pmx.[ProductModelID] 
    INNER JOIN [Production].[ProductDescription] pd 
    ON pmx.[ProductDescriptionID] = pd.[ProductDescriptionID];
    

This adds one new field to the view. For this example, the ModifiedDate is the most recent modified date from Product or ProductModel. If you also wanted to include ProductDescription in that expression, you must add another case to determine which of the dates is most recent.

To add the ModifiedDate to the external content type operations

  1. In SharePoint Designer, click External Content Types in the navigation pane.

  2. In the list, click Products to configure that external content type.

  3. In the External Content Type Operations group, double-click the Read Item operation.

  4. Click Next two times.

  5. In the Return Parameter page, select the ModifiedDate field in the Data Source Elements pane.

  6. Click Finish.

  7. In the External Content Type Operations group, double-click the Read List operation.

  8. Click Next two times.

  9. On the Return Parameter page, select the ModifiedDate field in the Data Source Elements pane.

  10. In the Properties pane, select the box next to Timestamp Field.

    Figure 18. Setting the Timestamp Field

    Setting the Timestamp Field

  11. Click Finish.

  12. Save the external content type.

By selecting the box for the Timestamp field, you have set that field to be used by the search service to determine whether the record has changed since the last crawl. Internally, the XML will now have a LastModifiedTimeStampField property that specifies the ModifiedDate field. If you exported the model for Products and searched for the property, you would find it in the MethodInstances properties for the Read List method.

You can now follow the usual steps to create the AWProductSearch content source and run a full crawl. As before, you may want to delete the AWModelSearch to avoid seeing results from both sources. When the full crawl has finished, start an incremental crawl. When the incremental crawl has finished, you can now see whether it worked.

To examine the results of the incremental crawl

  1. In the search service configuration in SharePoint Designer, click Crawl Log under Crawling in the navigation pane.

  2. Click Crawl History at the top.

You should see that the full crawl had something like 296 successes and the incremental crawl had only 2 successes. Because no records had changed, the incremental crawl did not have to re-index the records.

If you want to test the modified date, make a change to one or more of the records that includes updating the ModifiedDate field to the current date/time, and then run another incremental crawl. The successes are larger than 2 (3 if you change one record), but not as high as 296. The query in the following example shows this kind of change.

UPDATE Production.Product
SET Name='Classical Vest, M',ModifiedDate=GETDATE()
WHERE ProductID=865

Changelog-Based Incremental Crawls

In this approach, SharePoint Server search stores a token or cookie that we designate to track changes since the last crawl. In this case, the cookie will be a single time stamp from the last time that a crawl was started. If there have been any changes since that time, only the modifications are updated in the index. This process requires two new operations to be defined for the external content type, one for changed records and one for deleted records. These must be created manually because SharePoint Designer does not have a way to create them. (If you look at the external content type in SharePoint Designer, they will be shown in the External Content Type Operations list. However, you will be unable to configure them.) As before, you must export the BDC model in order to configure the XML directly.

To export the BDC model to an XML file

  1. In SharePoint Designer, click External Content Types in the navigation pane.

  2. In the list, click AWProducts next to Products to select that external content type.

  3. On the ribbon, click Export Business Data Catalog Model.

  4. In the Export Business Data Catalog Model dialog box, type Products in the Business Data Catalog Model Name box.

  5. Click OK.

  6. Change Save as type to All Files.

  7. Choose a destination folder for the file.

  8. Change the File Name box to Products.xml.

  9. Click Save.

To implement these two operations, you will need two SQL queries. One will return the records changed after a particular time stamp, and the other will return the records deleted after a particular time stamp. For the product table, I will use the DiscontinuedDate field to indicate that a record was effectively deleted and should no longer appear in search results. If the DiscontinuedDate field is null , it is still active. The DiscontinuedDate from the Product table must be added to the view exactly as ModifiedDate was added previously, although you do not have to look at the ProductModel record. Therefore, it is a simple added field (p.DiscontinuedDate).

Here are the new methods that you must add to the BDC model for the Products external content type within the Methods element. They both use a SynchronizationCookie filter type to track the last run date and time. The SQL query must return the appropriate set of changed or deleted record IDs and also update the last run date and time. The key SQL is highlighted in the XML for these two operations in the following example.

<Method IsStatic="false" Name="GetChangedProductIds">
  <Properties>
    <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">Text</Property>
    <Property Name="RdbCommandText" Type="System.String">
if @LastRunDate != CAST('1900-1-1' as datetime)BEGINSELECT [ProductID] FROM [Production].[vProductAndDescription] WHERE [ModifiedDate] &gt; @LastRunDateENDSET @LastRunDate=CURRENT_TIMESTAMP
    </Property>
    <Property Name="BackEndObjectType" Type="System.String">SqlServerView</Property>
    <Property Name="BackEndObject" Type="System.String">vProductAndDescription</Property>
    <Property Name="Schema" Type="System.String">Production</Property>
  </Properties>
  <AccessControlList>
    <AccessControlEntry Principal="contoso\administrator">
      <Right BdcRight="Edit" />
      <Right BdcRight="Execute" />
      <Right BdcRight="SetPermissions" />
      <Right BdcRight="SelectableInClients" />
    </AccessControlEntry>
  </AccessControlList>
  <FilterDescriptors>
    <FilterDescriptor Name="LastRunDate" Type="InputOutput">
      <Properties>
        <Property Name="SynchronizationCookie" Type="System.String">ChangedItemCookie</Property>
      </Properties>
    </FilterDescriptor>
    <FilterDescriptor Name="FilterDescriptor" Type="Timestamp" />
  </FilterDescriptors>
  <Parameters>
    <Parameter Name="@LastRunDate" Direction="InOut">
      <TypeDescriptor Name="LastRunDateTypeDescriptor" TypeName="System.DateTime" AssociatedFilter="LastRunDate">
        <Interpretation>
          <NormalizeDateTime LobDateTimeMode="Local" />
        </Interpretation>
      </TypeDescriptor>
    </Parameter>
    <Parameter Direction="Return" Name="ChangedProductIds">
      <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="ChangedProductIds">
        <TypeDescriptors>
          <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="ChangedProductIdElement">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="ProductID" Name="ProductID">
                <Properties>
                  <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                </Properties>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </TypeDescriptors>
      </TypeDescriptor>
    </Parameter>
  </Parameters>
  <MethodInstances>
    <MethodInstance Name="GetChangedPrimaryKeysInstance" Type="ChangedIdEnumerator" ReturnParameterName="ChangedProductIds"  />
  </MethodInstances>
</Method>
<Method IsStatic="false" Name="GetDeletedProductIds">
  <Properties>
    <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">Text</Property>
    <Property Name="RdbCommandText" Type="System.String">
SELECT [ProductID] FROM [Production].[vProductAndDescription]
WHERE [ModifiedDate] is null
    </Property>
    <Property Name="BackEndObjectType" Type="System.String">SqlServerView</Property>
    <Property Name="BackEndObject" Type="System.String">vProductAndDescription</Property>
    <Property Name="Schema" Type="System.String">Production</Property>
  </Properties>
  <AccessControlList>
    <AccessControlEntry Principal="contoso\administrator">
      <Right BdcRight="Edit" />
      <Right BdcRight="Execute" />
      <Right BdcRight="SetPermissions" />
      <Right BdcRight="SelectableInClients" />
    </AccessControlEntry>
  </AccessControlList>
  <FilterDescriptors>
    <FilterDescriptor Name="LastRunDate" Type="InputOutput">
      <Properties>
        <Property Name="SynchronizationCookie" Type="System.String">DeletedItemCookie</Property>
      </Properties>
    </FilterDescriptor>
    <FilterDescriptor Name="FilterDescriptor" Type="Timestamp" />
  </FilterDescriptors>
  <Parameters>
    <Parameter Name="@LastRunDate" Direction="InOut">
      <TypeDescriptor Name="LastRunDateTypeDescriptor" TypeName="System.DateTime" AssociatedFilter="LastRunDate">
        <Interpretation>
          <NormalizeDateTime LobDateTimeMode="Local" />
        </Interpretation>
      </TypeDescriptor>
    </Parameter>
    <Parameter Direction="Return" Name="DeletedProductIds">
      <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="DeletedProductIds">
        <TypeDescriptors>
          <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="DeletedProductIdElement">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="ProductID" Name="ProductID">
                <Properties>
                  <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                </Properties>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </TypeDescriptors>
      </TypeDescriptor>
    </Parameter>
  </Parameters>
  <MethodInstances>
    <MethodInstance Name="GetDeletedPrimaryKeysInstance" Type="DeletedIdEnumerator" ReturnParameterName="DeletedProductIds"  />
  </MethodInstances>
</Method>

Although these methods appear very long, they are actually fairly simple and similar. The filter is defined to pass in the last run date, although it is really a time stamp that contains both date and time. The return parameter is defined as a set of records that contains a ProductID. The MethodInstance elements define these methods as the changed record and deleted record operations.

The SQL code is also similar. Both check whether the LastRunDate time stamp is valid. If so, they return all the modifications since that date and time. Then the time stamp is updated to the current date and time so that the cookie will be stored with the new value. The changed record method compares the ModificationDate to the last time stamp and the deleted record method compares the DiscontinuedDate.

There is one other change that should be made to correctly implement this approach. The SQL for the Read List operation should be updated to ensure that no deleted records will be included in a full crawl, as shown in the following example.

<Property Name="RdbCommandText" Type="System.String">
SELECT [ProductID],[Name],[ProductModel],[CultureID],[Description]
FROM [Production].[vProductAndDescription]
WHERE [DiscontinuedDate] IS NULL
</Property>

Make these changes to the Products model, and you might also want to remove the LastModifiedTimeStampField property to verify that the new methods are doing the job. Import the modified model using the BDC service configuration in SharePoint Central Administration.

To import an XML file to update a BDC model

  1. In the BDC service configuration in SharePoint Central Administration, click the Edit tab.

  2. In the View section of the ribbon, select External Content Types from the drop-down list.

  3. Click the drop-down menu for ProductDetail, click Delete, and then click OK.

  4. Click OK.

  5. In the Business Data Catalog Models section of the ribbon, select Import.

  6. In the Business Data Catalog Model File box, type your XML file name or click Browse to locate it.

  7. Click th Import. You may see one or two warnings, but there should be no errors.

  8. Click OK.

Now you can try it out by recreating the content source for SharePoint Server search and then making changes to the records, starting the incremental crawl and searching to see the results.

If the database must delete records, instead of marking them as deleted, you must use another approach to keep track of deletions. The easiest is to create a new table for deleted IDs and add a trigger to the original table that adds records to the new table when they are deleted. The table of deleted records could store IDs and a time stamp. If your incremental crawls are run on a schedule, you could also schedule a SQL job to delete older records from the table.

Comparing the Time Stamp-Based Approach and the Change Log-Based Approach

The time stamp-based approach is much simpler to implement. However, the change log-based approach is much more efficient. The time stamp-based approach requires that the search service to store the timestamps for every record and then compare them all every time that it does an incremental crawl. The change log-based approach requires the search service to store only a single time stamp. In addition, when the search service requests the changes, it processes only the records that have changed. If you have many records, the change log-based approach saves time and storage. If you have very few records, the incremental crawls are not really necessary. If you are not sure which approach to use, you can try the time stamp-based approach first. Then, if it seems to be too slow or be taking too much disk storage, try switching to the change log-based approach. However, if you have no control over the database, you may be forced to use the approach that works with the existing schema.

What Happened to IDEnumerator?

If you have used SharePoint Server search with external data since the Business Data Catalog was introduced in SharePoint 2007, you would have used IDEnumerator instead of the Read List operation. The IDEnumerator method still works for searches. However, it is not supported directly in SharePoint Designer. One of the uses of IDEnumerator was to support very large external lists that otherwise would be throttled by the site settings. The SharePoint Server search crawlers are not constrained by the throttle settings. Therefore, you do not have to use segmenting techniques. If you already have an external content type with IDEnumerator defined, you can use it for SharePoint Server search. For new external content type definitions, it is quicker to use the Read List operation in SharePoint Designer.

Using the Same External Content Type for External Lists

The use of external content types for external lists is not really within the scope of this article. However, you might be tempted to use the same external content types for both. Just remember that many of the customizations for SharePoint Server search could also affect the appearance of the external list. Also, you could easily have to deal with throttling limits that could interfere with the efficiency of the SharePoint Server search crawls. In general, it is probably better to use different external content types for these purposes unless your data is very simple and less than a thousand records.

Additional Resources

For more information, see the following resources:

About the Author

Bob McClellan has been developing applications, mostly for database systems, for almost thirty years. He recently worked on the Power Tools for Open XML, which are a series of Windows PowerShell commands for manipulating Open XML documents. Bob has extensive programming experience in a variety of computer languages, database systems and development environments.