Managing eDiscovery in SharePoint Server 2010 (ECM)

07/24/2014

Applies to: SharePoint Server 2010

eDiscovery (electronic discovery) is the process by which organizations find, retain, and preserve documents related to a particular court case. It is sometimes referred to as the holds feature. In Microsoft SharePoint Server 2010, eDiscovery can be activated as a site feature on any site. Use the eDiscovery feature to place items on hold for electronic discovery across the entire site collection. You can also use the feature to manage holds and search for content across a repository that spans multiple site collections. You can specify a search query and schedule all of the matching items to be added to a hold, or you can add each item to a hold individually.

eDiscovery Everywhere

Some court cases require the ability to search in various locations, and the Records Center is not a long-term archive. As a result, you may decide to use the Records Center to store all important documents but use a custom site template for a collaborative space, and you may want to use eDiscovery in that collaborative space as well. Managing records by using multiple solutions in multiple locations creates a need for an eDiscovery solution that can search for records everywhere in the enterprise: eDiscovery everywhere. The internal hold infrastructure feature is what enables eDiscovery anywhere.

The internal hold infrastructure feature is activated by default in every SharePoint Server 2010 site. When the feature is activated, the site is ready for the items in it to be placed on hold, and the following hold-related actions are available:

The search and process job can add an item to a hold.
The release hold job can remove an item from a hold.
You can see the hold data related to items and add or remove items from a hold.
Because the hold infrastructure feature is enabled by default everywhere, you can search an entire site collection and place the results on hold without activating the Hold feature in the subweb.
A Hold feature enables you to define and manage holds in the Web where the feature is enabled.
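
The hold actions described above can be modeled conceptually as follows. This is a minimal Python sketch, not SharePoint code: the `Item` class and the `add_to_hold` and `release_hold` functions are hypothetical names chosen for illustration, and the sketch assumes that one item can belong to more than one hold at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a list item; not a SharePoint type."""
    url: str
    holds: set = field(default_factory=set)  # names of the holds on this item

    @property
    def on_hold(self) -> bool:
        return bool(self.holds)


def add_to_hold(items, hold_name):
    """Mimics the search and process job: add each item to a hold."""
    for item in items:
        item.holds.add(hold_name)


def release_hold(items, hold_name):
    """Mimics the release hold job: remove the hold from each item."""
    for item in items:
        item.holds.discard(hold_name)


contract = Item("/sites/legal/contract.docx")
add_to_hold([contract], "Case A")
add_to_hold([contract], "Case B")
release_hold([contract], "Case A")
# Under the multi-hold assumption, the item is still held by "Case B".
print(contract.on_hold, sorted(contract.holds))
```

Storing hold names in a set keeps both operations idempotent, so running the same job twice leaves the item in the same state.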

eDiscovery Programming Model

The APIs used for most eDiscovery development reside in the Microsoft.Office.RecordsManagement.Holds namespace.

Hold Reports

Hold reports are created and managed by classes and members in the Microsoft.Office.RecordsManagement.Reporting namespace. Hold Report objects list new data and data that has changed. Each hold can be associated with multiple reports, because there is one report per site collection in the multisite collection repository. If there are many items on hold, it is necessary to split the report into multiple pieces.
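
The report layout described above (one report per site collection, split into pieces when it grows large) can be sketched as follows. This is an illustrative Python model only: `build_report_parts` and the `max_items_per_part` limit are hypothetical, and the source does not state the threshold at which SharePoint Server 2010 splits a report.

```python
from collections import defaultdict


def build_report_parts(held_items, max_items_per_part):
    """Group held items by site collection, then split each group into parts.

    held_items: iterable of (site_collection_url, item_url) pairs.
    Returns {site_collection_url: [part, ...]}, each part a list of item URLs.
    """
    if max_items_per_part < 1:
        raise ValueError("max_items_per_part must be at least 1")

    by_site_collection = defaultdict(list)
    for site_collection, item_url in held_items:
        by_site_collection[site_collection].append(item_url)

    reports = {}
    for site_collection, urls in by_site_collection.items():
        # Slice the URL list into consecutive chunks of the maximum size.
        reports[site_collection] = [
            urls[start:start + max_items_per_part]
            for start in range(0, len(urls), max_items_per_part)
        ]
    return reports


held = [("/sites/hr", f"/sites/hr/doc{i}.docx") for i in range(5)]
held.append(("/sites/legal", "/sites/legal/contract.docx"))
parts = build_report_parts(held, max_items_per_part=2)
print({sc: [len(p) for p in ps] for sc, ps in parts.items()})
# prints {'/sites/hr': [2, 2, 1], '/sites/legal': [1]}
```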

The hold report in the destination location includes the results of the search that was run on the source location. When the timer job processes a "search and send to another site" operation, it calls the Web service and passes in data about the query it performed, such as the sites searched, the query that was run, and the date that the query was executed. The public-facing holds feature works at the site level, which enables users to create a list of multiple hierarchical holds in a site collection (SPSite object). After the list is created, the feature pulls data from all holds lists in the site's parent chain.
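
The parent-chain lookup described above can be pictured with a small Python sketch. The `Site` class and `holds_in_parent_chain` function are hypothetical illustrations, not SharePoint types, and the sketch assumes that the chain starts with the site itself and ends at the root site.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    """Hypothetical site node; `holds` stands in for that site's holds list."""
    name: str
    parent: Optional["Site"] = None
    holds: list = field(default_factory=list)


def holds_in_parent_chain(site):
    """Collect holds from the site and from every ancestor up to the root."""
    collected = []
    current = site
    while current is not None:
        collected.extend(current.holds)
        current = current.parent
    return collected


root = Site("root", holds=["Case A"])
team = Site("team", parent=root, holds=["Case B"])
project = Site("project", parent=team)
print(holds_in_parent_chain(project))
# prints ['Case B', 'Case A']
```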

A hold Report object is generated even if there is no content on the selected hold. If a report contains no content, SharePoint Server 2010 throws a ReportEmptyException exception.

Search in eDiscovery

Searches based on list item properties, such as dates, are typical in eDiscovery scenarios. For example, you may want to find all contracts that were signed before a specific date. Additionally, SharePoint Server 2010 supports content distribution scenarios in which content is less centralized and more spread out, which makes it more important to be able to specify precise queries and scopes for eDiscovery searches. Search for eDiscovery was built to accommodate scope, scale, extensibility, query reuse, and the ability to search across multiple site collections. You can:

Scope an eDiscovery search to specific sites.
Scale to large targets: for example, put a 10,000-item search result on hold.
Use search engines other than SharePoint Server Search, such as FAST, to perform eDiscovery searches.
Access a list of search queries that are performed for a specific hold.
Read queries more easily. Queries are human-readable and easy to reproduce.

Search can span multiple site collections. This includes the capability to define one list of holds for all of the site collections encompassed by the search, to add search results from across multiple site collections to a hold, and to scope a search to one site collection in the multisite collection repository.
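
The kind of scoped, property-based query described in this section (for example, contracts signed before a specific date) can be illustrated with a short Python sketch. It does not use SharePoint Server Search: the `Document` class, its fields, and the `search` function are hypothetical, and the scope is modeled simply as a list of URL prefixes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Hypothetical document record with two searchable properties."""
    url: str
    content_type: str
    signed_on: date


def search(documents, scope_prefixes, content_type, signed_before):
    """Return documents inside the scope that match both property filters."""
    prefixes = tuple(scope_prefixes)
    return [
        doc for doc in documents
        if doc.url.startswith(prefixes)
        and doc.content_type == content_type
        and doc.signed_on < signed_before
    ]


docs = [
    Document("/sites/legal/contracts/a.docx", "Contract", date(2009, 3, 1)),
    Document("/sites/legal/contracts/b.docx", "Contract", date(2010, 6, 1)),
    Document("/sites/hr/contracts/c.docx", "Contract", date(2008, 1, 15)),
    Document("/sites/legal/memos/d.docx", "Memo", date(2009, 5, 5)),
]
hits = search(docs, ["/sites/legal/"], "Contract", date(2010, 1, 1))
print([d.url for d in hits])
# prints ['/sites/legal/contracts/a.docx']
```

Keeping the scope, the property filters, and the cutoff date as explicit arguments is what makes such a query easy to read and to rerun later for the same hold.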

Separating Held Content from Non-Held Content

The eDiscovery and holds features include the ability to separate held content from content that is not held. You can copy search results to a Records Center, where they are routed and placed with other records. You can also configure the feature to copy search results to the Records Center and separate them from existing records.

To separate held content from content that is not held, you can set up a separate Records Center dedicated to eDiscovery, or you can create a custom router that examines incoming hold properties and then routes them to a special library based on the property value.

Creating a separate Records Center dedicated to eDiscovery successfully separates held content from corporate records, but the content is not routed to a special location based on the hold it is placed under. This approach has the further disadvantage of requiring a separate Records Center for every type of property that you want to keep separate.
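
The custom router approach mentioned above can be sketched conceptually in Python. This is not the SharePoint routing API: the `route` function, the `hold` property name, and the library names are all hypothetical, chosen only to show routing by property value.

```python
def route(properties, hold_libraries, records_library="Records",
          unmatched_holds_library="Held Content"):
    """Pick a destination library based on the incoming hold property.

    properties: metadata of the incoming item (a plain dict in this sketch).
    hold_libraries: maps a hold name to the library reserved for that hold.
    """
    hold_name = properties.get("hold")
    if not hold_name:
        # No hold property: treat the item as an ordinary corporate record.
        return records_library
    # Held content goes to its hold's library, or to a shared fallback.
    return hold_libraries.get(hold_name, unmatched_holds_library)


libraries = {"Case A": "Case A Holds", "Case B": "Case B Holds"}
print(route({"title": "Contract", "hold": "Case A"}, libraries))
print(route({"title": "Policy"}, libraries))
print(route({"title": "Memo", "hold": "Case C"}, libraries))
# prints "Case A Holds", then "Records", then "Held Content"
```

Unlike a dedicated Records Center, a router of this kind keeps held content apart from corporate records and also groups it by the hold it was placed under.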

Multisite Collection Repositories

As a general infrastructure feature, SharePoint Server 2010 includes a way to group site collections together into a subscription. Subscriptions meet most, but not all, of the needs of site collections to be grouped together for purposes of eDiscovery. Therefore, we created a subtype of subscription called a multisite collection repository that shares common functionality with subscriptions but also provides the unique functionality that eDiscovery requires.

The multisite collection repository includes an API for enumerating its site collections, a site for configuring settings that span the entire multisite collection repository, a way to pass group IDs to a service shared across multiple groups that can be partitioned, and a way to uniquely distinguish a multisite collection repository from multiple site collections that are grouped together for other reasons (for example, a hosted environment).

The Subscription Settings page includes a link to Discovery and Hold Settings, which is a page used to configure eDiscovery and hold settings for the current subscription. If your document centers and Records Centers span multiple site collections, choose to group all of the site collections into a single repository to support eDiscovery. Banding together all of the site collections into a single repository enables one centralized hold list, one report library, and a search page for the entire subscription. You can also specify the following:

Whether to search the entire subscription
The site collection where the master holds list resides
The scope in the registered search service used to preview search

After you designate the group of sites as a multisite collection repository and enable Holds in the hub, you are finished configuring the multisite collection repository.

Extending eDiscovery to Span a Multisite Collection Repository

You can extend eDiscovery to support search and hold operations that span multiple sites. You can specify that you want to search all available site collections, query that scope, and place a hold on all search results from the multiple site collection scope.

To span multiple site collections, SharePoint Server 2010 creates an asynchronous work item on each site collection in the multisite collection repository, and then passes the search query and hold to those work items. Each site collection processes the hold request separately and sends a separate e-mail message containing results for each site collection in the scope.

The reporting architecture is likewise distributed: each site collection generates a Report object about the items that are on hold within that site collection. These reports are each stored in the collection of hold reports in the central hub.
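
The fan-out and distributed reporting described above can be sketched conceptually in Python. The `WorkItem` class and both functions are hypothetical names, a plain loop stands in for the asynchronous processing, and a case-insensitive substring match stands in for the real search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical unit of work queued on one site collection."""
    site_collection: str
    query: str
    hold_name: str


def create_work_items(site_collections, query, hold_name):
    """Fan out: one work item per site collection in the repository."""
    return [WorkItem(sc, query, hold_name) for sc in site_collections]


def process_work_item(work_item, content):
    """Run the query in one site collection and build that collection's report.

    content maps a site collection URL to the titles of its documents.
    """
    titles = content.get(work_item.site_collection, [])
    held = [t for t in titles if work_item.query.lower() in t.lower()]
    return {
        "site_collection": work_item.site_collection,
        "hold": work_item.hold_name,
        "held_items": held,
    }


content = {
    "/sites/hr": ["Contoso offer letter", "Holiday schedule"],
    "/sites/legal": ["Contoso contract", "Contoso NDA"],
}
work_items = create_work_items(content.keys(), "contoso", "Case A")
# Each site collection is processed separately; the hub collects the reports.
hub_reports = [process_work_item(w, content) for w in work_items]
for report in hub_reports:
    print(report["site_collection"], report["held_items"])
```

Because each work item carries the full query and hold name, every site collection can process its share of the request independently of the others.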

To enable eDiscovery searches across the multisite collection repository, every site recognizes whether it is part of a multisite collection repository. Additionally, each Search and Add to Hold page in the multisite collection repository recognizes the list of all site collections within it and the list of all holds within it, and can restrict results previews to those within the multisite collection repository.

Searching Across a Multisite Collection Repository

SharePoint Server 2010 locks down the Search and Add to Hold page so that only site collection administrators can access it. When the scope is expanded to multiple site collections, this level of access control is not enough: an administrator of one site collection cannot view all of the content in another site collection. If a tenant administrator makes an explicit choice to enable searches across a multisite collection repository, then privilege elevation is acceptable. However, SharePoint Server 2010 warns the person who enables eDiscovery in the multisite collection repository that any site collection administrator on the site with the master holds list can access all items in the subscription.
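
The access rule described above can be expressed as a small decision function. This Python sketch is an interpretation of the paragraph, not SharePoint's permission model: `can_search` and its parameters are hypothetical, and administrators are modeled as plain sets of user names.

```python
def can_search(user, target, admins, master_holds_site, cross_search_enabled):
    """Decide whether `user` may search and hold content in `target`.

    admins maps a site collection URL to the set of its administrators.
    master_holds_site is the site collection with the master holds list.
    """
    if user in admins.get(target, set()):
        return True  # administrators can search their own site collection
    # Privilege elevation applies only after the explicit opt-in.
    return cross_search_enabled and user in admins.get(master_holds_site, set())


admins = {"/sites/hub": {"alice"}, "/sites/hr": {"bob"}}
print(can_search("alice", "/sites/hr", admins, "/sites/hub", False))
print(can_search("alice", "/sites/hr", admins, "/sites/hub", True))
print(can_search("bob", "/sites/hub", admins, "/sites/hub", True))
# prints False, then True, then False
```

The second call shows why the warning matters: once cross-repository search is enabled, an administrator of the hub gains access to content in site collections that he or she does not administer.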