Caching in SharePoint
Caching is a common technique for improving the performance and scalability of a system. There are four resources that impact scalability:
- Processing capacity
- Memory capacity
- Disk I/O
- Network capacity
Caches can be located in memory or on disk. In-memory caching uses some of the available memory in order to reduce the demands on processing capacity, disk I/O, and network capacity. Disk-based caches can include information that is expensive to retrieve, such as binary large objects (BLOBs) from a database, or information that is expensive to compute, such as the final output for a page.
Generally, caching improves performance at the cost of data staleness. Data becomes stale when the value for the information stored in the cache differs from the value of the data in the backing store. As the time interval increases between when the data was last retrieved and the current time, the chances increase that the cached data is stale.
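The relationship between cache age and staleness can be illustrated with a minimal time-to-live cache. The following Python sketch is language-neutral and is not SharePoint or ASP.NET code. The class name TtlCache, the load callback, and the injectable clock are all assumptions made for the illustration.

```python
import time

class TtlCache:
    """Tiny in-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the behavior can be tested
        self._entries = {}         # key -> (value, time the value was stored)

    def get(self, key, load):
        """Return the cached value, reloading from the backing store if too old."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # may be stale if the backing store changed
        value = load(key)          # hypothetical callable: key -> current value
        self._entries[key] = (value, now)
        return value
```

A shorter time-to-live narrows the window in which a stale value can be served, at the cost of more trips to the backing store.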
The other characteristic that affects the likelihood of staleness is the volatility of the backing store. Volatility is a measure of how frequently the backing data changes. For example, you may have data such as city names that has low volatility because cities do not change their names very often. This data can be cached for a long time with little chance of the data becoming stale.
You may have data that changes at fixed times. For example, a catalog system may batch together and apply all editing changes to the production data every night. In these cases, you know when the data becomes stale, and you can flush the cache at that time.
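When the change time is known, the cache entry can be given an absolute expiration that matches it. The following Python sketch computes that moment. The 2:00 A.M. batch time and the function name next_batch_time are hypothetical and are not taken from any real catalog system.

```python
from datetime import datetime, time, timedelta

def next_batch_time(now, batch_at=time(2, 0)):
    """Return the next moment the nightly batch runs, strictly after `now`."""
    candidate = datetime.combine(now.date(), batch_at)
    if candidate <= now:
        # Today's batch has already run, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

A cache entry stored at any time of day would then be set to expire at the value that next_batch_time returns, so it is flushed as soon as the new data is applied.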
In cases where data is highly volatile and must be accurate, it may not be a good candidate for caching. For example, in a high-volume system, inventory changes very rapidly. It is not a good candidate for caching if you require an accurate count. In many cases, caching indicators that are derived from the data's state may be enough. For an inventory, you may only need to know that the inventory count is adequate, that it is critically low, or that there is no inventory available. You can cache the derived inventory indicator value because it is much less likely to become stale than the actual inventory count.
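As a rough illustration of a derived indicator, the following Python sketch maps an exact inventory count to one of three coarse states. The threshold of 10 and the state names are assumptions chosen for the example.

```python
def inventory_indicator(count, low_threshold=10):
    """Map an exact inventory count to a coarse, cache-friendly indicator."""
    if count <= 0:
        return "none"
    if count <= low_threshold:     # hypothetical cutoff for "critically low"
        return "critically low"
    return "adequate"
```

With these assumed thresholds, a count that drops from 500 to 499 changes the exact value but not the indicator, so a cached "adequate" remains correct.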
Another technique that can prevent staleness is to invalidate the data in the cache when items in the backing store change. This approach detects changes in the backing store and sends an event to each instance of the cache to invalidate the cached value. Implementing this type of notification system in a farm can be complex and expensive. If the data source is a database such as SQL Server, cache invalidation capabilities may be standard features.
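The notification approach can be sketched in a single process as follows. This Python example only illustrates the idea. BackingStore and InvalidatingCache are hypothetical names, and a real farm would need to deliver the notification across servers, which is the complex and expensive part.

```python
class BackingStore:
    """Stand-in for a data source that announces every change it accepts."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def read(self, key):
        return self._data[key]

    def write(self, key, value):
        self._data[key] = value
        for notify in self._listeners:
            notify(key)            # tell each cache instance the key changed


class InvalidatingCache:
    """Cache instance (one per server) that drops entries when notified."""

    def __init__(self, store):
        self._store = store
        self._entries = {}
        store.subscribe(self.invalidate)

    def invalidate(self, key):
        self._entries.pop(key, None)

    def get(self, key):
        if key not in self._entries:
            self._entries[key] = self._store.read(key)
        return self._entries[key]
```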
SharePoint-Specific Caching Mechanisms
SharePoint maintains several caches that are described in detail in Caching in Office SharePoint Server 2007 on TechNet. The following sections give an overview of the caches that are implemented or extended by SharePoint.
The output cache is based on the ASP.NET output caching feature. Office SharePoint Server 2007 extends the capabilities of the ASP.NET output cache with cache profiles. A cache profile is a group of settings that gives greater control over managing the caching output than what is available in ASP.NET. For example, it allows you to customize how a page is rendered for users with different permission sets. A cache profile can be applied across a set of pages.
A cache profile is similar to but more powerful than the "vary by" capability that is native to ASP.NET. The Partner Portal application's promotion pages can use cache profiles to customize cached pages for specific security levels. The Promotions site collection can be configured to use the Extranet (Published Site) cache profile, which has Vary by User Rights turned on. This prevents users with different permissions, such as users that belong to different partner groups, from accessing the same cached publishing pages.
To select the Extranet cache profile for the Promotions site collection
- Navigate to http://localhost:9001/sites/promotions.
- Click Site Actions.
- Click Site Settings.
- Click Manage All Site Settings.
- In the Site Collection Administration column, click Site collection output cache.
- In the Default Page Output Cache Profile section, click Extranet (Published Site) in the Authenticated Cache Profile drop-down list.
In addition to using cache profiles, you can implement a VaryByCustom event handler that is supported by the ASP.NET output cache.
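Conceptually, varying the cached output by user rights or by a custom value adds an extra component to the cache key. The following Python sketch shows only that idea. It does not use the ASP.NET or SharePoint APIs, and the names output_cache_key, PageOutputCache, and the render callback are hypothetical.

```python
def output_cache_key(url, user_rights):
    """Build a cache key that varies by the user's effective rights."""
    return (url, frozenset(user_rights))


class PageOutputCache:
    """Caches one rendered page per distinct (URL, rights) combination."""

    def __init__(self, render):
        self._render = render      # hypothetical callable: (url, rights) -> page
        self._pages = {}

    def get_page(self, url, user_rights):
        key = output_cache_key(url, user_rights)
        if key not in self._pages:
            self._pages[key] = self._render(url, user_rights)
        return self._pages[key]
```

Users with identical rights share one cached page, and users with different rights never receive a page that was rendered for someone else.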
Note that entries in the output cache are refreshed only periodically. Between refreshes, a cached page can lag behind the underlying data, so data latency can be an issue.
For more information about SharePoint output caching, see Custom Caching Overview on MSDN.
The object cache stores list query results and SharePoint objects. SharePoint objects such as sites and Webs are expensive to create. Obtaining these objects through the object cache can greatly improve performance if you repeatedly access the same objects. The object cache has the same data latency limitations as the output cache with regard to cached query results.
The Partner Portal application accesses the object cache through the PortalSiteMapProvider class to aggregate information from within a site collection. For more information, see Techniques for Aggregating List and Site Information. The application also uses the Content Query Web Part. This uses the object cache for cross-list query caching. For more information, see Caching in Office SharePoint Server 2007 on TechNet.
The BLOB cache is a disk-based cache that caches binary large objects such as sound and video files. Caching these objects reduces the load on the network and the database. However, the cache increases disk I/O on the Web front-end servers because the objects are stored on and retrieved from the local disk. The cache only works with items that are in document libraries. Although the BLOB cache plays an important role in improving performance, it has limited use for developers. The Partner Portal application does not include any examples of the BLOB cache.
In addition to the SharePoint caches, developers can use the ASP.NET cache directly. This is the most common technique for caching application data. For an extensive discussion of caching, see Improving .NET Application Performance and Scalability on MSDN. The rest of this section gives an overview of the ASP.NET cache and describes some considerations for using it.
Cached application data is located in the memory of the Web front-end server. One issue to consider is whether the information is consistent across all the Web front-end servers in the server farm. You must also consider whether that information is consistent with the back-end data sources. Each Web front-end server generally retrieves and caches the data at a different time than the other Web front-end servers in the farm. If a data source, such as an item in a SharePoint list, changes, and one of the servers refreshes the item in its cache, that server's copy differs from the older copies that the other servers still hold.
One way to solve this problem is with a synchronized cache (also called a coherent cache). NCache is an example of a product that provides a synchronized cache. Microsoft currently has a beta product named Velocity that also offers a synchronized cache capability. Another strategy is to use cached data for browsing information, but to go directly to the data source for transactional operations. For example, users who browse a catalog see cached pricing information, but when they purchase a product, the pricing information is retrieved directly from the pricing source to calculate the cost. The Partner Portal application does not include any examples of resolving cache/data consistency issues.
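The browse-from-cache, transact-from-source strategy can be sketched as follows. This Python example is an illustration only. PricingService and its price_source callback are hypothetical and do not come from the Partner Portal application.

```python
class PricingService:
    """Serves cached prices for browsing and fresh prices for purchases."""

    def __init__(self, price_source):
        self._source = price_source    # hypothetical callable: sku -> price
        self._cache = {}

    def price_for_browsing(self, sku):
        """Cheap lookup; the result may lag behind the pricing source."""
        if sku not in self._cache:
            self._cache[sku] = self._source(sku)
        return self._cache[sku]

    def price_for_purchase(self, sku):
        """Always ask the pricing source so the charged amount is current."""
        price = self._source(sku)
        self._cache[sku] = price       # refresh the cache while we are here
        return price
```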
Staleness is related to consistency and impacts the accuracy of the information. For this reason, it is not recommended to cache data that frequently changes. The longer information is cached, the greater the likelihood of inaccuracies. The Partner Portal application only caches information for short intervals.
Another technique to avoid staleness is to invalidate the cache when the back-end data changes. This must be done by each server in the farm that is caching the data. The ASP.NET cache supports invalidation, but you often need to implement additional functionality to propagate the invalidation across all servers in a farm. Invalidation is simpler with a distributed cache because the cache only needs to be invalidated once, and the distributed cache synchronizes the change across the farm. The Partner Portal application does not include examples of cache invalidation. For more information about cache invalidation with ASP.NET, see CacheDependency Class on MSDN.
Caching is intended for data that is frequently accessed. To perform efficiently, caches often have expiration policies that remove information that has not been accessed within a specific time frame. The Partner Portal application uses this technique for cached product information that is from the product catalog. For more information, see Product Catalog.
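An expiration policy of this kind is often called sliding expiration. The following Python sketch shows the general idea under stated assumptions. The class name SlidingExpirationCache and the injectable clock are hypothetical, and the code is not the ASP.NET implementation.

```python
import time

class SlidingExpirationCache:
    """Removes an entry once it has gone unread for `idle_seconds`."""

    def __init__(self, idle_seconds, clock=time.monotonic):
        self._idle = idle_seconds
        self._clock = clock            # injectable so the behavior can be tested
        self._entries = {}             # key -> [value, time of last access]

    def put(self, key, value):
        self._entries[key] = [value, self._clock()]

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        now = self._clock()
        if now - entry[1] >= self._idle:
            del self._entries[key]     # not accessed recently enough: expire
            return default
        entry[1] = now                 # each read slides the window forward
        return entry[0]
```

Frequently read items stay in the cache indefinitely, so this policy is usually paired with another mechanism if the data can also become stale.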
Another strategy is to limit the overall size of the cache. After the cache exceeds this limit, it purges the least recently used information (this is referred to as an LRU cache).
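A size-limited LRU cache can be sketched in a few lines of Python with the standard library's OrderedDict. The class name LruCache and the item-count limit are assumptions for the illustration. A production cache would more likely limit memory use than item count.

```python
from collections import OrderedDict

class LruCache:
    """Holds at most `max_items` entries and evicts the least recently used."""

    def __init__(self, max_items):
        self._max = max_items
        self._items = OrderedDict()    # oldest entry first, newest entry last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)   # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self._max:
            self._items.popitem(last=False)   # drop the least recently used
```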
The ASP.NET cache provides performance counters that indicate the cache's efficiency. One important value to monitor is the cache hit ratio, which is the ratio of hits to misses for the cache. This value is a good indicator of how often cached information is being reused.
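To make the measurement concrete, the following Python sketch counts hits and misses and reports the hit ratio as hits divided by total lookups, which is a common way to express it as a fraction. This is an assumption for the illustration, not a statement of how the ASP.NET performance counter is computed, and CountingCache is a hypothetical name.

```python
class CountingCache:
    """Cache that records hits and misses so reuse can be measured."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = load(key)   # hypothetical callable: key -> value
        return self._entries[key]

    def hit_ratio(self):
        """Fraction of lookups served from the cache (0.0 when unused)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A ratio near 1.0 suggests that cached items are being reused. A ratio near 0.0 suggests that the cache is consuming memory without saving much work.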
Another consideration for caching is security. By caching data, you increase the likelihood that security will be compromised. The ASP.NET cache does not support permissions. If a process is compromised and an attacker runs code within that process, the attacker can access anything in the cache. If the data were not in the cache, the attacker would need to compromise additional boundaries to see it. Additionally, programming errors can allow users to see information that is not intended for them. After you understand the potential security risks and decide that they are acceptable, you can take precautions. For example, you can cache information that is specific to a set of users. The Partner Portal application applies this technique in the pricing repository to prevent one partner from seeing the pricing information that is intended for another partner.
Techniques for High Volume Sites
In high-volume sites, a race condition can occur when the cache is populated with an item. For detailed information about this issue, see Best Practices: Common Coding Issues When Using the SharePoint Object Model on MSDN. This section provides an overview. Although the ASP.NET cache is thread safe and does not need to be protected by a synchronization lock, there is another potential performance problem that synchronization can help solve.
For example, assume that you cache information that is expensive to retrieve on a site that serves three requests per second for a page that uses the cached information. In addition, assume that it takes two seconds to retrieve the information from a back-end service. Finally, consider what happens when the cache expires the information. The following figure illustrates this.
Caching without a synchronization lock
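Because the information is retrieved every time the item is not in the cache, every request that arrives during the two-second retrieval starts its own retrieval. At three requests per second, that is six requests, so there are five unnecessary calls to the back-end server. This has two detrimental effects: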
- It puts an additional load on SharePoint and on the service that responds to the requests.
- It causes the response time to be slower than necessary for five of the six users.
A synchronization lock around the logic that retrieves the information to cache eliminates this condition. This is shown in the following figure.
Caching with a synchronization lock
For code examples that demonstrate this technique, see Best Practices: Common Coding Issues When Using the SharePoint Object Model on MSDN. This approach is not implemented in the Partner Portal application. It was omitted to reduce the complexity of the application and because the application does not include a scenario where high levels of traffic are generated.
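As a language-neutral illustration of the locking pattern, the following Python sketch checks the cache, takes a lock on a miss, and checks the cache again before loading. It is not taken from the Partner Portal application or from the referenced article, and the names LockedLoader and load are hypothetical.

```python
import threading

_MISSING = object()    # sentinel so that None can be a legitimate cached value

class LockedLoader:
    """Cache wrapper that lets only one thread at a time run the expensive load."""

    def __init__(self, load):
        self._load = load              # hypothetical callable: key -> value
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, key):
        value = self._cache.get(key, _MISSING)
        if value is not _MISSING:
            return value               # fast path: no lock needed on a hit
        with self._lock:
            # Re-check: another thread may have filled the cache while we waited.
            value = self._cache.get(key, _MISSING)
            if value is _MISSING:
                value = self._load(key)
                self._cache[key] = value
            return value
```

In the six-request scenario described earlier, the first request performs the load while the other five wait on the lock and then read the cached value. This sketch uses a single lock for all keys, which keeps it short but also serializes loads of unrelated items.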
Caching and the BDC
The Business Data Catalog (BDC) manages metadata for Web services and database services. It also provides a standard set of Web Parts that can be bound to that metadata without requiring any code. For more information, see Consuming Web Services with the Business Data Catalog (BDC). However, the default implementation of the BDC service and the related Web Parts do not provide a caching mechanism for storing the retrieved information. You can still take advantage of the BDC and cached information if you develop custom Web Parts to render the data. For an example, see Repositories in the Partner Portal. In that example, the entity information is retrieved from the BDC and is stored directly in the ASP.NET cache.