Page Output Caching, Part 1
March 20, 2002
About a year ago, I wrote a column that provided an overview of the Caching functionality found in ASP.NET. In this month's column, we'll expand on that topic and focus on one aspect of ASP.NET Caching: page output caching.
What Is Page Output Caching?
Dictionary.com defines cache as "...memory holding recently accessed data, designed to speed up subsequent access to the same data."
Applied to software solutions, caching can be viewed as dedicated memory used to hold resources, such as class instances or application data, which are frequently accessed. Rather than recreating the resource each time, the resource can be created one time and used multiple times.
Caching is advantageous in any application, but when applied to Web applications it can dramatically improve performance. Caching can help reduce latency (the time between when a request is made and when the response is received) and decrease the server resources used to process the request and generate the response.
The most common type of caching found in Web applications today uses the standard HTTP cache headers to specify how the generated document is to be used: whether the client can hold a local copy, when that local copy should expire, and so on. These HTTP cache semantics allow for a wide variety of scenarios. There are three types of users for these HTTP cached resources:
- Origin-Server: The server where the document is originally generated.
- Clients: For example, Internet Explorer and all other browsers support a client cache. If a document can be cached by the client, a local copy can be displayed, rather than fetching the document from the origin-server.
- Downstream clients: These are network applications capable of holding on to cached copies of resources, sitting between the origin-server and the client. A great example here is Microsoft Internet Security and Acceleration Server (ISA) used in reverse-proxy mode. When used in reverse-proxy mode, ISA caches documents from a server. Reverse proxies are extremely advantageous because they allow distributed cached versions of the application to live across the network, which in turn reduces request latency because the content is closer to the requestor.
ASP.NET will generate the necessary HTTP cache headers for all ASP.NET pages when the developer uses either the high-level OutputCache directive or the lower level Response.Cache API.
In addition to simply adding HTTP cache headers to the generated document, ASP.NET goes a step further. If specified through either the high-level or low-level APIs, the response that is generated for a given request can be added to the ASP.NET application Cache. On the next request, rather than re-executing the entire page, the already generated response is grabbed from the Cache and sent to the caller. This skips all the work necessary to generate the response and can lead to a two to three times performance increase for those pages.
Below is a sample ASP.NET page that uses the high-level output cache directive. This will not only add the appropriate HTTP cache headers, but will also save the response and reuse it across multiple requests until the specified time limit is reached.
<%@ OutputCache Duration="10" VaryByParam="none" %>
<Script runat="server">
  Public Sub Page_Load()
    CreatedStamp.InnerHtml = Now()
    ExpiresStamp.InnerHtml = Now().AddSeconds(10)
  End Sub
</Script>
<Font size="6">Output caching for 10 seconds...
<HR size="1">
Output Cache created: <Font color="red"><B id="CreatedStamp" runat="server"> </B></Font>
<BR>
Output Cache expires: <Font color="red"><B id="ExpiresStamp" runat="server"> </B></Font>
</Font>
In the above sample page, the high-level OutputCache directive is used along with its two required attributes: Duration and VaryByParam. Essentially, this instructs the following:
- The generated response will have the HTTP cache header: Cache-Control: public (meaning everyone can cache it) along with the HTTP header: Expires: (time of request + value of duration).
- The generated response is to be added to the ASP.NET Cache and will remain valid for 10 seconds (value of duration) or until the file itself is changed. Any requests for this page within the 10-second time frame will receive the cached version.
If you run this page, you'll notice that on the first request the current time will be displayed, but subsequent requests for the next 10 seconds will be served from the Cache and the time value won't be updated. After 10 seconds, the next request executes the page again and we start over.
Before we move on to discuss the functionality of the ASP.NET Output Cache, let's take a look at what's happening behind the scenes.
Behind the Scenes
Figure 1 below provides an overview of what happens when a page is served by ASP.NET and output cached.
Figure 1. Overview of what occurs when a page is served by ASP.NET and output cached
Here's what we're looking at in the diagram above:
Requests not using Output Caching:
1. A request comes in for an ASP.NET page. The request is given to the ASP.NET Http Runtime, processed through the registered modules, and directed to the page handler.
2. The page handler is unable to find a precompiled version of the page class on disk, so it must grab the file and give it to the ASP.NET engine for parsing.
3. The ASP.NET Page engine parses the file and generates a page class.
4. The page class is compiled into a .NET assembly and cached on the disk.
5. An instance of the requested page's class is created.
6. The response generated from the requested page class is sent back to the original caller.
Subsequent requests not using Output Caching:
1. A request comes in for an ASP.NET page.
2. The page handler finds a cached version of the class and creates an instance.
3. The response generated from the requested page class is sent back to the caller.
Requests using Output Caching:
1. A request comes in for an ASP.NET page.
2. Steps 1-6 of the first list above are performed for the first request.
3. The generated response is added to the ASP.NET Cache.
4. The response is sent back to the caller.
Subsequent requests using Output Caching:
1. A request comes in for an ASP.NET page.
2. As the request is processed by the HTTP modules, it passes through the output cache module. If the page response is found in the ASP.NET Cache, the response is copied from the Cache and sent straight back to the caller.
As you can see, ASP.NET is very efficient regardless of whether Output Caching is utilized or not. The possible performance gains are quite obvious. When output caching is used, ASP.NET is doing less work to satisfy a request, thus allowing the response to be served faster.
The ASP.NET Output Caching features described so far are powerful. You can envision an ASP.NET page that connects to a database and executes an expensive SQL query, such as computing total product sales for the year. If ASP.NET output caching were applied to this page, the cost of creating the response would be paid once, by the first request. Subsequent requests could be satisfied from the Cache, without paying the cost of executing the entire page or the expensive SQL call, until the response expired due to a time, key, or file dependency, or until the underlying file changed.
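To make that scenario concrete, here is a minimal sketch of such a page. The connection string, the Orders table, and its Amount and OrderDate columns are all hypothetical, and the one-hour duration is an arbitrary choice; treat this as an illustration rather than a recommended data access pattern.

```vb
<%@ Page Language="VB" %>
<%@ OutputCache Duration="3600" VaryByParam="none" %>
<%@ Import Namespace="System.Data.SqlClient" %>
<script runat="server">
    Sub Page_Load(sender As Object, e As EventArgs)
        ' Hypothetical connection string and schema; substitute your own.
        Dim connection As New SqlConnection( _
            "server=(local);database=Sales;Integrated Security=SSPI")
        Dim command As New SqlCommand( _
            "SELECT SUM(Amount) FROM Orders " & _
            "WHERE YEAR(OrderDate) = YEAR(GETDATE())", connection)
        Try
            connection.Open()
            ' This query runs only when the page actually executes,
            ' that is, when no valid cached response exists.
            Dim total As Object = command.ExecuteScalar()
            If total Is DBNull.Value Then
                TotalSales.InnerText = "No sales recorded"
            Else
                TotalSales.InnerText = String.Format("{0:C}", total)
            End If
        Finally
            connection.Close()
        End Try
        ' Timestamp makes it easy to see when the page last executed.
        Generated.InnerText = DateTime.Now.ToString()
    End Sub
</script>
<p>Total sales this year: <b id="TotalSales" runat="server"></b></p>
<p>Generated: <span id="Generated" runat="server"></span></p>
```

With the directive in place, refreshing the page within the hour should show the same "Generated" timestamp, which indicates that the query did not run again.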
The features described, using the high-level ASP.NET Output Cache directives, sound compelling, but there's a whole lot more functionality available. Let's look at how you can practically apply ASP.NET Output Caching to your application.
Applying ASP.NET Output Caching
As you may have noticed from the title, this article is the first installment of a two-part series. For this installment, we're going to focus on the high-level APIs used for Page Output Caching, and in the second article we'll dig a bit deeper and look at the lower level APIs found in Response.Cache.
The high-level Cache API is accessible using page directives, which come in the following format:
<%@ [DirectiveName] [attribute]=[value] [attribute]=[value] ... %>
The directive for output caching is OutputCache, and there are two attribute/value pairs required whenever we use this directive:
- Duration: The amount of time in seconds in which a page may be output cached. This time period is calculated by taking the time of the request, if the page is not already in the cache, and adding the number of seconds. If you're already familiar with the ASP.NET Cache API, you'll recognize that we're simply creating a new Cache entry with a time dependency within the Cache. The duration value also sets the HTTP expires header.
- VaryByParam: This attribute allows us to control how many cached versions of the page should be created based on name/value pairs sent through HTTP POST/GET. The default value is None. None implies that only one version of the page is added to the Cache, and all HTTP GET/POST parameters are simply ignored. The opposite of the None value is *. The asterisk implies that all name/value pairs passed in are to be used to create cached versions of the page. The granularity can be controlled, however, by naming parameters (multiple parameter names are separated using semicolons).
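The three styles of VaryByParam value described above look like this. These are alternatives, not a set: a page carries only one OutputCache directive. The 60-second duration and the State and City parameter names are placeholders chosen for illustration.

```vb
<%-- One cached copy; query string and form values are ignored. --%>
<%@ OutputCache Duration="60" VaryByParam="none" %>

<%-- One cached copy per distinct combination of all GET/POST values. --%>
<%@ OutputCache Duration="60" VaryByParam="*" %>

<%-- One cached copy per distinct State/City combination; other values are ignored. --%>
<%@ OutputCache Duration="60" VaryByParam="State;City" %>
```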
Let's talk about VaryByParam a little more.
Varying by HTTP GET/POST Parameters Using VaryByParam
Let's say you're building a Web application that is capable of displaying the weather forecast for the 50 United States. This application is completely encapsulated in one page—Default.aspx.
Default.aspx presents the user with a drop-down list of states. A state is selected from the drop-down list and the value of that drop-down list is sent back to default.aspx. For example, State=WA or State=TX. For the sake of simplicity, let's assume we're using HTTP GET to send data. Once an item is selected, a request is sent to the server as ...default.aspx?State=WA.
Note: The default behavior here would be to use HTTP POST, in which case the name/value pairs would be sent in the body of the HTTP request and we would not see them in the URL.
Let's assume that the forecast is only updated once a day and there is a cost to generate the forecast. For example, a complex SQL query and so on.
We could add the following directive to default.aspx, and every single page (default.aspx, default.aspx?State=WA, default.aspx?State=TX, and so on) would be cached independently of one another for three hours:
<%@ OutputCache Duration="10800" VaryByParam="State" %>
Pretty easy, huh?
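For readers who want to try this, here is a rough sketch of what such a Default.aspx might look like. The GetForecast helper is hypothetical and simply stands in for the expensive forecast query; the two states listed are just samples.

```vb
<%@ Page Language="VB" %>
<%@ OutputCache Duration="10800" VaryByParam="State" %>
<script runat="server">
    Sub Page_Load(sender As Object, e As EventArgs)
        ' Read the state from the query string, e.g. default.aspx?State=WA
        Dim state As String = Request.QueryString("State")
        If state Is Nothing OrElse state.Length = 0 Then
            Forecast.InnerText = "Select a state."
        Else
            Forecast.InnerText = GetForecast(state)
        End If
        ' Timestamp shows when this particular cached version was generated.
        Generated.InnerText = DateTime.Now.ToString()
    End Sub

    ' Hypothetical helper standing in for the expensive forecast lookup.
    Function GetForecast(state As String) As String
        Return "Forecast for " & state & " goes here."
    End Function
</script>
<form method="get" action="default.aspx">
    <select name="State">
        <option value="WA">Washington</option>
        <option value="TX">Texas</option>
    </select>
    <input type="submit" value="Show forecast">
</form>
<p id="Forecast" runat="server"></p>
<p>Generated: <span id="Generated" runat="server"></span></p>
```

Requesting default.aspx?State=WA and default.aspx?State=TX in turn should produce two different "Generated" timestamps, each of which then stays fixed until its own cache entry expires.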
Now, let's say the page gets a bit more complex. Rather than only selecting the state, we also add functionality for a specific city. To output cache pages based on the state and city parameter, we would change the directive to:
<%@ OutputCache Duration="10800" VaryByParam="State;City" %>
VaryByParam allows us to cache multiple versions of the same page based on parameters sent through HTTP GET/POST. It is recommended that you steer clear of using *, as this can potentially fill the output cache with pages that aren't frequently accessed. Remember, the more specific you make the VaryByParam, the more frequently you can serve from the Cache. For example, when only specifying a state, we will have 51 versions of the page in the Cache (50 states + the version with no parameters). Once the city parameter is added, and assuming an average of 15 cities per state, we're suddenly looking at 751 cached pages (50 x 15 state/city combinations + the version with no parameters). The Cache will do the right thing if it becomes memory constrained (that is, it evicts items automatically), but we should understand what is happening when we use VaryByParam.
Varying by HTTP Headers Using VaryByHeader
Now that we've built our weather page to support both states and cities and to output cache the document for three hours, the requirement comes to us that the page must also support multiple languages. For example, international tourists and visitors want to use this page before coming to the United States.
The language used to display text in the page is determined by the Accept-Language HTTP header value: fr-FR for French, en-US for English, and so on.
We can author the application logic for the page to correctly display content based on the language, and we can then use another high-level output cache attribute to instruct the output cache to cache different versions of the page based on the HTTP header Accept-Language:
<%@ OutputCache Duration="10800" VaryByParam="State;City" VaryByHeader="Accept-Language" %>
Again, this can be a semicolon-separated list, similar to VaryByParam. The output cache will create multiple cached versions of the page based on parameters and HTTP headers.
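As a rough sketch of the page side of this, the code below picks a heading based on the caller's first preferred language. The two-language switch and the heading text are invented for illustration; a real application would more likely use resource files.

```vb
<%@ Page Language="VB" %>
<%@ OutputCache Duration="10800" VaryByParam="State;City" VaryByHeader="Accept-Language" %>
<script runat="server">
    Sub Page_Load(sender As Object, e As EventArgs)
        ' Request.UserLanguages reflects the Accept-Language header;
        ' it can be Nothing if the browser did not send one.
        Dim languages() As String = Request.UserLanguages
        Dim language As String = "en-US"
        If Not languages Is Nothing AndAlso languages.Length > 0 Then
            language = languages(0)
        End If

        ' Illustrative only: French for fr-*, English for everything else.
        If language.ToLower().StartsWith("fr") Then
            Heading.InnerText = "Prévisions météo"
        Else
            Heading.InnerText = "Weather forecast"
        End If
    End Sub
</script>
<h1 id="Heading" runat="server"></h1>
```

Because the directive varies by Accept-Language, a French-speaking visitor should not be handed the English copy that was cached for an earlier request.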
Varying by Browser Type Using VaryByCustom
Our weather application has been running great for several months now. Being a weather forecasting company, we like patterns, and one of the patterns we notice is that a majority of our customers are using Internet Explorer, while a number of customers are also attempting to access our site with a mobile device, such as a WML-enabled phone. Armed with this knowledge, we decide to extend our application to better support Internet Explorer (perhaps using the Smart Navigation feature), and we also add code to our application to support WML devices.
We now want to vary the output-cached page based on the type of browser that is accessing our application. To support this functionality, we add the following attribute/value pair to our directive:
<%@ OutputCache Duration="10800" VaryByParam="State;City" VaryByHeader="Accept-Language" VaryByCustom="browser" %>
Note, the VaryByCustom attribute has some special uses, hence the reason it's not called VaryByBrowser. More on that in a second.
When VaryByCustom is set to browser, the cached response will vary by the caller's browser type and major version number (Internet Explorer 6, Netscape Navigator 4, and so on). Different browser types will receive different cached versions of the page.
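One way to see this in action is a small test page, sketched below, that prints the browser name and major version reported by Request.Browser along with a timestamp. The 60-second duration is arbitrary.

```vb
<%@ Page Language="VB" %>
<%@ OutputCache Duration="60" VaryByParam="none" VaryByCustom="browser" %>
<script runat="server">
    Sub Page_Load(sender As Object, e As EventArgs)
        ' Request.Browser exposes what ASP.NET detected about the caller.
        Dim browser As HttpBrowserCapabilities = Request.Browser
        BrowserInfo.InnerText = browser.Browser & " " & browser.MajorVersion.ToString()
        Generated.InnerText = DateTime.Now.ToString()
    End Sub
</script>
<p>Cached for: <span id="BrowserInfo" runat="server"></span></p>
<p>Generated: <span id="Generated" runat="server"></span></p>
```

Loading the page from two different browsers within the same minute should show two different timestamps, one per cached version.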
As you can see, the factored design of the Page Output Cache API allows for easy use with the high-level APIs. But, as you can also readily tell, you are constrained by VaryByParam, VaryByHeader, and VaryByCustom. What if there are other scenarios you want to address, such as varying the output based on whether the user is a customer of the site or not? For example, if you build a stock quote site you may want to provide NASDAQ level 2 quotes to paying customers, but delayed quotes (perhaps as old as 20 minutes) to non-paying users. We can still achieve this level of functionality. This is where VaryByCustom gets its name: we can vary by a custom setting.
Custom Variation: Overriding VaryByCustom
Above we set VaryByCustom to browser, which is the value ASP.NET handles for us. But what happens if we put in another value, like test? Nothing happens, that is, until we write the code to handle this new value.
The code that handles VaryByCustom is marked so that it can be overridden. The method, GetVaryByCustomString, is a member of HttpApplication, so the easiest place to override it is global.asax. Below is the function in Visual Basic® .NET to accomplish this override:
Public Overrides Function GetVaryByCustomString(context As HttpContext, _
                                                arg As String) As String
  [implementation details]
End Function
When the output cache executes the code to process VaryByCustom, the overridden version of GetVaryByCustomString is used, rather than the internal version of the function. This function accepts two parameters:
- context: An instance of HttpContext for the current request.
- arg: The string value set in VaryByCustom="[value]".
The context provides access to all the intrinsic class instances (Session, Cache, Request, Response, and so on). The arg string can carry any number of values, but how those values are separated (for example, using a semicolon) is up to the developer, whose code must split the string.
Based on the arguments, switch logic is used and a string is returned that uniquely identifies the cached instance of the page that a given request should map to. The example below varies by the minor version of the browser, in contrast to the major version variation accomplished with VaryByCustom="browser":
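Public Overrides Function GetVaryByCustomString(context As HttpContext, _
                                                arg As String) As String
  ' Vary by the browser's minor version when VaryByCustom="minorversion"
  If (arg = "minorversion") Then
    Return "Version=" & context.Request.Browser.MinorVersion.ToString()
  End If
  ' Let ASP.NET handle built-in values such as "browser"
  Return MyBase.GetVaryByCustomString(context, arg)
End Function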
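The same approach can be stretched to cover the stock quote scenario mentioned earlier. The sketch below is one possible global.asax that accepts a semicolon-separated list of values. The "customer" value is invented for this example, and it assumes, purely for illustration, that any authenticated user counts as a paying customer; a real site would check its own membership data.

```vb
<%@ Application Language="VB" %>
<script runat="server">
    Public Overrides Function GetVaryByCustomString(context As HttpContext, _
                                                    arg As String) As String
        Dim key As New System.Text.StringBuilder()
        Dim part As String

        ' The separator is our own convention; here we split on semicolons.
        For Each part In arg.Split(";"c)
            Select Case part.Trim().ToLower()
                Case "minorversion"
                    key.Append("Version=" & _
                        context.Request.Browser.MinorVersion.ToString() & ";")
                Case "customer"
                    ' Assumption: authenticated users are paying customers.
                    key.Append("Customer=" & _
                        context.Request.IsAuthenticated.ToString() & ";")
                Case Else
                    ' Defer to the built-in handling (for example, "browser").
                    key.Append(MyBase.GetVaryByCustomString(context, part) & ";")
            End Select
        Next

        Return key.ToString()
    End Function
</script>
```

A quotes page could then opt in with a directive such as <%@ OutputCache Duration="60" VaryByParam="none" VaryByCustom="customer" %>, so that authenticated and anonymous callers are served from separate cached copies.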
ASP.NET page output caching makes your Web application faster, and its factored design, with both high-level and low-level APIs, makes it easy to use.
In the next Nothin' but ASP.NET column, we'll continue our discussion of ASP.NET output caching and focus on using the lower-level Cache APIs found in Response.Cache. While the high-level APIs address 90% of output caching scenarios, the lower-level APIs allow us to do some interesting things too.
If you're interested in learning more about HTTP caching, such as how various applications behave when HTTP messages include HTTP cache headers, I would definitely recommend Web Proxy Servers by Ari Luotonen (Prentice Hall, ISBN: 0136806120).
A great resource for all things related to ASP.NET is the official ASP.NET Web site, located at http://www.asp.net/. You'll find all kinds of useful information, controls to download, customer lists, hosting companies, and so on at the ASP.NET Web site.
Rob Howard is a program manager for ASP.NET on the .NET Frameworks team. He spends whatever spare time he has either with his family or fly fishing in Eastern Washington.