The Hazards of Browser Quirks, continued

My First Law of Browser Quirks was introduced a while ago: If there’s a way for a site to take a dependency on a browser quirk, and break if that quirk is removed, it will happen. The Second Law of Browser Quirks is: If there’s a way for a site to combine a set of browser quirks to yield an entirely unexpected behavior, it will happen.

I was reminded of the Second Law a few weeks ago, when a site developer reported a bizarre behavior. If they visited a page that was sent with Cache-Control: max-age=0 a few times, they’d see that the first HTTP/200 response was correctly cached, and that subsequent conditional requests properly got HTTP/304 Not Modified responses. The developer then deleted that page from the backend, and the next revalidation request correctly returned a HTTP/404. However, if the developer renavigated to the page a few times quickly, the previously-delivered HTTP/200 page would sometimes be “magically resuscitated” from the local cache.

How, they wondered, could that possibly happen?

It was particularly strange because hitting the Refresh button would correctly show the 404 page.

Fortunately, I had just recently updated the Fiddler Caching Inspector, which examines a HTTP response to determine how it will be cached. A quick look at the Inspector led me to realize how this HTTP/404 page had yielded the unexpected result. Beyond specifying a Cache-Control: no-cache HTTP response header, the 404 page contained the following markup:

<meta http-equiv="Expires" content="0" />

Seeing caching directives in HTML markup always makes me a bit nervous, and it turns out that this is in fact the root cause of the problem. For historical reasons, Trident will respond to two directives in HTML markup:

<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Expires" content="HTTP-DATE" />

No other directives (e.g. Cache-Control) are supported.

Each of the two supported directives, if encountered, causes Trident to call down to WinINET to adjust the freshness of the cache entry stored under the current URL. For the Pragma directive, if the current URL is HTTPS, Trident will call DeleteUrlCacheEntry; if the URL isn’t HTTPS, it will instead call SetUrlCacheEntryInfo to adjust the time at which the document expires. For the Expires directive, the provided date will simply be used as the new expiration time.
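The dispatch described above can be sketched in a few lines of Python. This is only a toy model of the decision, not Trident’s actual code: the function name meta_cache_action and the returned labels "delete_entry" and "set_expiry" are hypothetical, and matching the attribute values case-insensitively is my assumption.

```python
from urllib.parse import urlsplit


def meta_cache_action(http_equiv, content, url):
    """Return the cache action a META caching directive would trigger.

    Toy model of the behavior described in the text; the labels returned
    here are made up for illustration and are not real API names.
    """
    name = http_equiv.strip().lower()  # assumption: case-insensitive match
    is_https = urlsplit(url).scheme.lower() == "https"

    if name == "pragma" and content.strip().lower() == "no-cache":
        # HTTPS entries are deleted outright; others get a new expiration.
        return "delete_entry" if is_https else "set_expiry"
    if name == "expires":
        # The supplied date becomes the entry's new expiration time.
        return "set_expiry"
    # Any other directive (e.g. Cache-Control) is ignored in markup.
    return None
```

Note that nothing in this decision looks at the HTTP status code of the response that carried the markup, which matters for what follows.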

Now, Fiddler’s Caching Inspector points out one more thing: WinINET will never cache a HTTP/404 response; it will only modify the cache if it receives a HTTP/200, HTTP/206, or HTTP/3xx redirect status code. Sending a caching directive in the HTTP/404 markup was unnecessary, because IE won’t ever cache that error page. That’s not to say that the META tag does nothing, however. Trident doesn’t know (or care) that the document was sent with an uncacheable status code, and dutifully updates the expiration info for the existing cache entry: the HTTP/200 document that had originally been stored in an expired state.
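To make the interaction concrete, here is a minimal sketch of the two code paths touching the same cache entry. ToyCache, CacheEntry, and the method names are hypothetical stand-ins, not WinINET or Trident APIs, and treating the whole 300-399 range as “3xx redirect” is a simplification on my part.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: str
    expires: float  # seconds since the epoch


class ToyCache:
    """Hypothetical stand-in for the HTTP cache, keyed by URL."""

    def __init__(self):
        self.entries = {}

    @staticmethod
    def is_cacheable(status):
        # Per the text: only 200, 206, and 3xx responses modify the cache.
        # Modeling "3xx redirect" as all of 300-399 is a simplification.
        return status in (200, 206) or 300 <= status <= 399

    def on_response(self, url, status, body, expires):
        # Network-layer path: uncacheable responses leave the cache alone.
        if self.is_cacheable(status):
            self.entries[url] = CacheEntry(body, expires)

    def on_meta_expires(self, url, expires):
        # Markup-parser path: it never sees the status code, so it updates
        # whatever entry already exists under the URL.
        if url in self.entries:
            self.entries[url].expires = expires
```

Feeding this model a 200 followed by a 404 for the same URL leaves the original 200 body in place, and a later on_meta_expires call from the 404’s markup changes the expiration of that old entry.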

Half of the puzzle is solved, but why does Expires=0 result in the cache entry being deemed fresh? It’s especially surprising because if you send such an Expires directive using a HTTP header, the resulting cache entry will not be fresh.

If WinINET downloads a response with an invalid Expires header (e.g. one that doesn’t contain a valid HTTP-DATE value) and no other caching directives, it will mark the document as having expired one hour ago. Trident, however, has no such logic. If you specify an invalid time, Trident grabs the current timestamp and uses that as the expiration. Trident will also use the current timestamp if it encounters the Pragma: no-cache directive. If the user tries to re-navigate to the current document during the same exact second that the HTTP/404 was processed, the incorrectly-updated expiration of the existing cache entry will result in it being treated as fresh for that request. If the user hits the Refresh button or F5, the cache is bypassed and the 404 page is shown.
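The difference between the two fallbacks can be modeled roughly as follows. This is a sketch under stated assumptions, not the real implementation: the function names are hypothetical, Python’s email.utils parser stands in for whatever date parsing the real components do, and the one-second-granularity freshness comparison (an expiry equal to the request time counts as fresh) is inferred from the behavior described above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Return an aware UTC datetime, or None if value is not a valid date."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # exception type varies by Python version
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def wininet_header_expiry(value, now):
    # Invalid Expires header, no other directives: expired one hour ago.
    parsed = parse_http_date(value)
    return parsed if parsed is not None else now - timedelta(hours=1)


def trident_meta_expiry(value, now):
    # Invalid META Expires value: the current timestamp is used instead.
    parsed = parse_http_date(value)
    return parsed if parsed is not None else now


def is_fresh(expiry, request_time):
    # Assumption: comparison at one-second granularity, where an expiry
    # equal to the request time still counts as fresh.
    return request_time.replace(microsecond=0) <= expiry.replace(microsecond=0)
```

With content="0", trident_meta_expiry returns the current time, so a request made within that same second passes is_fresh, while wininet_header_expiry yields a time an hour in the past and the entry is never fresh.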

The solution for this website was pretty simple: either get rid of the unneeded META entirely (my recommendation) or update it to use a valid, long-ago HTTP-DATE to avoid the incorrect update of the freshness info to the current timestamp. For instance, this markup:

<meta http-equiv="Expires" content="Sat, 11 Jun 2011 01:01:01 GMT" />

…results in the cached entry being marked as Expired.

Various bugs have been filed from this investigation, but my advice remains that web developers should do their very best to avoid specifying caching directives in markup.

-Eric Lawrence