URL Fragments and Redirects

I’ve worked on the Internet Explorer team for six+ years, and on web sites for a decade longer, so I’m understandably excited when I come across a browser behavior I can’t explain. Last week, I encountered such a mystery, and it took me quite a while to figure out what was going on.

Background

Facebook tends to use URL Fragments in their URLs. For instance, a car dealer’s website includes a link to their Facebook page thusly:

http://www.facebook.com/#!/MBofWhitePlains?sk=app_1922299908089

The Fragment component of a URL is everything from the hash symbol (#) onward. URL Fragments are never sent to the server in the HTTP request; only JavaScript running in the page can see them. So, when your browser loads the URL above, the server sees only a request for “http://www.facebook.com/”, and it’s the responsibility of JavaScript in the returned page to examine the URL and find the extra information in the Fragment.
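To make that concrete, here’s a minimal Python sketch of what goes on the wire for the link above. The helper name `request_seen_by_server` is mine, invented for illustration, and the request is reduced to its request line and Host header; a real browser sends many more headers.

```python
from urllib.parse import urlsplit

def request_seen_by_server(url):
    """Return the (request line, Host header) a client would send for url."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately ignored: it never leaves the client.
    return "GET " + target + " HTTP/1.1", "Host: " + parts.netloc

url = "http://www.facebook.com/#!/MBofWhitePlains?sk=app_1922299908089"
print(request_seen_by_server(url))  # ('GET / HTTP/1.1', 'Host: www.facebook.com')
print(urlsplit(url).fragment)       # !/MBofWhitePlains?sk=app_1922299908089
```

Note that the “?sk=…” part comes after the hash, so it belongs to the Fragment rather than to the query string, and the server never sees it either.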

Clicking on the link first loads the specified URL, and then script on the page redirects you to a final page which contains the “MBofWhitePlains” identifier in the URL path, clearing out the URL Fragment.

You may have heard that Facebook now offers an opt-in setting to always use HTTPS when loading Facebook.

If you set this option, Facebook will immediately return an HTTP/302 redirect to the HTTPS version of a page whenever your browser requests that page using HTTP.

That’s a problem for this scenario: because the URL Fragment is never sent to the server, the server sends your browser a redirect to https://www.facebook.com, with no URL Fragment specified. Hence, when the redirected page is loaded, the URL Fragment is blank, and you’re left on the Facebook homepage.
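Seen from the server’s side, the problem looks roughly like this. This is a sketch of a hypothetical always-HTTPS redirector, not Facebook’s actual code; the function name and the bare two-line response are assumptions made for illustration.

```python
def https_redirect(host, request_target):
    """Build the redirect a hypothetical always-HTTPS server might return."""
    # The server can only echo what it received: the host plus path and query.
    location = "https://" + host + request_target
    return ["HTTP/1.1 302 Found", "Location: " + location]

# The browser asked for "/" on www.facebook.com; the "#!/MBofWhitePlains..."
# part of the link stayed in the browser and never reached this code.
for line in https_redirect("www.facebook.com", "/"):
    print(line)
```

Whatever the server does, it has no fragment to copy into the Location header.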

Now, this made perfect sense to me—a simple limitation of the way Facebook is using URLs.

Except for one thing…

While Safari and Internet Explorer both behave as expected, Firefox, Chrome, and Opera were somehow landing on the HTTPS version of the car dealership’s Facebook page—not the homepage. This was a truly surprising outcome, and I spent a ton of time ensuring that the different behavior wasn’t related to Facebook performing User-Agent sniffing and returning different responses, or anything of the sort. It turns out that the code was the same, but the browser behavior was very different.

Peeking behind the curtain

After much debugging, I realized that Firefox, Chrome, and Opera will re-attach a URL Fragment after an HTTP/3xx redirection has taken place, even though that fragment was not present in the URL specified by the Location header on the redirection response. So:

In Chrome/Opera/Firefox:

Loading http://foo/#SomeInfo -> HTTP/302 with Location: http://bar -> final URL of http://bar/#SomeInfo

In Internet Explorer and Safari:

Loading http://foo/#SomeInfo -> HTTP/302 with Location: http://bar -> final URL of http://bar/
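The two behaviors can be modeled in a few lines of Python. This is a sketch of what I observed, not any browser’s real implementation; the function name and the `reattach` flag are hypothetical, and the Location value is written with a trailing slash because `urljoin` does not add one for an empty path.

```python
from urllib.parse import urljoin, urlsplit

def follow_redirect(original, location, reattach):
    """Model a single redirect hop under either observed behavior."""
    target = urlsplit(urljoin(original, location))  # resolves relative Locations too
    inherited = urlsplit(original).fragment
    # Only re-attach when the Location header brought no fragment of its own.
    if reattach and inherited and not target.fragment:
        target = target._replace(fragment=inherited)
    return target.geturl()

# Chrome/Opera/Firefox style:
print(follow_redirect("http://foo/#SomeInfo", "http://bar/", reattach=True))   # http://bar/#SomeInfo
# Internet Explorer/Safari style:
print(follow_redirect("http://foo/#SomeInfo", "http://bar/", reattach=False))  # http://bar/
```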

Update: Internet Explorer 10 now preserves the fragment when loading a redirected resource, matching other browsers and the updated standards documents.

Here’s a simple test page demonstrating this behavior: https://www.fiddler2.com/test/redir/fragment/

Interestingly, Chrome, Firefox, and Opera reattach the fragment information even in a cross-domain redirect, and even when redirecting from HTTPS to HTTP.

I wasn’t able to find anything in the HTML5 specification calling for this behavior.

The HTTP specification (RFC 2616 and the active HTTPBIS revision) doesn’t specify the proper behavior either, noting only that the behavior when the Location header itself contains a URL Fragment is not defined:

Note: This specification does not define precedence rules for the case where the original URI, as navigated to by the user agent, and the Location header field value both contain fragment identifiers. Thus be aware that including fragment identifiers might inconvenience anyone relying on the semantics of the original URI's fragment identifier.

…although almost all browsers appear to respect a URL Fragment specified on the redirect response. Specifically, if both the original URI and the redirect Location specify a fragment, Internet Explorer, Chrome, Firefox, and Safari will use the Fragment component from the Location header. Opera 11.01 instead keeps the Fragment component from the original URL; it only uses the Fragment component from the Location header if the original URL didn't contain a fragment at all. Opera 11.11 changed that behavior to match Chrome and Firefox.
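The precedence question can be sketched the same way. The snippet below only models the case where the Location header may carry its own fragment; the function name and the two policy labels are hypothetical names I’ve made up for illustration, and the URLs are placeholders.

```python
from urllib.parse import urljoin, urlsplit

def resolve_with_policy(original, location, policy):
    """Pick the final URL when the original and the Location may both carry a fragment."""
    target = urlsplit(urljoin(original, location))
    old = urlsplit(original).fragment
    if policy == "original-wins" and old:
        # Opera 11.01 as described above: the original fragment takes priority.
        return target._replace(fragment=old).geturl()
    # "location-wins": keep whatever fragment the Location header specified.
    return target.geturl()

print(resolve_with_policy("http://foo/#Old", "http://bar/#New", "location-wins"))  # http://bar/#New
print(resolve_with_policy("http://foo/#Old", "http://bar/#New", "original-wins"))  # http://bar/#Old
print(resolve_with_policy("http://foo/", "http://bar/#New", "original-wins"))      # http://bar/#New
```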

Interesting stuff.

-Eric

Update-to-the-Update: Internet Explorer 10 and IE11 behave differently than other browsers when there's no fragment on the first URL, there is on the first 302, and there's none on a second 302. (Test case)
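For reference, here is a sketch of that three-step chain under the assumption that every hop applies the inheritance rule: a Location without a fragment inherits the fragment of the URL that was just requested, which is how RFC 7231 (section 7.1.2) later worded it. The hostnames are placeholders and the function name is hypothetical; this models the rule only, not any particular browser.

```python
from urllib.parse import urljoin, urlsplit

def follow_chain(start, locations):
    """Apply the fragment-inheritance rule across a chain of redirects."""
    current = start
    for location in locations:
        target = urlsplit(urljoin(current, location))
        if not target.fragment:
            # No fragment in this Location: inherit from the URL just requested.
            target = target._replace(fragment=urlsplit(current).fragment)
        current = target.geturl()
    return current

# No fragment on the first URL, one on the first 302, none on the second 302.
print(follow_chain("http://a.example/", ["http://b.example/#frag", "http://c.example/"]))
# http://c.example/#frag
```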