There’s never magic, but plenty of butterfly effects
I’ve always enjoyed magic shows, but I’ve never attempted to understand how the tricks are performed, since that would take all of the fun out of them. In contrast, if I see a web browser demonstrating seemingly magical behavior or misbehavior, I find it hard to sleep until I figure out what’s going on.
Earlier this month, a new phenomenon was reported that defied easy explanation. A customer posted a comment on this blog noting that her ASP.NET-based web application isn’t working correctly over HTTPS in Internet Explorer 11 on Windows 8.1. With that combination, clicking a button on the site doesn’t correctly load the next page -- doing so instead refreshes the current page. Putting the site in Compatibility View or the Trusted Zone didn’t help at all. Her site works fine in other browsers on Windows 8.1 and it works fine in IE11 on Windows 7. It works fine when accessed over HTTP rather than HTTPS. More interestingly, it works fine even on W8.1 when Fiddler is running, or when the site is in her Intranet zone.
This was a strange combination indeed, and figuring out the hidden X-factor took a bit of sleuthing. At first, I was a bit stymied because Fiddler is my “go-to” debugger, and if it was causing the change in behavior, I couldn’t use it.
Or could I? The fact that Fiddler changes behavior is an important clue. As I noted over in “Help! Running Fiddler fixes my app???” there are only a small set of changes that occur when Fiddler is used to monitor traffic. My first guess was that the HTTPS cipher in use changed when Fiddler was in HTTPS decryption mode (because Fiddler defaults to a max of TLS/1.0 while IE11 enables TLS/1.2 by default). After disabling HTTPS Decryption in Fiddler, IE used TLS1.2 through to the site but to my great surprise, the problem still vanished.
Curiouser and curiouser.
Fortunately, I recently added support in Fiddler for importing raw packet captures collected using Microsoft Network Monitor or WireShark. While HTTPS traffic is encrypted and cannot be seen, I could use Fiddler to easily compare the HTTPS handshakes between the packet captures of the working (Fiddler or Intranet) and non-working scenarios.
And I found something quite interesting indeed.
Here there be Butterflies
The last few years have seen a number of new and improved attacks against the algorithms used in securing HTTPS traffic, and as a consequence, last November, the IE team announced a change to reduce the use of RC4 for HTTPS connections.
By default, Internet Explorer 11 on Windows 7 offers the following ciphers in its first ClientHello TLS message:
Windows 7 IE11 Default
By default, IE11 omits the RC4 ciphers from the offered ciphers when running on Windows 8.1:
Windows 8.1 IE11 Default
However, if you are behind a proxy server (like Fiddler) or if the target site is in your Intranet Zone (not the Internet or Trusted Zones), then the RC4 ciphers are again offered in the ClientHello:
Windows 7/8.1 IE11 Behind Proxy or Intranet
Same as Win7 IE11 default, above.
This “offer RC4 to proxies or private networks” behavior was introduced for legacy compatibility reasons; because the IE team cannot “see” your Intranet sites or proxy server, they can’t workaround any compatibility problems occurring on a private network by using a Compatibility View list update or similar mechanism.
And now we’ve found the clearest difference between the working and non-working cases: when the browser offers RC4, the site works, and when it doesn’t, the site doesn’t work.
Now that I understood the difference in behavior, I could use Fiddler to explore it more fully. Using a registry key, I manually disabled the use of RC4 by all components that use SChannel (e.g. WinINET, WinHTTP, System.NET etc). Save the following as NoRC4.reg and import it using the Registry Editor:
Windows Registry Editor Version 5.00
Note that this change is likely to break applications and at this time probably should only be used for testing purposes.
With RC4 disabled in this way, I could easily reproduce the problem even with Fiddler running or decrypting traffic. Using Fiddler’s Compare feature, I first confirmed that the server was returning exactly the same bytes to the client as it downloaded the page. I then confirmed that the client was sending exactly the same HTTP POST body when the user clicked the button that triggered the ASP.NET postback.
So, the same content, sent over connections secured using different ciphers, behaved differently.
Fortunately, in playing with the site, a few other clues came to light. I noticed that the HTTP response’s Server header claimed that the site was served by nginx, an interesting result for a site using ASP.NET. I asked the developer about her topology, and she explained that that they have an nginx/1.2.6 server acting as a HTTPS frontend (using OpenSSL 1.0.0f 4 Jan 2012) it is distributing requests to IIS servers on the backend.
While debugging the site under Fiddler, I noticed that occasionally one of its requests would cause a malformed response that looked like an attempt to send a HTTP/400 Bad Request status to the client.
And now I had a theory. The POST sent by the client had a 2kb body that contained the ViewState and, at the end, a parameter which was used to generate the URL to which the browser should be navigated. If I did nothing, or set a breakpoint and manually stripped out that parameter, the server would simply return the same page again. If, however, I padded the POST body with a trailing string of dummy (but well-formed) content, the server would correctly send the user to the next page.
My current theory is that the outdated nginx/OpenSSL combination in use here is failing to read the entire POST body from the browser client and, as a consequence, it’s passing an incomplete POST to the backend server. The backend server, upon receiving incomplete postback data, simply returns the same page again. The nginx instance then attempts to parse the prematurely truncated end of the first POST’s body as the beginning of a subsequent request on the connection, determines that it isn’t valid HTTP traffic, and returns a HTTP/400. However, when the postback is padded with dummy data, only some portion of that dummy data is truncated, and the IIS server sees the parameters it needs and sends the proper response to the client.
Now, what could cause the nginx/OpenSSL combo here to have this problem with the TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA cipher combination? My guess is that it’s a beast of a bug. More specifically, at the end of 2011, an attack named BEAST targeted CBC cipher algorithms, and as a consequence, browsers implemented a technique called 1/n-1 record splitting, in which HTTPS traffic sent over CBC-based algorithms is split into multiple records. My current guess is that the older version of the frontend’s software isn’t properly collecting all of the records which represent the POST body, and as a consequence it’s sending incomplete data to the backend server.
I’ve suggested that the site owner update the frontend to use the very latest nginx and OpenSSL to see whether the problem is resolved.
In a decade of looking at browsers, I’ve yet to find any magic, but there are plenty of cases where minor quirks, accumulated over decades and across billions of scenarios, lead to butterfly effects that are difficult to explain.
Internet Explorer MVP
PS: Good references on controlling SChannel via the registry: blogs.technet.com/.../speaking-in-ciphers-and-other-enigmatic-tongues.aspx and http://support.microsoft.com/kb/245030/en-us