Content-Encoding != Content-Type

RFC 2616 for HTTP 1.1 specifies how web servers must indicate encoding transformations using the Content-Encoding header. Although Content-Encoding (e.g., gzip, deflate, compress) and Content-Type (e.g., application/x-gzip) sound similar on the surface, they are two distinct pieces of information. Servers use Content-Type to specify the data type of the entity body, which is useful for clients that want to open the content with the appropriate application. Content-Encoding is used solely to specify any additional encoding the server applied before transmitting the content to the client. Although the HTTP RFC outlines these rules pretty clearly, some web sites respond with "gzip" as the Content-Encoding even though the server has not gzipped the content.

Our testing has shown this problem to be limited to some sites that serve Unix/Linux style "tarball" files. Tarballs are gzip-compressed archive files. By setting the Content-Encoding header to "gzip" on a tarball, the server is specifying that it has additionally gzipped the already gzipped file. This is unlikely, of course, but it is neither impossible nor non-compliant.

Therein lies the problem. A server that responds with a Content-Encoding such as "gzip" is specifying the mechanism the client needs in order to decode the content. If the server did not actually encode the content as specified, the client's decompression will fail.
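
As a rough illustration of that failure, here is a minimal Python sketch of a client that trusts the header. The function name decode_body and the sample body are made up for this example, and this is not how WinINet is implemented; it only shows what happens when the header claims gzip and the bytes are not gzip data.

```python
import gzip


def decode_body(content_encoding: str, body: bytes) -> bytes:
    """Undo the encoding the server claims to have applied (hypothetical helper)."""
    if content_encoding.strip().lower() in ("gzip", "x-gzip"):
        # Raises an OSError subclass if the body is not actually gzip data.
        return gzip.decompress(body)
    return body


plain = b"Hello, this body was never compressed."

# Honest server: the body really was gzipped, and the header says so.
honest = decode_body("gzip", gzip.compress(plain))

# Mislabeling server: the header says gzip, but the body was sent as-is.
try:
    decode_body("gzip", plain)
    mislabeled_failed = False
except (OSError, EOFError):
    mislabeled_failed = True
```

In the honest case the client gets the original bytes back; in the mislabeled case the decode step raises instead of returning content.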

Here is a potentially over-simplified example:

  1. Windows Vista Networking Rocks!
  2. Jvaqbjf Ivfgn Argjbexvat Ebpxf!

If I mistakenly claim that string 1 has been encoded using the simple ROT-13 obfuscation scheme when in actuality it has not, then the decoded message, string 2, will be very different from the intended message.
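
The same mix-up can be reproduced in a few lines of Python using the standard library's rot_13 codec. The variable names are only for illustration.

```python
import codecs

original = "Windows Vista Networking Rocks!"

# The sender never encoded the message but claims it is ROT-13,
# so the receiver trusts the claim and "decodes" it anyway.
decoded = codecs.decode(original, "rot_13")

print(decoded)  # Jvaqbjf Ivfgn Argjbexvat Ebpxf!
```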

Since the AI engine for WinINet isn't yet ready for production (joke), we try to work around these non-compliant server responses, but that isn't the right long-term approach. The fix, and the ask, is for web server, extension, and application authors to test their servers for this behavior and, if they exhibit it, to fix their implementations before we remove our client-side hacks.

To test your server for compliance, issue a simple HTTP 1.1 request for a .gz file, including the "Accept-Encoding: gzip" header, and inspect the response headers. If you see Content-Encoding: x-gzip or gzip, then the server is either gzip-encoding the already gzipped file or misstating that it encoded the content before transmission. The latter forces client HTTP stacks, such as WinINet, to keep absorbing and hiding this bad server behavior.
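
One way to script that check is sketched below in Python. The helpers fetch_gz and diagnose are hypothetical names, and the approach is an assumption on my part: strip one gzip layer and see whether what is left is still gzip data (it starts with the gzip magic bytes 1f 8b). If it is, the server really did compress the file a second time; if not, the header is describing the file's own compression.

```python
import gzip
import http.client

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def fetch_gz(host: str, path: str):
    """Request a .gz resource; return (Content-Encoding value, raw body)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path, headers={"Accept-Encoding": "gzip"})
        resp = conn.getresponse()
        # http.client does not undo Content-Encoding, so these are the wire bytes.
        return resp.getheader("Content-Encoding"), resp.read()
    finally:
        conn.close()


def diagnose(content_encoding, body: bytes) -> str:
    """Classify a response for a file that is already gzip data on disk."""
    claimed = (content_encoding or "").strip().lower()
    if claimed not in ("gzip", "x-gzip"):
        return "ok: no gzip Content-Encoding claimed"
    try:
        inner = gzip.decompress(body)
    except (OSError, EOFError):
        return "mislabeled: body is not gzip data at all"
    if inner.startswith(GZIP_MAGIC):
        return "double-gzipped: server compressed the .gz file again"
    return "mislabeled: header describes the file's own gzip layer"
```

With a host and path of your own substituted in, something like print(diagnose(*fetch_gz("example.com", "/file.tar.gz"))) would report which case applies. Any result starting with "mislabeled" is the behavior described above.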

-Billy Anders