Downloading ZIP-Based Formats

More and more file formats are based on the ZIP format. The Open Packaging Conventions use ZIP as a base format, and that means frameworks like .NET’s System.IO.Packaging also generate files that are valid ZIP files. The Office 2007+ formats are ZIP-based, and more personally, Fiddler’s SAZ Format is ZIP-based.

Unfortunately, this trend toward ZIP-based packaging creates a problem for file types that are not registered in the server’s configuration. When sending an unknown type, a simple server will typically send a Content-Type: application/octet-stream header, which says only that the download is binary and provides no specific information. Internet Explorer’s MIME-sniffing code kicks in and says, “Hey, I see that you’ve provided a generic type. Lemme check that content and see if I know what it is.”

Now, the sniff for ZIP formats is dead simple: Does the file start with 0x50 0x4B (aka ‘PK’)? If so, then it’s probably a ZIP file. And in the case of ZIP-based formats, the browser’s technically right, but behaviorally wrong. If the server didn’t specify a filename in a Content-Disposition: attachment header, Internet Explorer will promptly rename the file away from its original extension to .ZIP. The browser will then consult with Windows and determine that the .ZIP file should be opened by a MIME Handler.
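
To make that check concrete, here is a minimal Python sketch of a two-byte signature test. It only illustrates the idea and is not Internet Explorer’s actual sniffing code; the names looks_like_zip and make_zip_based_file are made up for this example, and the in-memory archive merely stands in for a real .saz or .docx file.

```python
import io
import zipfile

ZIP_MAGIC = b"\x50\x4b"  # the two bytes 0x50 0x4B, i.e. ASCII 'PK'


def looks_like_zip(data: bytes) -> bool:
    """Return True if the payload begins with the 'PK' signature."""
    return data[:2] == ZIP_MAGIC


def make_zip_based_file() -> bytes:
    """Build a tiny ZIP archive in memory (a stand-in for a ZIP-based format)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hello")
    return buffer.getvalue()


if __name__ == "__main__":
    print(looks_like_zip(make_zip_based_file()))  # a ZIP-based payload
    print(looks_like_zip(b"plain text"))          # anything else
```

Any ZIP-based package passes this test, which is exactly why a sniff like this cannot tell a SAZ file from a plain ZIP.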

For instance, downloading from http://webdbg.com/dl/saz.saz results in the following modal prompt:

[Image: Internet Explorer’s modal download prompt]

If you choose Open, the MIME Handler is invoked and shows the guts of the ZIP file:

[Image: the MIME Handler showing the contents of the ZIP file]

If you choose Save, the file will be saved to your downloads folder as a .ZIP. This is generally not what you want.

As a mitigation for this problem, Internet Explorer 9 included an exemption list for the most popular ZIP-based formats of 2010; downloads whose URLs bear one of the following extensions are not renamed:

.accdt; .crtx; .docm; .docx; .dotm; .dotx; .gcsx; .glox; .gqsx; .potm; .potx; .ppam; .ppsm; .ppsx; .pptm; .pptx; .sldx; .thmx; .vdw; .xlam; .xlsb; .xlsm; .xlsx; .xltm; .xltx; .zipx
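
An exemption list of this kind amounts to a lookup on the extension in the URL. The Python sketch below shows one way such a lookup could work. It is an assumption-laden illustration, not the browser’s implementation: the function name is_exempt_from_rename is hypothetical, and treating the comparison as case-insensitive and ignoring the query string are my guesses rather than documented behavior.

```python
import posixpath
from urllib.parse import urlsplit

# The extensions listed above, copied as-is.
EXEMPT_EXTENSIONS = frozenset(
    ".accdt .crtx .docm .docx .dotm .dotx .gcsx .glox .gqsx .potm .potx "
    ".ppam .ppsm .ppsx .pptm .pptx .sldx .thmx .vdw .xlam .xlsb .xlsm "
    ".xlsx .xltm .xltx .zipx".split()
)


def is_exempt_from_rename(url: str) -> bool:
    """Return True if the URL's path ends in one of the exempted extensions.

    Assumption: the match ignores case and any query string or fragment.
    """
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in EXEMPT_EXTENSIONS


if __name__ == "__main__":
    print(is_exempt_from_rename("http://example.com/report.docx"))
    print(is_exempt_from_rename("http://webdbg.com/dl/saz.saz"))
```

A fixed list like this helps only the formats that were on it when it was written, so newer ZIP-based types such as .saz still get renamed.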

To avoid this problem for all ZIP-based types, servers have two options:

  1. Send a specific MIME-type identifying the file’s type
  2. Use a Content-Disposition header to specify the filename
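
Both options come down to the response headers the server sends. Here is a rough Python sketch that builds headers applying the two options together. The helper download_headers and the CUSTOM_TYPES table are hypothetical names for this example, and the sketch assumes simple ASCII filenames that need no escaping inside the quoted filename parameter.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical server-side registration of extensions to specific MIME types.
CUSTOM_TYPES = {
    ".saz": "application/x-fiddler-session-archive",
}


def download_headers(url_path: str) -> dict:
    """Build response headers that name both the type and the filename."""
    filename = posixpath.basename(urlsplit(url_path).path)
    extension = posixpath.splitext(filename)[1].lower()
    # Option 1: send a specific MIME type when one is registered;
    # otherwise fall back to the generic binary type.
    content_type = CUSTOM_TYPES.get(extension, "application/octet-stream")
    # Option 2: state the intended filename explicitly.
    # Assumes an ASCII filename with no quotes or backslashes.
    disposition = 'attachment; filename="{}"'.format(filename)
    return {
        "Content-Type": content_type,
        "Content-Disposition": disposition,
    }


if __name__ == "__main__":
    for name, value in download_headers("/dl/saz.saz").items():
        print("{}: {}".format(name, value))
```

Either header alone is described above as sufficient; sending both simply covers the case where an extension has not been registered yet.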

For instance, when the server is reconfigured to send a Content-Type: application/x-fiddler-session-archive header, the user gets the expected Download Manager notification, and the file extension is untouched:

[Image: the Download Manager notification, with the file extension untouched]

The changing web suggests that it probably makes sense to get out of the business of sniffing ZIP files, as such sniffing is likely now causing more problems than it solves.

-Eric