Adventures in Packaging - Episode 1

If you're a developer who creates or accesses data files (and who doesn't?), then this blog might be one for you to keep an eye on.

Welcome to the first installment of the Microsoft Packaging Team blog!

The Packaging team works in the Microsoft Windows Division to provide programming APIs that support the Open Packaging Conventions standard.  Open Packaging Conventions ("OPC") is a new file technology documented by the ISO/IEC 29500-2 and ECMA 376-2 standards.

Several members of the team worked on the original System.IO.Packaging APIs released for managed code in .NET 3.0.  We're particularly excited about the upcoming release of Windows 7, which will include new native-code Win32 Packaging APIs that ship as part of the operating system!  Microsoft is strongly embracing open standards and interoperability - the commitment to incorporate OPC as an integral element of the operating system is another step in that direction.

Open Packaging Conventions
Perhaps an initial question might be, "What is OPC and what makes it so compelling?"  Rather than being a specific file format, OPC is a container-file technology that's designed to create file formats based on a flexible open framework.  OPC integrates elements of Zip, XML, and the Web into an open industry standard that makes it easier to organize, store, and transport application data.  OPC is the core file technology for many of the new file formats supported by Microsoft products.  This new generation of OPC-based files includes the Office 12 versions of Word (.docx), Excel (.xlsx), and PowerPoint (.pptx), along with XPS (.xps), Semblio (.semblio), plus a growing number of other new Microsoft and third-party applications such as Autodesk AutoCAD (.dwfx) and Siemens UGS (.jtx).  While each of these file formats shares OPC as a foundation, the data content contained in each differs depending on the specific format.

Before going too much further, perhaps a couple of words on terminology.

In packaging terms...

    • A "package" corresponds to a "Zip archive".
    • A "part" corresponds to a "file" (i.e. "a data stream") stored within the Zip.

ZipPackages
In using Zip as its physical container, all OPC-based file formats are, in fact, Zip files.  You can simply append ".zip" to any OPC file (package) to open and examine its contents in Windows Explorer or your favorite Zip utility. This makes packages a great choice for organizing multiple application data streams into a single file that’s portable and easy to access.  It’s important to note, however, that while all OPC files are Zip files, the reverse is not necessarily true: not all Zip files are OPC files.  OPC adds two requirements to a Zip file:

    1. The names of all of the parts (files) stored in an OPC package must be URI-compliant.
    2. The package must contain a “[Content_Types].xml” file.
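
If you'd rather poke at a package from code than from Windows Explorer, here is a minimal sketch in Python using only the standard zipfile module. It lists the Zip entries (parts) and checks the second requirement only. The helper names list_parts and has_content_types are made up for this illustration, and the tiny package is built in memory so the sketch runs without a sample file.

```python
import io
import zipfile

CONTENT_TYPES_NAME = "[Content_Types].xml"


def list_parts(package):
    """Return the names of all Zip entries (parts) stored in the package."""
    with zipfile.ZipFile(package) as archive:
        return archive.namelist()


def has_content_types(package):
    """Check only the second requirement: a [Content_Types].xml entry exists."""
    return CONTENT_TYPES_NAME in list_parts(package)


# Build a tiny in-memory Zip so the sketch needs no sample file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(CONTENT_TYPES_NAME, "<Types/>")
    archive.writestr("docProps/core.xml", "<coreProperties/>")

print(list_parts(buffer))
print(has_content_types(buffer))
```

Passing a path such as "report.docx" instead of the in-memory buffer should work the same way. Note that this does not validate part names against the first requirement.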

URI Part Names
The first requirement*, URI-compliant part names, makes it possible to access the parts stored in a package over the web when the package is located on a web server.  In situations where an original filename is not URI-compliant, the filename is typically "percent-encoded" to a URI-compliant form.  For example, a part with the filename "my file.txt" would be percent-encoded as "my%20file.txt" (you've probably seen this in many of the URLs in your Web browser).
*Re. ISO 29500-2, Section 9.1.1 “Part Names”.
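
As a quick illustration of the percent-encoding step, here is a small Python sketch using the standard urllib.parse module. The encode_part_name helper is a hypothetical name, and it only performs generic percent-encoding; the standard places further rules on part names that this sketch does not check.

```python
from urllib.parse import quote, unquote


def encode_part_name(filename):
    """Percent-encode a filename; '/' is left alone so path segments survive."""
    return quote(filename, safe="/")


print(encode_part_name("my file.txt"))  # my%20file.txt
print(unquote("my%20file.txt"))         # my file.txt
```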

The [Content_Types].xml Part
The second requirement**, a "[Content_Types].xml" part, is used so that the content of all the parts in the package is clearly and accurately defined, not only today but into the future.  Since many three- or four-character filename extensions have multiple meanings, Content_Types is used to accurately define part content through the use of MIME-style media types.  The markup within a Content_Types part is fairly simple and contains just two basic types of elements: "Default" and "Override" elements.

    • Default: associates a generic file "Extension" to a specified "ContentType".
    • Override: associates a specific "PartName" to a specified "ContentType"
      (overrides any Default extension association).

The following is an example of a [Content_Types].xml part:

<?xml version="1.0" encoding="utf-8" ?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="htm" ContentType="text/html" />
  <Default Extension="css" ContentType="text/css" />
  <Default Extension="png" ContentType="image/png" />
  <Default Extension="jpg" ContentType="image/jpeg" />
  <Default Extension="mp3" ContentType="audio/mpeg" />
  <Default Extension="xml" ContentType="application/xml" />
  <Override PartName="/docProps/core.xml"
   ContentType="application/vnd.openxmlformats-package.core-properties+xml" />
</Types>
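
To make the Default/Override lookup concrete, here is a minimal Python sketch that resolves the content type of a part from Content_Types markup. It is an illustration, not the logic the Packaging APIs use internally: the resolve_content_type helper and the shortened SAMPLE markup are made up for this example, and comparing names case-insensitively is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

# Namespace of the Content_Types markup, in ElementTree's {uri}tag form.
NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Shortened sample markup, used only for this illustration.
SAMPLE = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="png" ContentType="image/png" />
  <Default Extension="xml" ContentType="application/xml" />
  <Override PartName="/docProps/core.xml"
   ContentType="application/vnd.openxmlformats-package.core-properties+xml" />
</Types>"""


def resolve_content_type(types_xml, part_name):
    """Return the content type for part_name, or None if nothing matches."""
    root = ET.fromstring(types_xml)
    # An Override for the exact part name takes precedence over any Default.
    # Assumption: names are compared case-insensitively.
    for override in root.findall(NS + "Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    # Otherwise fall back to the Default registered for the file extension.
    last_segment = part_name.rsplit("/", 1)[-1]
    extension = last_segment.rsplit(".", 1)[-1].lower() if "." in last_segment else ""
    for default in root.findall(NS + "Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None


print(resolve_content_type(SAMPLE, "/docProps/core.xml"))
print(resolve_content_type(SAMPLE, "/images/logo.png"))
```

The first call hits the Override even though a Default for "xml" exists; the second falls through to the Default for "png".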

When using the Win32 native-code Packaging APIs or the .NET managed-code Packaging APIs, the Content_Types file is created and managed automatically.  If you're creating a Zip package on your own, you'll also need to include a [Content_Types].xml file that contains the markup to define the content types for all of the parts contained in the package.  As shown in the above example, this is fairly simple to do.
**Re. ISO 29500-2, Section 9.1.2 “Content Types”.
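
For the do-it-yourself route, here is a rough Python sketch that writes a Zip with a hand-authored [Content_Types].xml next to two parts. The build_package helper, the part names, and the part contents are all invented for this illustration. It covers only the two requirements discussed in this post, so treat it as a starting point rather than a complete package writer.

```python
import io
import zipfile

# Hand-written Content_Types markup covering the two extensions used below.
CONTENT_TYPES = """<?xml version="1.0" encoding="utf-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="htm" ContentType="text/html" />
  <Default Extension="css" ContentType="text/css" />
</Types>
"""


def build_package(parts):
    """Zip the given parts (entry name -> bytes) together with [Content_Types].xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        for name, data in parts.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical parts; the names are already URI-compliant.
package_bytes = build_package({
    "index.htm": b"<html><body>Hello</body></html>",
    "styles/site.css": b"body { margin: 0; }",
})

# Read the result back to confirm what was written.
with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
    print(archive.namelist())
```

Writing package_bytes to a file and opening it in a Zip utility should show the three entries.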

The Adventure Continues...
These are some basics to help get you started, but OPC provides many additional services.  A goal of this blog will be to highlight uses and features of both the Win32 native-code Packaging APIs and the .NET managed-code System.IO.Packaging APIs for creating, organizing, and accessing information stored in OPC files.  We think you'll find the Open Packaging Conventions an exceptionally flexible standard for managing application data and, in particular, the packaging APIs an indispensable tool to help you take advantage of this new file technology - more to come in following episodes...

Thanks for listening,
Jack

PS: Here are some links for more information and related topics about Packaging: