Zip Your Data

Using the Zip Classes in the J# Class Libraries to Compress Files and Data with C#

Ianier Munoz

Code download available at: ZipCompression.exe (150 KB)

This article assumes you're familiar with C# and Windows Forms

Level of Difficulty 1 2 3

SUMMARY

Zip compression lets you save space and network bandwidth when storing files or sending them over the wire. In addition, you don't lose the directory structure of folders you Zip, which makes it a pretty useful compression scheme. The .NET Framework class library doesn't include any classes that let you manipulate Zip files, but since .NET-targeted languages can share class implementations, and J# exposes classes in the java.util.zip namespace, you can use those classes from your C# code. This article explains how to use the Microsoft J# class libraries to create an application in C# that compresses and decompresses Zip files. It also shows other unique parts of the J# runtime you can use from any .NET-compliant language to save some coding.

Contents

Zip Files and C#
The Solution
SharpZip
Enumerating Zip Entries
Decompressing Zip Files
Creating and Modifying Zip Files
Low-level Zip Compression
Other J# Goodies
Application Deployment
Conclusion

Zip is a popular standard for data transfer and storage because it saves both disk space and network bandwidth. Typical text and database files can be compressed to as little as 10 percent of their original size. Even though binary files don't compress as well, 50 percent compression is often achieved.

An additional advantage of Zip files is that a single file can contain multiple files while preserving the directory structure. This allows you to send a full directory tree attached to an e-mail message and have the recipient recover the original file structure.

The Zip data format is open and not subject to patents or other legal issues. Developers are free to create applications that manipulate Zip files and to use the low-level Zip compression algorithms for temporarily reducing the size of their own custom data. The authors of the Zip data specification made the compression and decompression algorithms available to developers in a library named zlib (http://www.gzip.org/zlib). This library was adopted by the Java platform in version 1.1 of the Java Development Kit (JDK) to form the basis of the Java Archive (JAR) file format, so since JDK version 1.1, the standard Java language APIs include the necessary classes for manipulating Zip files. You can find these classes under the java.util.zip namespace.

Zip Files and C#

I wanted to use Zip compression in my applications written in C#. Unfortunately, the Microsoft® .NET Framework does not currently include any classes for manipulating Zip files. I did, however, find several products related to Zip compression. For example, #ziplib (formerly NZipLib, http://www.icsharpcode.net/OpenSource/SharpZipLib/default.asp) is a port of the zlib library to C#. Its license allows developers to include this library in commercial, closed-source applications. However, as MSDN Magazine goes to press, #ziplib is in its prerelease state (version 0.31).

Another solution would be to use the unmanaged zlib as a Windows® DLL and write the necessary Interop wrappers for it, but since compression involves passing around a significant amount of data during each function call, coding Interop wrappers for optimum performance would be a difficult process. Although other libraries are available, they are not free.

The Solution

The .NET Framework was designed with language interoperability in mind. Any managed component that follows certain rules can be used correctly from any .NET-compliant programming language that implements the necessary functionality. The set of rules and language features required for interoperability is known as the Common Language Specification (CLS).

All .NET language compilers implemented by Microsoft are CLS-compliant, including Microsoft Visual J#™ .NET, a development tool for Java-language developers who want to build applications and services on the Microsoft .NET Framework. (Visual J# .NET has been independently developed by Microsoft. It is neither endorsed nor approved by Sun Microsystems, Inc.) This is why you can use .NET Framework classes to build Windows Forms and ASP.NET applications in J#.

As you will see later in this article, some of the classes exposed by the J# runtime are not actually CLS-compliant, but you can still gain access to most of the J# classes from other languages to use specific features that are not implemented by the .NET Framework. Since J# implements the JDK version 1.1.4, it is no surprise that the java.util.zip namespace is available to developers through the J# runtime. In the next section of this article I will present an application written in C# that uses the java.util.zip classes to compress and decompress Zip files for saving space locally and bandwidth over the wire.

All the sample code in this article was developed with Microsoft Visual Studio 2002 and the J# runtime version 1.0 (see the link at the top of this article).

SharpZip

I wrote SharpZip, one of the sample applications accompanying this article, in C#. It's a simplified utility for handling Zip files that allows you to create Zips or open existing ones to extract, append, and delete files (see Figure 1).

Figure 1 SharpZip App


Before looking into the code, you need to ensure that the J# runtime is properly installed on your system. You don't need to install the full Visual J# .NET product. You can download and install only the J# 1.0 redistributable package, which is available at http://msdn.microsoft.com/vjsharp/.

The java.util.zip namespace is implemented in the vjslib.dll assembly. This assembly can be found in the C:\WINNT\Microsoft Visual JSharp .NET\Framework\v1.0.4205\ directory (you will need to replace WINNT with your actual Windows directory).

When you include a reference to vjslib.dll in your project, you can start using the J# namespaces from your code and navigate through the JDK namespaces with the Object Browser (see Figure 2). The important classes are java.util.zip.ZipFile, java.util.zip.ZipEntry, and java.util.zip.ZipOutputStream. These classes, shown in Figure 3, let you manipulate Zip files at the file level.

Figure 3 ZipFile, ZipEntry, and ZipOutputStream Classes

ZipFile                             Description
close()                             Closes the Zip file
entries()                           Returns an enumeration of the Zip file entries
getEntry(String name)               Returns the Zip file entry for the specified name, or null if not found
getInputStream(ZipEntry entry)      Returns an input stream for reading the contents of the specified Zip file entry
getName()                           Returns the path name of the Zip file
size()                              Returns the number of entries in the Zip file

ZipEntry                            Description
clone()                             Returns a copy of this entry
getComment()                        Returns the comment string for the entry, or null if none
getCompressedSize()                 Returns the size of the compressed entry data, or -1 if not known
getCrc()                            Returns the CRC-32 checksum of the uncompressed entry data, or -1 if not known
getExtra()                          Returns the extra field data for the entry, or null if none
getMethod()                         Returns the compression method of the entry, or -1 if not specified
getName()                           Returns the name of the entry
getSize()                           Returns the uncompressed size of the entry data, or -1 if not known
getTime()                           Returns the modification time of the entry, or -1 if not specified
hashCode()                          Returns the hash code value for this entry
isDirectory()                       Returns true if this is a directory entry
setComment(String comment)          Sets the optional comment string for the entry
setCompressedSize(long csize)       Sets the size of the compressed entry data
setCrc(long crc)                    Sets the CRC-32 checksum of the uncompressed entry data
setExtra(byte[] extra)              Sets the optional extra field data for the entry
setMethod(int method)               Sets the compression method for the entry
setSize(long size)                  Sets the uncompressed size of the entry data
setTime(long time)                  Sets the modification time of the entry
toString()                          Returns a string representation of the Zip entry

ZipOutputStream                     Description
close()                             Closes the Zip output stream as well as the stream being filtered
closeEntry()                        Closes the current Zip entry and positions the stream for writing the next entry
finish()                            Finishes writing the contents of the Zip output stream without closing the underlying stream
putNextEntry(ZipEntry e)            Begins writing a new Zip file entry and positions the stream to the start of the entry data
setComment(String comment)          Sets the Zip file comment
setLevel(int level)                 Sets the compression level for subsequent entries which are DEFLATED
setMethod(int method)               Sets the default compression method for subsequent entries
write(byte[] b, int off, int len)   Writes an array of bytes to the current Zip entry data

Figure 2 Namespaces in the Object Browser


When using the methods outlined in this article, the names might look strange to you because the naming convention that Java uses for identifiers other than classes and interfaces differs from the one used in C#. In Java, package names are written in all lowercase, and method names use lower camel case, where the first word is lowercase and each subsequent word is capitalized, as in "nextElement." However, I'm sure you'll get the hang of it.
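To give a feel for how these Java-style names read from C#, here is a minimal sketch that opens an archive and looks up one entry by name. It assumes a project reference to vjslib.dll; the EntryInfo class name and the command-line arguments are hypothetical and not part of the SharpZip sample.

```csharp
using System;
using java.util.zip;   // implemented in vjslib.dll

// Hypothetical console utility, not part of the SharpZip sample.
class EntryInfo
{
    // Usage: EntryInfo.exe <archive.zip> <entry name inside the archive>
    static void Main(string[] args)
    {
        ZipFile zip = new ZipFile(args[0]);
        try
        {
            Console.WriteLine("{0}: {1} entries", zip.getName(), zip.size());

            // getEntry returns null when no entry has the given name
            ZipEntry entry = zip.getEntry(args[1]);
            if (entry == null)
                Console.WriteLine("No entry named {0}", args[1]);
            else
                Console.WriteLine("{0}: {1} bytes, {2} compressed",
                    entry.getName(), entry.getSize(), entry.getCompressedSize());
        }
        finally
        {
            // Java streams and files are closed explicitly; they are not IDisposable
            zip.close();
        }
    }
}
```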

Enumerating Zip Entries

The entries method of the java.util.zip.ZipFile class returns an object that implements the java.util.Enumeration interface. The application then steps through the enumeration to retrieve instances of ZipEntry representing individual entries in the Zip file. The ZipEntry class exposes all the necessary information such as file name, compression method, time stamp, original and compressed size, and so on (see Figure 4).

Figure 4 Enumerating Zip Entries

private void DisplayEntries()
{
    zipListView.BeginUpdate();
    try
    {
        zipListView.Items.Clear();
        foreach (ZipEntry entry in new EnumerationAdapter(
            new EnumerationMethod(CurrentFile.entries)))
        {
            if (!entry.isDirectory())
            {
                string name = entry.getName();
                ListViewItem item = new ListViewItem(Path.GetFileName(name));
                item.SubItems.Add(entry.getSize().ToString());
                item.SubItems.Add(entry.getCompressedSize().ToString());
                item.SubItems.Add(Path.GetDirectoryName(name));
                item.Tag = entry;
                zipListView.Items.Add(item);
            }
        }
    }
    finally
    {
        zipListView.EndUpdate();
    }
}

Please note that although the java.util.Enumeration interface is similar to the System.Collections.IEnumerator interface, Java enumerators advance to the next element when you retrieve the current object by calling nextElement, while .NET enumerators advance when you check for the availability of more elements in the MoveNext call. Another important difference is that the Enumeration interface does not provide a method for restarting the traversal.

One advantage of .NET enumerators is that you can access the current element multiple times. On the other hand, Java enumerators allow you to check multiple times for completion, but this is not very useful under most circumstances. Both Java and .NET enumerators are well designed in that they prevent you from forgetting to advance to the next element inside the enumeration loop.
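The difference between the two enumerator styles is easiest to see side by side. The following sketch is illustrative only; the EnumerationStyles class and its method names are hypothetical, and it assumes a reference to vjslib.dll.

```csharp
using System;
using System.Collections;
using java.util;
using java.util.zip;

// Hypothetical helper class used only to contrast the two loop shapes.
class EnumerationStyles
{
    // Java style: hasMoreElements only tests; nextElement returns the
    // current element AND advances to the next one.
    public static void ListJavaStyle(ZipFile zip)
    {
        Enumeration e = zip.entries();
        while (e.hasMoreElements())
        {
            ZipEntry entry = (ZipEntry) e.nextElement();
            Console.WriteLine(entry.getName());
        }
    }

    // .NET style: MoveNext advances and tests in one call; Current can
    // then be read as many times as needed without side effects.
    public static void ListDotNetStyle(IEnumerable names)
    {
        IEnumerator e = names.GetEnumerator();
        while (e.MoveNext())
        {
            Console.WriteLine(e.Current);
        }
    }
}
```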

I decided to write a class for wrapping Java enumerators so I could use the C# foreach statement with them. I named this class EnumerationAdapter. The Reset method is emulated by again calling the method that returns the Java enumerator. To do this, the wrapper class constructor takes as a parameter a delegate to a method that returns a java.util.Enumeration, instead of the java.util.Enumeration object itself.

Decompressing Zip Files

The first thing the SharpZip application does when extracting files is to prompt the user for a directory where the files should be created. You may have noticed that the application shows the Browse for Folder dialog box. I was tempted to use the System.Windows.Forms.Design.FolderNameEditor.FolderBrowser class, but the documentation states that this type supports the .NET Framework infrastructure and isn't intended to be used directly, so I used the Shell32 object through COM Interop by importing the Microsoft Shell Controls and Automation type library.

Extracting the original files from a Zip file (unzipping) is a simple operation: just call getInputStream on the ZipFile object, passing the entry whose contents you want to read. The getInputStream method returns an InputStream from which you can read the decompressed content of the archived entry.

The ExtractZipFile helper function does the job for you. Directories are stored in Zip files using separate entries, but the file name on each entry contains the directory information as well, so ExtractZipFile ignores directory entries and extracts the necessary path information from the file names.

To save individual files to disk, just write to a file the content of the InputStream that corresponds to the entry of interest. This time I decided not to wrap Java streams in a custom System.IO.Stream class because the java.io namespace has fairly good support for streams. In particular, java.io.FileOutputStream allows you to create a file to which you can copy the desired entry.

The CopyStream helper function in Figure 5 copies the contents of a java.io.InputStream object to a java.io.OutputStream object. This helper function is used by other parts of the SharpZip application as well. You should note, though, that this example doesn't check whether the output files already exist before overwriting them. You may want to prompt the user by asking whether the file should be overwritten or not.
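One way to avoid silent overwrites without changing ExtractZipFile is to use its filter parameter. The sketch below skips any entry whose destination file already exists; a real application might prompt the user instead. The SafeExtract class is hypothetical, and it assumes the ZipUtils and FilterEntryMethod declarations from Figure 5 are compiled into the same project.

```csharp
using System;
using System.IO;
using java.util.zip;
using CsZip;   // ZipUtils and FilterEntryMethod from Figure 5

// Hypothetical helper, not part of the SharpZip sample.
class SafeExtract
{
    private string m_Target;

    public SafeExtract(string target)
    {
        m_Target = target;
    }

    // Filter callback: returns true only for entries that are not on disk yet.
    // The destination path is built the same way ExtractZipFile builds it.
    public bool NotOnDisk(ZipEntry entry)
    {
        string name = entry.getName();
        string dest = Path.Combine(
            Path.Combine(m_Target, Path.GetDirectoryName(name)),
            Path.GetFileName(name));
        bool exists = File.Exists(dest);
        if (exists)
            Console.WriteLine("Skipping existing file {0}", dest);
        return !exists;
    }

    // Usage: SafeExtract.exe <archive.zip> <target directory>
    static void Main(string[] args)
    {
        ZipFile zip = new ZipFile(args[0]);
        try
        {
            SafeExtract guard = new SafeExtract(args[1]);
            ZipUtils.ExtractZipFile(zip, args[1],
                new FilterEntryMethod(guard.NotOnDisk));
        }
        finally
        {
            zip.close();
        }
    }
}
```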

Figure 5 ZipUtils.cs

using System;
using System.Collections;
using java.util;
using java.util.zip;

namespace CsZip
{
    public delegate Enumeration EnumerationMethod();

    /// <summary>
    /// Wraps java enumerators
    /// </summary>
    public class EnumerationAdapter : IEnumerable
    {
        private class EnumerationWrapper : IEnumerator
        {
            private EnumerationMethod m_Method;
            private Enumeration m_Wrapped;
            private object m_Current;

            public EnumerationWrapper(EnumerationMethod method)
            {
                m_Method = method;
            }

            // IEnumerator
            public object Current
            {
                get { return m_Current; }
            }

            public void Reset()
            {
                m_Wrapped = m_Method();
                if (m_Wrapped == null)
                    throw new InvalidOperationException();
            }

            public bool MoveNext()
            {
                if (m_Wrapped == null)
                    Reset();
                bool Result = m_Wrapped.hasMoreElements();
                if (Result)
                    m_Current = m_Wrapped.nextElement();
                return Result;
            }
        }

        private EnumerationMethod m_Method;

        public EnumerationAdapter(EnumerationMethod method)
        {
            if (method == null)
                throw new ArgumentException();
            m_Method = method;
        }

        // IEnumerable
        public IEnumerator GetEnumerator()
        {
            return new EnumerationWrapper(m_Method);
        }
    }

    public delegate bool FilterEntryMethod(ZipEntry e);

    /// <summary>
    /// Zip stream utils
    /// </summary>
    public class ZipUtils
    {
        public static void CopyStream(java.io.InputStream from, java.io.OutputStream to)
        {
            sbyte[] buffer = new sbyte[8192];
            int got;
            while ((got = from.read(buffer, 0, buffer.Length)) > 0)
                to.write(buffer, 0, got);
        }

        public static void ExtractZipFile(ZipFile file, string path, FilterEntryMethod filter)
        {
            foreach (ZipEntry entry in new EnumerationAdapter(
                new EnumerationMethod(file.entries)))
            {
                if (!entry.isDirectory())
                {
                    if ((filter == null || filter(entry)))
                    {
                        java.io.InputStream s = file.getInputStream(entry);
                        try
                        {
                            string fname = System.IO.Path.GetFileName(entry.getName());
                            string newpath = System.IO.Path.Combine(path,
                                System.IO.Path.GetDirectoryName(entry.getName()));
                            System.IO.Directory.CreateDirectory(newpath);
                            java.io.FileOutputStream dest = new java.io.FileOutputStream(
                                System.IO.Path.Combine(newpath, fname));
                            try
                            {
                                CopyStream(s, dest);
                            }
                            finally
                            {
                                dest.close();
                            }
                        }
                        finally
                        {
                            s.close();
                        }
                    }
                }
            }
        }

        public static ZipFile CreateEmptyZipFile(string fileName)
        {
            new ZipOutputStream(new java.io.FileOutputStream(fileName)).close();
            return new ZipFile(fileName);
        }

        public static ZipFile UpdateZipFile(ZipFile file, FilterEntryMethod filter,
            string[] newFiles)
        {
            string prev = file.getName();
            string tmp = System.IO.Path.GetTempFileName();
            ZipOutputStream to = new ZipOutputStream(new java.io.FileOutputStream(tmp));
            try
            {
                CopyEntries(file, to, filter);
                // add entries here
                if (newFiles != null)
                {
                    foreach (string f in newFiles)
                    {
                        ZipEntry z = new ZipEntry(
                            f.Remove(0, System.IO.Path.GetPathRoot(f).Length));
                        z.setMethod(ZipEntry.DEFLATED);
                        to.putNextEntry(z);
                        try
                        {
                            java.io.FileInputStream s = new java.io.FileInputStream(f);
                            try
                            {
                                CopyStream(s, to);
                            }
                            finally
                            {
                                s.close();
                            }
                        }
                        finally
                        {
                            to.closeEntry();
                        }
                    }
                }
            }
            finally
            {
                to.close();
            }
            file.close();
            // now replace the old file with the new one
            System.IO.File.Copy(tmp, prev, true);
            System.IO.File.Delete(tmp);
            return new ZipFile(prev);
        }

        public static void CopyEntries(ZipFile from, ZipOutputStream to)
        {
            CopyEntries(from, to, null);
        }

        public static void CopyEntries(ZipFile from, ZipOutputStream to,
            FilterEntryMethod filter)
        {
            foreach (ZipEntry entry in new EnumerationAdapter(
                new EnumerationMethod(from.entries)))
            {
                if (filter == null || filter(entry))
                {
                    java.io.InputStream s = from.getInputStream(entry);
                    try
                    {
                        to.putNextEntry(entry);
                        try
                        {
                            CopyStream(s, to);
                        }
                        finally
                        {
                            to.closeEntry();
                        }
                    }
                    finally
                    {
                        s.close();
                    }
                }
            }
        }
    }
}

Also note that there is no support for password-protected files. You could create your own encryption mechanism using the classes in the System.Security.Cryptography namespace. If you do this, be aware that the resulting file will not be compatible with standard Zip utilities such as WinZip.

Creating and Modifying Zip Files

The java.util.zip.ZipOutputStream class allows you to compress data and write the result to an underlying java.io.OutputStream object. The SharpZip application is intended to work with files, so it writes the compressed data to a new java.io.FileOutputStream object, but you could easily derive your own class from java.io.OutputStream or use one of the standard classes to write the compressed data directly to the network or other storage media.
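As an illustration of targeting something other than a file, the following sketch builds a complete Zip archive in memory by wrapping a java.io.ByteArrayOutputStream. The InMemoryZip class and the ZipToMemory method are hypothetical names, and the example assumes a reference to vjslib.dll.

```csharp
using System;
using java.util.zip;

// Hypothetical example class, not part of the SharpZip sample.
class InMemoryZip
{
    // Returns the bytes of a Zip archive that contains a single entry.
    static sbyte[] ZipToMemory(string entryName, sbyte[] data)
    {
        java.io.ByteArrayOutputStream buffer = new java.io.ByteArrayOutputStream();
        ZipOutputStream zip = new ZipOutputStream(buffer);
        try
        {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(data, 0, data.Length);
            zip.closeEntry();
        }
        finally
        {
            // close() finishes the archive and closes the wrapped stream
            zip.close();
        }
        // The byte array stream keeps its contents after being closed
        return buffer.toByteArray();
    }

    static void Main()
    {
        sbyte[] data = new sbyte[1000];   // 1,000 zero bytes
        sbyte[] archive = ZipToMemory("zeros.bin", data);
        Console.WriteLine("{0} bytes in, {1} bytes of Zip data out",
            data.Length, archive.Length);
    }
}
```

The resulting array could then be sent over a socket or stored in a database column instead of being written to disk.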

The CreateEmptyZipFile helper function creates a Zip file and closes it immediately. The result is an empty Zip file with no entries in it. Appending or deleting items is not so simple because the java.util.zip package does not provide random access to Zip files. For deleting files, you should copy the entries you want to preserve to a new Zip file. For adding files, you should copy all entries to a new Zip file and then append the new entries. Copying an entry involves decompressing the entry from the source file as I've described and compressing it again to the destination file.

Create a new instance of ZipEntry for each of the files that you want to add and call setMethod on the entry to set the compression method to use. The supported methods are ZipEntry.DEFLATED, which compresses data using the deflate algorithm, and ZipEntry.STORED, which stores data without applying any compression. Then call ZipOutputStream.putNextEntry, passing in the new entry, and write its data by calling the write method on the ZipOutputStream object. When you are finished with the current entry, call ZipOutputStream.closeEntry and proceed to the next entry.
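One detail worth knowing when you choose ZipEntry.STORED: in the standard Java implementation of ZipOutputStream, a STORED entry must have its size and CRC-32 set before putNextEntry is called, otherwise a ZipException is thrown. The sketch below assumes the J# port behaves the same way and uses java.util.zip.CRC32 to compute the checksum. The StoredEntry class and the file names are hypothetical.

```csharp
using System;
using java.util.zip;

// Hypothetical example class, not part of the SharpZip sample.
class StoredEntry
{
    // Adds data to the archive without compression.
    static void AddStored(ZipOutputStream zip, string name, sbyte[] data)
    {
        // STORED entries need their checksum and sizes up front
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.Length);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.Length);
        entry.setCompressedSize(data.Length);   // same as the size: no compression
        entry.setCrc(crc.getValue());

        zip.putNextEntry(entry);
        try
        {
            zip.write(data, 0, data.Length);
        }
        finally
        {
            zip.closeEntry();
        }
    }

    static void Main()
    {
        sbyte[] data = new sbyte[256];
        for (int i = 0; i < data.Length; i++)
            data[i] = (sbyte) (i - 128);

        ZipOutputStream zip = new ZipOutputStream(
            new java.io.FileOutputStream("stored.zip"));
        try
        {
            AddStored(zip, "raw.bin", data);
        }
        finally
        {
            zip.close();
        }
    }
}
```

DEFLATED entries do not need this preparation because the stream computes the sizes and checksum while compressing.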

The UpdateZipFile function in Figure 5 implements both appending and deleting. It calls a delegate for each existing entry so that you can choose which entries are copied to a temporary file, and then adds the new entries to that file. Finally, the temporary file replaces the original Zip file.
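For example, deleting a single entry comes down to a filter that rejects that entry and a null list of new files. This is a sketch of one way to call UpdateZipFile, assuming the Figure 5 code is part of the project; the RemoveEntry class is hypothetical.

```csharp
using System;
using java.util.zip;
using CsZip;   // ZipUtils and FilterEntryMethod from Figure 5

// Hypothetical helper, not part of the SharpZip sample.
class RemoveEntry
{
    private string m_Name;

    public RemoveEntry(string name)
    {
        m_Name = name;
    }

    // Filter callback: keep every entry except the one being deleted.
    public bool Keep(ZipEntry entry)
    {
        return entry.getName() != m_Name;
    }

    // Usage: RemoveEntry.exe <archive.zip> <entry name to delete>
    static void Main(string[] args)
    {
        ZipFile zip = new ZipFile(args[0]);
        RemoveEntry remover = new RemoveEntry(args[1]);

        // UpdateZipFile closes the original ZipFile and returns a new one.
        // Passing null for newFiles means nothing is appended.
        zip = ZipUtils.UpdateZipFile(zip,
            new FilterEntryMethod(remover.Keep), null);
        try
        {
            Console.WriteLine("{0} entries remain", zip.size());
        }
        finally
        {
            zip.close();
        }
    }
}
```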

Low-level Zip Compression

You can use the java.util.zip classes to compress not only files but your application data as well. To illustrate this, I created a pair of functions to compress and decompress a string using the java.util.zip.Deflater and java.util.zip.Inflater classes.

The compression function creates an instance of the java.util.zip.Deflater class. A parameter in the constructor defines the level of compression that is desired. Next I call the Deflater.setInput method, passing the data to compress as an array of signed bytes (sbyte), and then I call Deflater.finish.

Please note that, in contrast to C#, the byte data type in Java is signed—there is no unsigned byte data type in Java. This is why all methods of the J# runtime that deal with buffers take arrays of sbytes as parameters.

Fortunately, the com.ms.vjsharp.struct namespace includes the JavaStructMarshalHelper class which, among other things, helps you when performing array conversions. The CompressString function calls the convertToByteArray method to convert a string to an array of signed bytes. To get the actual compressed bits I just keep calling Deflater.deflate until Deflater.finished returns true to signal that all the input data has been consumed. I collect the resulting data inside the compression loop using an instance of java.io.ByteArrayOutputStream. As a general rule, it is preferable to use the JDK classes when handling Java types in C#. It's the best way to avoid repeatedly converting arrays between sbyte and byte.

The code for decompressing the string looks very similar to the code used for compression. This time you create an instance of the java.util.zip.Inflater class and call the setInput method, passing in the compressed data. The decompression loop continually calls Inflater.inflate until Inflater.finished returns true, signaling that all the input data has been decompressed. Finally, call JavaStructMarshalHelper.convertToString to convert the array of signed bytes to the string to be returned by the function.
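The two loops described above can be sketched as follows. This is not the CsZipLL source: instead of JavaStructMarshalHelper, whose exact signatures are not shown in this article, the sketch converts between byte and sbyte arrays with System.Buffer.BlockCopy and uses UTF-8 for the string encoding. Both choices are assumptions, as is the StringCompressor class name.

```csharp
using System;
using System.Text;
using java.util.zip;

// Hypothetical example class; not the CsZipLL sample code.
class StringCompressor
{
    // Reinterpret byte[] as sbyte[] (and back) without changing any bits.
    static sbyte[] ToSigned(byte[] src)
    {
        sbyte[] dst = new sbyte[src.Length];
        Buffer.BlockCopy(src, 0, dst, 0, src.Length);
        return dst;
    }

    static byte[] ToUnsigned(sbyte[] src)
    {
        byte[] dst = new byte[src.Length];
        Buffer.BlockCopy(src, 0, dst, 0, src.Length);
        return dst;
    }

    public static sbyte[] CompressString(string text)
    {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(ToSigned(Encoding.UTF8.GetBytes(text)));
        deflater.finish();   // no more input will follow

        java.io.ByteArrayOutputStream result = new java.io.ByteArrayOutputStream();
        sbyte[] chunk = new sbyte[1024];
        while (!deflater.finished())
        {
            int got = deflater.deflate(chunk);
            result.write(chunk, 0, got);
        }
        deflater.end();      // release the compressor's resources
        return result.toByteArray();
    }

    public static string DecompressString(sbyte[] compressed)
    {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);

        java.io.ByteArrayOutputStream result = new java.io.ByteArrayOutputStream();
        sbyte[] chunk = new sbyte[1024];
        while (!inflater.finished())
        {
            int got = inflater.inflate(chunk);
            // Guard against looping forever on truncated or unusual input
            if (got == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                throw new InvalidOperationException("Incomplete compressed data");
            result.write(chunk, 0, got);
        }
        inflater.end();
        return Encoding.UTF8.GetString(ToUnsigned(result.toByteArray()));
    }

    static void Main()
    {
        string text = new string('x', 5000) + " and a tail";
        sbyte[] packed = CompressString(text);
        string unpacked = DecompressString(packed);
        Console.WriteLine("{0} chars -> {1} bytes, round trip ok: {2}",
            text.Length, packed.Length, unpacked == text);
    }
}
```

Collecting the output in a java.io.ByteArrayOutputStream, as the text suggests, keeps the data as sbyte arrays until the very last step.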

The CsZipLL sample application (LL stands for low level) creates a long string and compresses it down to about half its size. You could use these functions, for example, to code a SOAP extension to reduce the network bandwidth required by your Web Services.

Other J# Goodies

Although this article focuses on handling Zip files, this principle can be applied to other areas where the J# runtime libraries provide functionality not available from the .NET Framework standard assemblies.

Since J# offers developers a migration path to the .NET Framework for their Visual J++® projects, J# also implements many of the Visual J++-specific features, such as J/Direct®. J/Direct is a technology that allows Java language programs to call native Windows code. As is the case in Visual J++, the com.ms.win32 namespace in J# gives you access to most of the Windows API functions, data types, and constants.

The User32, Kernel32, and Gdi32 classes contain the core of the Win32® API functions. The constants are defined as static fields in a number of interfaces named winx, where x is the first letter of the constant. For example, the SW_SHOW flag for the ShowWindow API can be found in the com.ms.win32.wins interface.

For an interface to be CLS-compliant it must not contain fields, and the com.ms.win32.winx interfaces fail this test. Because C# does not allow fields in interfaces, neither IntelliSense® nor the C# compiler can see these constants, but you can still access these fields using Reflection, as shown here:

private int GetWin32IntConstant(string name)
{
    System.Reflection.Assembly asm =
        System.Reflection.Assembly.GetAssembly(typeof(com.ms.win32.wina));
    Type t = asm.GetType("com.ms.win32.win" + char.ToLower(name[0]), true);
    System.Reflection.FieldInfo info = t.GetField(name);
    return int.Parse(info.GetValue(null).ToString());
}

Retrieving Windows API constants using this technique is slow, so you should be careful when using this approach. Another problem is that since constants are not resolved at compile time you get runtime errors every time you misspell them. In any case, having most of the Windows API already declared in a .NET assembly may save you a lot of work. The SharpZip sample program, for example, displays the system icon associated with the extension of each file. To do this, the code calls the SHGetFileInfo API defined in the com.ms.win32.Shell32 interface to obtain the handle of the icon (see Figure 6).
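Since the reflection lookup is slow, one option is to cache each constant the first time it is requested. The following sketch wraps the same lookup in a Hashtable-backed cache and reports misspelled names with a clearer error. The Win32Constants class is a hypothetical name, and the example assumes a reference to vjslib.dll.

```csharp
using System;
using System.Collections;
using System.Reflection;

// Hypothetical helper, not part of the SharpZip sample.
class Win32Constants
{
    private static Hashtable s_Cache = new Hashtable();

    public static int Get(string name)
    {
        lock (s_Cache)
        {
            // Fast path: the constant was resolved before
            object cached = s_Cache[name];
            if (cached != null)
                return (int) cached;

            // Slow path: same reflection lookup as GetWin32IntConstant
            Assembly asm = Assembly.GetAssembly(typeof(com.ms.win32.wina));
            Type t = asm.GetType("com.ms.win32.win" + char.ToLower(name[0]), true);
            FieldInfo info = t.GetField(name);
            if (info == null)
                throw new ArgumentException("Unknown Win32 constant: " + name);

            int value = Convert.ToInt32(info.GetValue(null));
            s_Cache[name] = value;
            return value;
        }
    }
}
```

With this in place, repeated calls such as Win32Constants.Get("SW_SHOW") pay the reflection cost only once.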

Figure 6 Calling SHGetFileInfo

private Icon IconFromFileType(string path)
{
    Icon Result = null;
    com.ms.win32.SHFILEINFO info = new com.ms.win32.SHFILEINFO();
    int flags = GetWin32IntConstant("SHGFI_ICON") |
        GetWin32IntConstant("SHGFI_USEFILEATTRIBUTES") |
        GetWin32IntConstant("SHGFI_SMALLICON");
    if (com.ms.win32.Shell32.SHGetFileInfo(path, 0, info,
        1024 /* struct size */, flags) != 0)
    {
        // the Icon does not own the handle...
        Icon tmp = Icon.FromHandle(new IntPtr(info.hIcon));
        try
        {
            // ...that's why we copy it
            Result = new Icon(tmp, tmp.Size);
            tmp.Dispose();
        }
        finally
        {
            com.ms.win32.User32.DestroyIcon(info.hIcon);
        }
    }
    return Result;
}

Note that when you create a System.Drawing.Icon object from a handle, the new Icon does not own the handle. This means that you must free the associated resources by calling the DestroyIcon API. Since I didn't want to store the icon handle for the lifetime of the Icon object, I chose to make a copy of the object. Icons created by using the copy constructor own their handle.

Although the com.ms.win32 namespace is huge, you should be aware that it does not include every single Windows API function and data structure. For example, one notable omission from the com.ms.win32.Shell32 interface is the SHBrowseForFolder API, which would have allowed us to display the "Browse for Folder" dialog box without using the Microsoft Shell Controls and Automation COM library.

Also note that handling callbacks is somewhat complicated due to the fact that the Java language does not support delegates. For every callback type, an abstract class that defines the function prototype is provided. You must derive from this class to implement the code that handles the callback and then pass the API call an instance of this class (see Figure 7). Another minor difficulty related to the Java language is that parameters passed by reference are declared as arrays, but this affects only the code that calls the functions, not the underlying functionality.

Figure 7 Passing Callbacks to the Win32 API

using System;
using com.ms.win32;

namespace CsWinApi
{
    class MyEnumProc : WNDENUMPROC
    {
        public override bool wndenumproc(int hwnd, int lParam)
        {
            java.lang.StringBuffer str = new java.lang.StringBuffer(255);
            User32.GetWindowText(hwnd, str, str.capacity());
            string txt = str.ToString();
            if (txt != string.Empty)
                Console.WriteLine(txt);
            return true;
        }
    }

    class AppMain
    {
        [STAThread]
        static void Main(string[] args)
        {
            User32.EnumWindows(new MyEnumProc(), 0);
            Console.ReadLine();
        }
    }
}

Finally, some API calls are poorly translated. One example is waveOutOpen (defined in the Winmm class). The dwCallback parameter is used in C++ to pass an event handle, a window handle, a thread ID, or a callback function, depending on the value of the fdwOpen parameter. Since the J/Direct wrapper declares the dwCallback parameter as Int32 and there is no way to typecast a callback (delegate) to Int32, you must use another notification mechanism, such as an event handle, a window handle, or a thread ID.

There are some other interesting things in the core J# package. For example, the java.math.BigDecimal and java.math.BigInteger classes allow you to manipulate arbitrarily large numbers, which could be extremely helpful if you are writing applications that deal with cryptographic algorithms or scientific calculations.

The CsMath sample project shows how to use java.math.BigDecimal to calculate Pi with an arbitrary number of digits after the decimal point using Machin's formula. To make the code more readable I wrapped the java.math.BigDecimal in my own BigDecimal class and defined the most commonly used operators.
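The idea of wrapping java.math.BigDecimal behind C# operators can be sketched as follows. This is not the CsMath source: the Dec wrapper, the fixed scale of 60 digits, and the MachinPi class are all hypothetical, and the example assumes a reference to vjslib.dll. It evaluates Machin's formula, pi = 16 arctan(1/5) - 4 arctan(1/239), using the series arctan(1/x) = 1/x - 1/(3x^3) + 1/(5x^5) - ...

```csharp
using System;

// Hypothetical wrapper; the CsMath sample's own BigDecimal class is not shown here.
class Dec
{
    public const int Scale = 60;   // digits kept after the decimal point (assumption)
    private java.math.BigDecimal m_Value;

    public Dec(java.math.BigDecimal value) { m_Value = value; }
    public Dec(long value) { m_Value = java.math.BigDecimal.valueOf(value); }

    public static Dec operator +(Dec a, Dec b)
    {
        return new Dec(a.m_Value.add(b.m_Value));
    }

    public static Dec operator -(Dec a, Dec b)
    {
        return new Dec(a.m_Value.subtract(b.m_Value));
    }

    public static Dec operator *(Dec a, Dec b)
    {
        return new Dec(a.m_Value.multiply(b.m_Value));
    }

    // Division needs an explicit scale and rounding mode
    public static Dec operator /(Dec a, Dec b)
    {
        return new Dec(a.m_Value.divide(b.m_Value, Scale,
            java.math.BigDecimal.ROUND_DOWN));
    }

    public bool IsZero
    {
        get { return m_Value.signum() == 0; }
    }

    public override string ToString()
    {
        return m_Value.ToString();
    }
}

class MachinPi
{
    // Sums the arctan(1/x) series until the terms vanish at the chosen scale.
    static Dec ArcTanInverse(long x)
    {
        Dec xSquared = new Dec(x * x);
        Dec power = new Dec(1) / new Dec(x);   // holds 1/x^n for odd n
        Dec sum = new Dec(0);
        bool add = true;
        for (long n = 1; !power.IsZero; n += 2)
        {
            Dec term = power / new Dec(n);
            sum = add ? sum + term : sum - term;
            add = !add;
            power = power / xSquared;
        }
        return sum;
    }

    static void Main()
    {
        Dec pi = new Dec(16) * ArcTanInverse(5) - new Dec(4) * ArcTanInverse(239);
        Console.WriteLine(pi);
    }
}
```

Because every division truncates at the chosen scale, the last few printed digits are not reliable; computing with a few extra digits and trimming the result is the usual remedy.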

Application Deployment

Applications that use this technique require both the J# runtime and the .NET Framework on the target computer. Just as with the .NET Framework, Microsoft provides a redistributable package that you can deploy along with your application setup.

Microsoft has indicated continued support for J# for desktop operating systems. However, there is currently no support for the .NET Compact Framework in J#, so you cannot apply the techniques described in this article to applications targeting Smart Devices. Copying the assemblies to your local project directory will not work because the J# runtime assemblies rely heavily on native calls. You can, however, fully exploit the J# runtime for Web applications that use Mobile Web Controls.

Conclusion

The J# runtime includes many useful classes that you can use from other languages in the .NET Framework. Some of these classes allow you to handle Zip files, perform high-precision mathematical calculations, or call the Windows API. Although most of this functionality can be achieved by using third-party libraries, the J# runtime is fully supported by Microsoft, and it is free!

For background information see:
http://msdn.microsoft.com/vjsharp/
What is the Common Language Specification?

Ianier Munoz is a software architect and analyst at Dokumenta, a consulting firm based in Luxembourg. He is also the author of Chronotron and other popular software. You can reach him at http://www.chronotron.com.