Hiding Files in Plain Sight

Earlier in the week Vicki Davis posted a list of interesting sites on her blog that included one titled “How to hide files in JPEG pictures.” It’s an interesting little scheme that basically involves hiding one file (a compressed ZIP or RAR file) at the end of a JPEG file. This is not encryption by any means. It could be considered a form of steganography though. I think this is an interesting discussion opener with students. Many students are fascinated by the ideas of codes, ciphers and other forms of secret communication. Perhaps now that the AP CS exam is out of the way this might be useful for some classes. There are a couple of lines of discussion I see with this example.
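Before getting to those, it may help to see how little the trick itself involves. The sketch below is my own illustration in Python, not the code from the linked article. It builds a small ZIP archive in memory, appends it to some stand-in JPEG bytes, and reads the hidden file back out. The function names and the fake image data are made up for the example. It works because image viewers generally stop at the JPEG end-of-image marker, while ZIP readers find the archive's directory by searching from the end of the file.

```python
import io
import zipfile


def make_zip_bytes(files):
    """Build a ZIP archive in memory from a {name: bytes} dictionary."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def hide_in_jpeg(jpeg_bytes, zip_bytes):
    """The whole trick: put the archive right after the image data."""
    return jpeg_bytes + zip_bytes


def recover_files(combined):
    """Open the combined bytes as a ZIP; the leading image data is skipped."""
    with zipfile.ZipFile(io.BytesIO(combined)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Stand-in for a real picture: start-of-image marker, filler, end-of-image marker.
fake_jpeg = b"\xff\xd8" + b"\x00" * 32 + b"\xff\xd9"

payload = make_zip_bytes({"secret.txt": b"meet at noon"})
combined = hide_in_jpeg(fake_jpeg, payload)
recovered = recover_files(combined)
print(recovered["secret.txt"])
```

With a real picture you would read the JPEG with open(path, "rb") and write the combined bytes to a new .jpg file. Note that nothing here is secret in any strong sense: anyone who opens the file with an archive tool sees the contents.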

The first is of course steganography, and one can discuss the various forms it takes as a general interest topic. More specific to computer science though is the idea of how data is stored in files. What are the parts of a file and how does the system know what to do with it? Windows and some other operating systems use the file extension as an indication of what sort of program or operating system function should act on the file. Other operating systems use information stored with or inside the file rather than its name. I believe the classic Mac OS did this with type and creator codes kept alongside the file. What are the pros and cons of each?
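To make the comparison concrete for students, here is a rough sketch of both approaches side by side. One function guesses a file's type from its name, the other from the first few bytes of its contents (often called a "magic number"). The tables are deliberately tiny and the function names are my own; real tools know hundreds of signatures.

```python
import os

# A few well-known signatures found at the very start of a file.
MAGIC_NUMBERS = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"Rar!\x1a\x07", "RAR archive"),
]

# What the file name claims, keyed by lowercase extension.
EXTENSIONS = {
    ".jpg": "JPEG image",
    ".jpeg": "JPEG image",
    ".png": "PNG image",
    ".gif": "GIF image",
    ".pdf": "PDF document",
    ".zip": "ZIP archive",
    ".rar": "RAR archive",
}


def type_from_header(data):
    """Guess the type from the leading bytes of the file's contents."""
    for signature, label in MAGIC_NUMBERS:
        if data.startswith(signature):
            return label
    return "unknown"


def type_from_extension(filename):
    """Guess the type from the file name alone."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(ext, "unknown")


# A ZIP archive that someone renamed to look like a picture.
name, contents = "vacation.jpg", b"PK\x03\x04" + b"\x00" * 20
print(type_from_extension(name), "vs", type_from_header(contents))
```

The two answers disagree for the renamed file, which is a nice way into the pros and cons: extensions are easy to see and easy to change, while header checks need the system to actually open the file. The JPEG-plus-ZIP trick fools both, since the file starts with a genuine JPEG header and has a .jpg name.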

And there is an opening to discussing what is inside a file. Perhaps even opening a file as an ASCII text file. PDF files are interesting as a lot of the header data is human readable. Or maybe getting a binary dump or printout of the header for a file. What is human readable? Can students determine patterns that let them figure out what sort of application created different files? Of course a lot of files will be pretty much undecipherable without technical documentation, but it might be useful and interesting to speculate (and research) on what sorts of header information files do keep.
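A dump like that takes only a few lines of Python. The sketch below is one possible layout (offset, hex bytes, then printable characters with dots for everything else), similar to what classic dump tools show. The function name and the sample bytes are mine; the sample imitates the start of a PDF file.

```python
def hex_dump(data, width=16):
    """Return a text dump of data: offset, hex bytes, printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on a short last row.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# Sample bytes shaped like the first lines of a PDF file.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
dump = hex_dump(sample)
print(dump)
```

To try it on a real file, read the first 64 bytes or so with open(path, "rb").read(64) and pass them to hex_dump. Comparing the dumps of a few different file types side by side makes the header patterns easy to spot.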

One could also discuss how this sort of data hiding might be done more efficiently or perhaps more securely. Some compression tools allow password protection. How good is that protection? A lot to think about.

Anyway, it got me thinking and I hope it gets someone else thinking. Now I think I am going to write a new data dump program just for the fun of it. Haven’t done that in years but this has given me some ideas. More on that later.

Late edit: Found a link on Twitter today to an article on doing steganography in Python from Georgia Tech's Media Computation class.