Processing PDFs (or anything else!) in BizTalk Orchestrations

So you're probably aware that you can pass virtually any file (*.exe, *.dll, *.xls) through BizTalk Server's messaging components, but did you also know that you can pass just about anything through an Orchestration process as well?

If you want to pass, say, a Microsoft Word document through the BizTalk Messaging Engine, you simply set up a receive location that grabs the file using the pass-through pipeline, then create a send port that drops the file back out, also using the pass-through pipeline, with a subscription to the receive port. This works because everything passes through BizTalk Server as a byte stream, so any binary object can move through untouched by parsers. This scenario works great if you are moving docs between SharePoint libraries, or pulling off inbound email attachments (BizTalk Server 2006 only) and dropping them on a file share.

The case I'm demonstrating here is one where you don't want to just pass the non-XML file through the Message Bus, but also want to apply some business process via an Orchestration. The first step is to draw out the Orchestration process and create the necessary Messages and Variables. The goal of my orchestration is to take in a PDF file, look at the context information, and, based on the inbound file name, route it to one of three locations. If it's any sort of training manual, I move it to one folder; if it's a company report, I drop it in another location; and if it's anything else, it rests in one final spot.

So my process looks like this:

You'll notice a few things. First of all, the message type of the inbound document is System.Xml.XmlDocument. This message type doesn't actually require XML content. Rather, BizTalk Server treats it as a grab bag for any file format. A message traveling through an orchestration isn't automatically loaded into the DOM; it remains a stream (unless you have Distinguished Fields, which cause selected parts of the data to be loaded into memory), so there's no problem with accepting anything into that Message. Try it out, it's neat stuff. Of course, remember that I haven't shown anything that lets you get at the CONTENT of that message: you only have access to the context properties unless you have helper components that can rip open the message.
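To make the "context yes, content no" idea concrete, here is a minimal sketch in Python rather than in BizTalk itself. The `OpaqueMessage` class, the `received_file_name` helper, and the `ReceivedFileName` key are all hypothetical names chosen for illustration; they only model the idea that the body stays an unread byte stream while routing decisions read the context.

```python
import io
from dataclasses import dataclass, field
from typing import BinaryIO, Dict


@dataclass
class OpaqueMessage:
    """Hypothetical stand-in for a message whose body is never parsed."""
    body: BinaryIO                                          # raw bytes, any format
    context: Dict[str, str] = field(default_factory=dict)  # metadata about the message


def received_file_name(message: OpaqueMessage) -> str:
    """Read a context property without touching the body stream."""
    return message.context.get("ReceivedFileName", "")


# A PDF-like payload that is clearly not XML.
pdf_bytes = b"%PDF-1.4 not xml at all"
msg = OpaqueMessage(
    body=io.BytesIO(pdf_bytes),
    context={"ReceivedFileName": "report.pdf"},
)

name = received_file_name(msg)   # uses context only
position = msg.body.tell()       # still 0: nothing has read the body
```

The point of the sketch is that nothing between receiving the message and deciding where it goes needs to know what format the bytes are in.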

You may assume that I'm using regular expressions to parse the received file name. You are correct. In each decision shape, I use the static IsMatch method of the .NET Regex class to look for a key phrase in the file name. See below:
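For readers without the orchestration in front of them, the three-way routing decision can be sketched as follows. This is Python rather than the orchestration's own .NET expressions, and the key phrases and folder names are assumptions, since the post does not list the actual patterns or destinations. Python's `re.search` is used because, like `Regex.IsMatch`, it succeeds when the pattern matches anywhere in the input.

```python
import re

# Hypothetical destinations; the original post does not name the folders.
TRAINING_DIR = "out/training"
REPORTS_DIR = "out/reports"
OTHER_DIR = "out/other"

# Hypothetical key phrases, checked in order like successive decision branches.
ROUTES = [
    (re.compile(r"training|manual", re.IGNORECASE), TRAINING_DIR),
    (re.compile(r"report", re.IGNORECASE), REPORTS_DIR),
]


def route_by_file_name(received_file_name: str) -> str:
    """Return the destination folder for a file based on its name only."""
    for pattern, destination in ROUTES:
        # search() matches anywhere in the string, not just at the start.
        if pattern.search(received_file_name):
            return destination
    # The "else" branch: anything that matched no key phrase.
    return OTHER_DIR
```

Because the branches are evaluated in order, a name such as `training_report.pdf` lands in the training folder; reorder `ROUTES` if reports should take precedence.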

After building all this out, we deploy it. When creating the ports, remember to keep the pipelines as pass-through (unless you write a custom pipeline component that adds key data to the message context on the way in). You can see my active configuration below:

So there you go. While BizTalk Server is keenly optimized to take advantage of XML formats, we've also enabled you to pass everything but the kitchen sink through the messaging and process engines. All the more reason you can use BizTalk Server as a hub for all the message-based traffic hurtling through your enterprise.