BizTalk Patterns part 1 – Asynchronous Aggregation
Developing a BizTalk Server solution can be challenging, and especially complex for those who are unfamiliar with it. Developing with BizTalk Server, like any software development effort is like playing chess. There are some great opening moves in chess, and there are some great patterns out there to start a solution. Besides being an outstanding communications tool, design patterns help make the design process faster. This allows solution providers to take the time to concentrate on the business implementation. More importantly, patterns help formalize the design to make it reusable. Reusability not only applies to the components themselves, but also to the stages the design must go through to morph itself into the final solution. The ability to apply a patterned repeatable solution is worth the little time spent learning formal patterns, or to even formalize your own. This entry looks at how architectural design considerations associated to BizTalk Server regarding messaging and orchestration can be applied using patterns, across all industries. The aim is to provide a technical deep-dive using BizTalk Server anchored examples to demonstrate best practices and patterns regarding parallel processing, and correlation.
The blog entry http://blogs.msdn.com/b/quocbui/archive/2009/10/16/biztalk-patterns-the-parallel-shape.aspx examines a classic example of parallel execution. It is consists of de-batching a large message file into smaller files, apply child business process executions on each of the smaller files, and then aggregating all the results into a single large result. Each subset (small file) represents an independent unit of work that does not rely or relate to other subsets, except that it belongs to the same batch (large message). Eventually each child process (the independent unit of work) will return a result message that will be collected with other results, into a collated response message to be leveraged for further processing.
This pattern is a relatively common requirement by clients and systems that interact with BizTalk Server, where a batch is sent in and they require a batch response with the success/failure result for each message. With this pattern one can still leverage the multithreaded nature of BizTalk's batch disassembly while still conforming to requirement of the client.
Let’s closely examine how this example works. First, it uses several capabilities of BizTalk that are natively provided – some that are well documented, and some that are not. A relatively under-utilized feature - that BizTalk Server natively provides, is the ability to de-batch a message (file splitting) from its Receive pipeline. BizTalk Server can take a large message with any number of child records (which are not dependent with one another, except for the fact that they all belong to the same batch), and split out the child records to become independent messages for a child business process to act on. This child business process can be anything – such as a BizTalk Orchestration function, or a Windows Workflow.
To learn how to split a large file (also known as as de-batching) into smaller files, please refer to the BizTalk Server 2006 samples – Split File Pipeline example. Even though the sample is based on BizTalk Server 2006 (R1) – it remains relevant for BizTalk Server 2006 R2, BizTalk Server 2009, and BizTalk Server 2010.
Note: There are several examples on how to de-batch messages, such as de-batching from SQL Server, as communicated on Rahul Garg’s blog on de-batching.
After the large file has been split into smaller child record messages, child business processes can simultaneously work on each of these smaller files, which will eventually create child result messages. These child result messages can then be picked up by BizTalk Server and published to the BizTalk MessageBox database.
|Once the messages are placed into the MessageBox database, they can be moved into a temporary repository, such as a SQL Server database table.|
|Messages collected into the database repository, can be collated, and even sorted by a particular identifier, such as the parent (the original large batched message) GUID.|
To aggregate the results back to the same batch, correlation has to be used. BizTalk Orchestration provides the ability to correlate. Aggregation with Orchestration is relatively easy to develop. Using Orchestration, however, may require more hardware resources (memory, CPU) than necessary. One Orchestration instantiation will be required per batch (so if there’s 1000 large batch messages, then there will be 1000 Orchestration instances). Orchestrations linger until the last result message has arrived (or it can time out from a planned exception handling). There is an alternative way to aggregate, without Orchestration, but correlation is still necessary and required to collate all the results into the same batch. The trick is to correlate the result messages without Orchestration.
Let’s examine how correlation without Orchestration can be achieved.
Paolo Salvatori, in his first blog entry http://blogs.msdn.com/paolos/archive/2008/07/21/msdn-blogs.aspx describes how to correlate without Orchestration, by using a two-way receive port. This is important to note, because the two-way receive port provides key information that can be leveraged, by promoting them to the context property bag.
These key properties are
Correlation without Orchestration is as easy as just promoting the right information to the context property bag. Of course, this information needs to be persisted, maintained, and returned with the result messages. However, this method only works with a two-way receive port. What about from a one-way receive port, such as from a file directory or MSMQ?
That is still possible because the de-batching (file splitting) mechanism of BizTalk Server provides a different set of compound key properties that can be used.
|The Sequence number of the child message, and the Boolean isLast can be used to determined the total number of records within the large message. For example, if there are 30 messages in a large message, the 30th message will have a sequence number of 30 (sequencing starts at 1), and isLast Boolean value of True.|
The final step is to aggregate all the result messages and ensure that they’re correctly collated into the same batch that their original source messages from from. The UID that was used for correlation, is the parent identifier that can be used to group these result messages into. An external database table can be used to temporarily store the incoming messages, and a single SQL Stored Procedure can be used to extrapolate these messages into a single output file.
- A result message is received by a FILE Receive Port
- This result message is stored in a SQL Server table. This is achieved by being routed to a SQL Send Port where the original message is wrapped into a sort of envelope by a map which uses a custom XSLT. The outbound message is an Updategram which calls a stored procedure to insert the message into the table.
- A SQL receive port calls a stored procedure to retrieve the records of a certain type from the custom table. This operation generates a single document.
- This document is routed to a File Send Port where the message is eventually transformed into another format.
This method of using a SQL aggregator was communicated in http://geekswithblogs.net/asmith/archive/2005/06/06/42281.aspx
Note that the XSLT can be eventually replaced by a custom pipeline component or directly by an envelope (in this case the send pipeline needs to contain the Xml Assembler component).
|Sample code is provided courtesy of Ji Young Lee, of Microsoft (South) Korea. Uncompress the ZIP file with the directory structure intact on the C:\ root directory (i.e.: c:\QuocProj). : http://code.msdn.microsoft.com/AsynchAggregation|
Acknowledgements to: Thiago Almeida (Datacom), Werner Van Huffel (Microsoft), Ji Young Lee (Microsoft), Paolo Salvatori (Microsoft)