Patterns for Long Running Activities in Windows Workflow Foundation

Long Running Activity

Have you ever noticed that long running operations are difficult to implement in software? Did you ever think about why that is the case?

Some long running operations, such as requesting approval from another person, are obviously difficult to implement in software. You can't implement this scenario in a single method. You have to send a message to the other person, save some state information, and then wait for a response message. Other software scenarios have the same kinds of issues but are not so obviously long running, such as a database operation or a web service call. Database operations typically complete in a few tens of milliseconds, but sometimes they are much slower and can take many seconds. Synchronous web service calls are much less predictable and can take even longer.

Suppose that any software operation taking more than 1/10th of a second crosses the long running boundary and should be treated the same way as the human approval scenario. A better threshold may exist, and it may be quite different for a particular scenario, but let's use 1/10th of a second for this discussion. For such long running software we must send a message, save some state, and then listen for a response message.

I spent some time in the early days of COM+ trying to make long running operations (> 1/10th second) work well within the COM+ host. COM+ was designed for short running components, and in large systems database operations often caused problems. COM+ would allocate a thread from its pool to each component request, and once the pool was exhausted it would queue up additional requests. The pool was sized so that, with every thread executing at 100%, the CPU would be used optimally. With database operations, however, and nowadays with web service calls, most of a thread's elapsed time is spent waiting for an IO response while the CPU sits idle. Once every pooled thread was blocked on IO this way, COM+ became unresponsive to all new requests even though the CPU had capacity to spare. A developer noticing this problem would increase the number of threads, but this merely shifted the bottleneck elsewhere: to thread contention, RAM use, and IO contention. ASP.NET applies the same tuned thread pool sizing to page execution.
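To make the COM+ problem concrete, here is a minimal Python sketch of a fixed-size pool whose threads spend their time blocked on IO. It is an analogy, not COM+ or ASP.NET code: `time.sleep` stands in for waiting on a database or web service response, and `POOL_SIZE`, `IO_WAIT`, `blocking_call`, and `serve` are illustrative names with arbitrary values. The CPU is essentially idle, yet requests beyond the pool size still queue behind the blocked threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2   # illustrative pool size, tuned as if every call kept the CPU busy
IO_WAIT = 0.2   # illustrative time each call spends blocked on IO (over 1/10th second)


def blocking_call(request_id):
    # No CPU work: the thread just waits, standing in for a database or web service response.
    time.sleep(IO_WAIT)
    return request_id


def serve(n_requests):
    """Push n_requests through a fixed-size pool and measure total elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        results = list(pool.map(blocking_call, range(n_requests)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = serve(4)
    # Only POOL_SIZE calls can wait at once, so requests 3 and 4 queue behind 1 and 2.
    print(results, f"{elapsed:.2f}s")
```

Raising `POOL_SIZE` hides the symptom, which is exactly the trade-off described above: more threads cost memory and contention without doing any more useful CPU work.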

Windows Workflow Foundation (WF) provides a solution for long running software by bringing the send / save / listen pattern from human workflow systems down to a lower level, where it can replace COM+ components for any middle tier business logic. With this pattern, any long running operation, including database operations and web service calls, can be implemented as send, save state, and listen for the response. WF provides the state, messaging, and queue infrastructure needed to listen for responses, so long as we follow one of the simple activity patterns. WF is not designed to run arbitrary long running code inside an Activity. Instead, a WF Activity must be short running, just like a COM+ component. There are two reasons for this: 1) the number of available threads is limited, for the same reasons as in COM+, and 2) WF assumes that a workflow instance executes on only one thread at a time, so locking and marshalling are not required. This second assumption avoids the significant performance penalty of marshalling data from one thread to another.
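The send / save / listen cycle can be sketched outside WF in a few lines of Python. This is an illustration under stated assumptions, not WF's API: `InstanceStore`, `ApprovalWorkflow`, `start`, and `deliver` are hypothetical names, and an in-memory dict stands in for a real persistence store. The point is that `start` sends, saves, and returns immediately, so no thread is held while the approver decides.

```python
import json
import uuid


class InstanceStore:
    """Hypothetical persistence service: keeps idle workflow state as JSON."""

    def __init__(self):
        self._rows = {}

    def save(self, instance_id, state):
        self._rows[instance_id] = json.dumps(state)

    def load(self, instance_id):
        # Loading removes the row, as the instance is now back in memory.
        return json.loads(self._rows.pop(instance_id))

    def __contains__(self, instance_id):
        return instance_id in self._rows


class ApprovalWorkflow:
    """Hypothetical workflow whose every step runs briefly and returns."""

    def __init__(self, store, outbox):
        self.store = store
        self.outbox = outbox  # messages sent to the outside world

    def start(self, document):
        instance_id = str(uuid.uuid4())
        reply_queue = f"approval:{instance_id}"
        # 1) Send: ask the approver, naming the queue the reply should go to.
        self.outbox.append({"to": "approver", "reply_to": reply_queue, "document": document})
        # 2) Save: persist everything needed to continue later.
        self.store.save(instance_id, {"document": document, "step": "waiting"})
        # 3) Listen: return at once; nothing blocks while waiting for the reply.
        return instance_id

    def deliver(self, instance_id, response):
        # Runtime side (hypothetical): a reply arrived, so reload the instance and resume it.
        state = self.store.load(instance_id)
        state["step"] = "approved" if response == "yes" else "rejected"
        return state
```

In use, `start("PO-1")` leaves one message in the outbox and one saved row in the store; a later `deliver(instance_id, "yes")` reloads that row and finishes the work.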

So that's the pattern for long running work: send / save / listen. But it applies only to asynchronous messaging. That is easy when the work you are calling lives within your application and is under your control, because you can simply change that code. Synchronous messaging to other systems, however, still requires blocking IO on an open TCP connection to wait for the response, and most basic web services and database operations fall into this category. You have two options here, one easy and one hard. 1) Start a worker thread to make the synchronous call. This is known as thread donation, since the thread sits idle on the CPU but is needed to wait for the IO. The key is that this is a separate thread from the thread pool, not the one executing the short running workflow. 2) Create an asynchronous messaging proxy for your synchronous web services. This would be a single service on your host that sends the requests and holds open the TCP connections for all synchronous messaging. In that case only one thread is required to service many open TCP connections.
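Option 2 can be illustrated with Python's asyncio, where one event-loop thread keeps many TCP connections open at once. This is a hypothetical sketch, not a WF or .NET component: `slow_service` is a local stand-in for a synchronous service that answers after an assumed 0.1 seconds, and `call_over_tcp` and `run_proxy` are illustrative names. Fifty outstanding calls finish in far less than 50 × 0.1 seconds, and every call runs on the same thread.

```python
import asyncio
import threading
import time

SERVICE_DELAY = 0.1  # assumed response latency of the slow "synchronous" service


async def slow_service(reader, writer):
    # Stand-in for a synchronous web service: read one request line,
    # wait, then answer on the same TCP connection.
    request = await reader.readline()
    await asyncio.sleep(SERVICE_DELAY)
    writer.write(b"reply:" + request)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def call_over_tcp(port, payload, threads_seen):
    # One outstanding request = one open TCP connection, but no dedicated thread.
    threads_seen.add(threading.get_ident())
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(payload + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.strip()


async def run_proxy(n_calls):
    server = await asyncio.start_server(slow_service, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    threads_seen = set()
    start = time.perf_counter()
    async with server:
        replies = await asyncio.gather(
            *(call_over_tcp(port, f"req{i}".encode(), threads_seen) for i in range(n_calls))
        )
    return replies, time.perf_counter() - start, threads_seen


if __name__ == "__main__":
    replies, elapsed, threads = asyncio.run(run_proxy(50))
    print(len(replies), "replies in", f"{elapsed:.2f}s", "on", len(threads), "thread(s)")
```

The design choice mirrors the article's point: the waiting is done by the operating system's socket machinery, so the proxy needs one thread no matter how many calls are in flight.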

Summary and Other Activity Patterns

In summary, here are my patterns for building activities that execute short and long running work with WF.

  • Short – Short running work that takes less than 1/10th of a second can simply be coded directly in the Execute method of an activity.
  • Long – For long running work on the local CPU, use a worker thread from the CLR thread pool. The activity's Execute method registers a queue for the response message, starts the worker thread, and returns; the worker delivers its result to that queue when it finishes. I have a simple sample of this that you can use.
  • Send – Sending an asynchronous message can be implemented in the same way as Short.
  • Receive – An asynchronous receive means registering a queue to wait for a message. In WF, a service attached to the WorkflowRuntime acts as the messaging endpoint. When a message arrives, the workflow instance waiting on the queue is loaded into memory and passed the message. The HandleExternalEvent and Receive activities that ship with WF are examples of this.
  • Synchronous Send/Receive – Unless you know that all your database operations or web service calls will be short running, the recommended approach here is thread donation, which makes this the same as Long: the long running work is the database operation or web service call itself.
  • Composite – An activity built primarily for workflow modeling, whose main purpose is to show the relationships between the activities it contains. The Execute code for this is relatively easy. Good samples of this type of activity would also be very valuable, but I don't have one to share right now.
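The Long pattern (and Synchronous Send/Receive, which reduces to it) can be sketched in Python as an analogy. This is not the WF API or the author's sample: `WorkflowQueues`, `LongRunningActivity`, `execute`, and `on_queue_item` are hypothetical names, and a `ThreadPoolExecutor` stands in for the CLR thread pool. `execute` registers a queue, donates a pool thread to the slow work, and returns at once; the result comes back later as a queue message.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the CLR thread pool (assumption: any shared worker pool will do).
WORKER_POOL = ThreadPoolExecutor(max_workers=4)


class WorkflowQueues:
    """Hypothetical stand-in for the runtime's named workflow queues."""

    def __init__(self):
        self._queues = {}

    def create(self, name):
        self._queues[name] = queue.Queue()

    def enqueue(self, name, item):
        # Safe to call from the worker thread: queue.Queue is thread-safe.
        self._queues[name].put(item)

    def dequeue(self, name, timeout=None):
        return self._queues[name].get(timeout=timeout)


class LongRunningActivity:
    """Hypothetical 'Long' pattern activity whose execute() returns quickly."""

    def __init__(self, name, work):
        self.name = name
        self.work = work  # the slow, blocking callable (CPU work, DB call, web call)
        self.status = "Initialized"

    def execute(self, queues):
        # 1) Register a queue for the response message.
        queues.create(self.name)

        # 2) Hand the slow work to a separate pool thread (thread donation).
        def worker():
            try:
                outcome = self.work()
            except Exception as exc:  # failures travel back as messages too
                outcome = exc
            queues.enqueue(self.name, outcome)

        WORKER_POOL.submit(worker)
        # 3) Return at once so the workflow's own thread is released.
        self.status = "Executing"
        return self.status

    def on_queue_item(self, queues, timeout=None):
        # Called by the (hypothetical) runtime when the worker's message is available.
        outcome = queues.dequeue(self.name, timeout=timeout)
        if isinstance(outcome, Exception):
            self.status = "Faulted"
            raise outcome
        self.status = "Closed"
        return outcome


if __name__ == "__main__":
    def slow_sum():
        time.sleep(0.5)  # pretend this is a long computation or a blocking call
        return sum(range(10))

    queues = WorkflowQueues()
    activity = LongRunningActivity("slow-sum", slow_sum)
    print(activity.execute(queues))                   # Executing
    print(activity.on_queue_item(queues, timeout=5))  # 45
    print(activity.status)                            # Closed
```

Returning the exception as a message rather than letting it vanish inside the pool keeps failures visible to the workflow, which would otherwise wait on the queue forever.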

Once you have mastered these six patterns, you can build pretty much any activity.

A Final Comment on Database Access

I'll leave you with one other thought. How long can a database operation take before it affects the scalability, threading, and performance of the middle tier business logic? I've suggested that anything longer than 1/10th of a second should be handled with the long running pattern. But what if you have database operations that take longer than that? What's a general solution for database execution when the synchronous delay and the resulting IO bottleneck become an issue? Should database operations be managed through the long running pattern? If all your database operations are very short, you would not want the additional overhead in code, thread marshalling, and execution time of using a separate thread.
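These questions don't have a single answer, but measuring how often your operations actually cross the boundary is a reasonable first step. The sketch below is an assumption-laden illustration in Python: the built-in `sqlite3` module stands in for your real database, and `timed_query` and `share_over_threshold` are hypothetical helpers that compare each call against the 1/10th-second boundary used in this article.

```python
import sqlite3
import time

LONG_RUNNING_THRESHOLD = 0.1  # the 1/10th-of-a-second boundary used in this article


def timed_query(conn, sql, params=()):
    """Run one query; return its rows, elapsed seconds, and whether it crossed the boundary."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed, elapsed > LONG_RUNNING_THRESHOLD


def share_over_threshold(timings):
    """Fraction of measured operations that would qualify as long running."""
    if not timings:
        return 0.0
    return sum(t > LONG_RUNNING_THRESHOLD for t in timings) / len(timings)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders (total) VALUES (?)", [(i * 1.5,) for i in range(1000)])
    timings = []
    for _ in range(20):
        rows, elapsed, is_long = timed_query(conn, "SELECT COUNT(*), SUM(total) FROM orders")
        timings.append(elapsed)
    print(rows, f"{share_over_threshold(timings):.0%} of calls over the boundary")
```

If only a small share of calls ever crosses the threshold, the inline Short pattern may be the better trade; a large or growing share argues for thread donation.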

Long Running Activity Sample Link: