Multithreaded Parallelism in Windows Workflow Foundation

Kumar Vadaparty

Director, Global Wealth Management Technology

Merrill Lynch

February 2008

Applies to:

  Microsoft Windows Workflow Foundation

  Microsoft .NET Framework 3.0 or later

Summary: This paper shows how Merrill Lynch used Windows Workflow Foundation (WF) for real-time transactions. Merrill Lynch's business requirements call for real-time aggregation and orchestration of data from disparate sources, which requires a multithreaded orchestration pattern that provides truly concurrent execution of parallel activities. Although this pattern can be built from two WF activities (Call External Method Activity and Handle External Event Activity), Merrill Lynch did not want individual developers to build it themselves: the solution is fairly sophisticated, requires high-end skills, and is easy to implement incorrectly. So, Merrill Lynch prepared a set of custom activities that abstract this functionality and transparently provide real-time access to disparate sources, each on a separate thread. In addition, Merrill Lynch wanted the solution to be decoupled from the specific payloads that the activities carry. This paper describes the motivation behind multithreaded orchestrations, the design challenges of the custom activities, and their implementation. It also discusses how this approach affects the compensation aspects of WF. (28 printed pages)



Structure of the Article

Windows Workflow Foundation as a Virtual Machine

Our Motivation for Multithreaded Parallelism

Anatomy of the Interleaved Parallelism of WF

Solution for Multithreaded Parallelism: The View at 30,000 Feet

Multithreaded Parallelism, Step 1: Hook CEMA, HEEA, and Services

Multithreaded Parallelism, Step 2: Hook the Service to the Thread Pool

Abstracting CEMA/HEEA Up to the Service Provider

Problems and a Solution: The Cancelling Event


For More Information




Microsoft introduced Windows Workflow Foundation (WF) as one of the key components in Microsoft .NET Framework 3.0. Unlike workflow application products with which the industry is familiar, WF is not an end-user product, but a programming model, runtime engine, and developer tools that developers and application programmers use in their applications. In this respect, WF behaves like any other Microsoft-supplied software library.

This article does not provide an introduction to how WF works; many articles in the MSDN Library, some recently published books, and articles on the Internet do a very good job of that. This article is for an advanced audience. It describes the e-commerce environment at Merrill Lynch and how the company exploits WF, and then discusses an extension that Merrill Lynch created for WF to better satisfy the company's requirements. The extension introduces multithreaded parallelism in WF, so that WF can better exploit multiple threads on modern server hardware with multiple CPUs and multiple cores per CPU. This, in turn, enables greater efficiency and much lower response times in workflows with parallel branches.

Structure of the Article

The section that follows, titled "Windows Workflow Foundation as a Virtual Machine," describes WF as a virtual machine, which supports "stitching together back-end calls" in a typical e-commerce application.

The section titled "Our Motivation for Multithreaded Parallelism" describes in detail our motivation for enabling multithreaded parallelism.

The section titled "Anatomy of the Interleaved Parallelism of WF" describes the anatomy of WF parallelism, which we call interleaved parallelism (because it doesn’t use multiple threads), and describes the reasons that it is like that.

The section titled "Solution for Multithreaded Parallelism: The View at 30,000 Feet" outlines the important steps that you must implement to accomplish multithreaded parallelism.

The sections titled "Multithreaded Parallelism, Step 1: Hook CEMA, HEEA, and Services" and "Multithreaded Parallelism, Step 2: Hook the Service to the Thread Pool" describe how to extend WF into a custom Merrill Lynch framework that implements multithreaded parallelism.

The section titled "Abstracting CEMA/HEEA Up to the Service Provider" shows how to abstract away all of the details of the previous two sections, so that application developers can use the extended framework simply.

The section titled "Problems and a Solution: The Cancelling Event" discusses the problems that are involved in failed transactions and how we account for them in the extended framework. It also points out that our solution is not only for retrieval, but also for updates: Create, Read, Update, and Delete (CRUD) operations are all supported by this approach.

Finally, the section titled "Conclusion," as its name suggests, concludes the article.

Windows Workflow Foundation as a Virtual Machine

Our motivation for the viewpoint of WF as a virtual machine comes from the first item under References, which defines the very essence of WF: "Fundamentally, a workflow is a tree of domain-specific program statements called activities. The concept of an activity is central to the workflow architecture; you should think of activities as domain-specific opcodes, and workflows as programs that are written in terms of those opcodes."

Fundamentally, WF offers a metaphor for stitching computing activities together. Such activities can encapsulate any function in a large information system—including making calls to existing mainframe services, connecting with third-party service providers such as market-data providers, or retrieving data from local data sources, and so on.

In the language of WF, a workflow is a tree of activities; thus, each activity is a node in the tree and, hence, “node” and “activity” are used interchangeably. For our convenience, we distinguish between two types of activities:

  • Composite nodes/Activities—Activities that participate as “internal” nodes of the workflow tree; for example, composite activities such as sequential and parallel activities, branching composite activity (if-then-else), repetition (while), and so on. The following composite activities ship with .NET Framework 3.0. Those that are marked with an asterisk (*) can participate in operations that require interaction with external services:
    • CompensatableSequenceActivity
    • EventDrivenActivity*
    • EventHandlingScopeActivity*
    • IfElseActivity
    • ListenActivity*
    • ParallelActivity
    • ReplicatorActivity
    • SequenceActivity
    • StateActivity
    • StateFinalizationActivity
    • StateInitializationActivity
    • WhileActivity
  • Native WF-Atomic activities—The following atomic activities ship with .NET Framework 3.0. Those that are marked with an asterisk (*) can participate in operations that require interaction with external services:
    • CallExternalMethodActivity*
    • CodeActivity
    • DelayActivity*
    • HandleExternalEventActivity*
    • InvokeWebServiceActivity*
    • InvokeWorkflowActivity*
    • PolicyActivity
    • SetStateActivity
    • WebServiceInputActivity*
    • WebServiceOutputActivity*
  • Domain-specific opcodes—Custom activities that enable accessing back-end systems, market-data providers, credit-card validating programs, weather forecasts, and so on.

The concept of an activity is crucial to workflow. As stated in Box and Shukla, an activity (especially, a custom activity) can be viewed as a domain-specific opcode, and workflows as programs that are written to “glue” those opcodes. The “gluing” comes from the “inner” nodes of a workflow whose leaves are the domain-specific opcodes. These inner nodes are the control nodes (“sequential,” “parallel,” “branching,” “while,” and so on).

The two fundamental control nodes are “sequential” and “parallel” activities. Of interest for this article is the parallel activity. We illustrate first the business motivation of these two, in the context of e-commerce.

Sequential Composite Activities

Let’s consider a simple example of providing a service that involves receiving CreditCardInfo and ProductInfo, calling the credit-card company to validate the card, and placing an order for the book with the back-end system that maintains product inventories.

The following simple sequential (pseudocode) block shows the required logic:

OrderBookService (In: CreditCardInfo, In: ProductInfo, Out: OrderConfirmation)
      ValidateCreditCard(In: CreditCardInfo, Out: CreditCardConfirmation);
      PlaceOrder(In: CreditCardConfirmation, In: ProductInfo, Out: OrderConfirmation)

Figure 1 shows the WF workflow for the same program:

Figure 1. Sequential activity

Parallel Composite Activities

Now, let’s consider the prequel service to the preceding: getting the desired product list. It takes product info (that is, title, ISBN, product code, or whatever is appropriate) and returns a list of products that match (defined appropriately). Furthermore, our vendor is both an owner (of inventory) and a broker who is not very different from

So, our vendor would like to construct the desired product list from three different nationally renowned sources, which we will call Source1, Source2, and Source3, respectively. Because we are obtaining information from distinct sources, we’d like to do this work in parallel, for efficiency. The following pseudocode would capture that logic:

GetProductList (In: desiredProductCriteria, Out: ProductList)
      parbegin
            getListFromSource1 (In: desiredProductCriteria, Out: ProductList1)
            getListFromSource2 (In: desiredProductCriteria, Out: ProductList2)
            getListFromSource3 (In: desiredProductCriteria, Out: ProductList3)
      parend
      MakeUnion(In: ProductList1, In: ProductList2, In: ProductList3, Out: ProductList)

A couple of notes are in order:

  • The reason for the separate call to each source is that, although shown in somewhat similar fashion, the true nature of each call to the respective data source might be very different: One might use a Web method, one might use a local SQL call, one might use a proprietary protocol between distributed and host systems, and so on. Also, conversely, when the result is obtained, each call might require a different type of “massaging” to convert to a normalized representation.
  • The parbegin-parend construct is from Dijkstra and has provided inspiration to many concepts in concurrent programming, a topic that has been well studied by a number of computer scientists. A good survey of it appears in Per Brinch Hansen’s classic. The idea behind this construct is that all of the direct children of this node will be assigned separate computing resources and, therefore, will be executed concurrently (that is, in parallel).
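As a rough illustration of the parbegin-parend idea outside of WF, the following C# sketch hands each source call to a thread-pool thread and then waits for all of them before building the union. The class name, the three stub methods, and the string-based product lists are assumptions made for this example only; real source calls would block on a Web method, a SQL call, or a proprietary protocol.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ParBeginSketch
{
    // Hypothetical stand-ins for the three source calls; each simulates a blocking call.
    static List<string> GetListFromSource1(string criteria) { Thread.Sleep(50); return new List<string> { criteria + " from Source1" }; }
    static List<string> GetListFromSource2(string criteria) { Thread.Sleep(50); return new List<string> { criteria + " from Source2" }; }
    static List<string> GetListFromSource3(string criteria) { Thread.Sleep(50); return new List<string> { criteria + " from Source3" }; }

    public static List<string> GetProductList(string criteria)
    {
        Func<string, List<string>>[] sources = { GetListFromSource1, GetListFromSource2, GetListFromSource3 };
        List<string>[] results = new List<string>[sources.Length];
        ManualResetEvent[] done = new ManualResetEvent[sources.Length];

        // parbegin: give each child its own thread-pool thread.
        for (int i = 0; i < sources.Length; i++)
        {
            int slot = i; // per-iteration copy for the closure
            done[slot] = new ManualResetEvent(false);
            ThreadPool.QueueUserWorkItem(delegate
            {
                try { results[slot] = sources[slot](criteria); }
                finally { done[slot].Set(); }
            });
        }

        // parend: continue only after every child has finished.
        foreach (ManualResetEvent handle in done) handle.WaitOne();

        // MakeUnion: merge the lists in source order, dropping duplicates.
        List<string> union = new List<string>();
        foreach (List<string> list in results)
            foreach (string item in list)
                if (!union.Contains(item)) union.Add(item);
        return union;
    }

    public static void Main()
    {
        foreach (string product in GetProductList("ISBN-123"))
            Console.WriteLine(product);
    }
}
```

Because the three simulated calls overlap, the total wait is close to the longest single call rather than the sum of all three, which is the efficiency argument made above.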

Our Motivation for Multithreaded Parallelism

WF is a part of .NET Framework 3.0. At the heart of the framework is the WorkflowRuntime, a class that represents the workflow engine and exposes services, control, and notifications to the host. Fundamentally, any given episode of a WF instance gets only one thread from the runtime. (See Shukla and Schmidt.) On pages 102–103, Shukla and Schmidt describe in clear terms the advantages of assigning exactly one Common Language Runtime (CLR) thread at a time to a given episode of a WF program instance. One example is that a single-threaded execution model is much easier to code, debug, and test against than a multithreaded model. “In other words, scheduler never performs concurrent dispatch of work items in its work queue.” (See p. 102 of Shukla and Schmidt.) They go on to explain that this is a pragmatic decision.

WF is designed to work with a variety of patterns of work units. Unlike its predecessor, the COM+ runtime, WF is not limited to short-running work. Through simple patterns, it can also handle long-running work items; and with more complex patterns, it can handle multithreaded work items. All of these patterns rest on one design requirement: the thread that the workflow instance is using must not be held up by the work. Using a single thread provides a number of performance and programmability advantages over spawning multiple threads, but only if the design patterns for avoiding holding up that thread are followed. The patterns for short- and long-running work are as follows:

  1. Short—Short-running work can simply be coded in an activity and executed.
  2. Send—Work that involves messaging to other systems can send a message and return immediately.
  3. Wait—Work that involves receiving a message from another system can register that incoming message with the workflow runtime and return immediately. This activity will be awoken when the message arrives for processing.
  4. Send/Receive—Work that sends and receives a message can be a combination of the previous two.
  5. Long CPU Work—Work that requires significant CPU time can use a new thread from the CLR thread pool and use the preceding message-passing pattern to send the work item to that thread.
  6. Thread Donation—Work that requires a thread to be donated for a blocking API call should use a thread from the thread pool in the same way.

Parallel/concurrent execution of activities that involve external synchronous blocking calls, as described in the aforementioned pattern 6, is a requirement for the middle-tier code in our e-commerce application. This section describes the need and the principles that govern such a need, which motivates the crucial requirement of parallel execution in WF when leveraged for real-time workflows in e-commerce.

As discussed earlier, WF includes a composite (control-node) activity called a parallel activity, as shown in Figure 2. This shape contains any number of child activities, each on its own sequence, with the sequences arranged visually on the same horizontal plane. One might assume that all activities will be started in parallel, and that each will receive equal and immediate attention from the engine. As we shall see later, this assumption is not quite accurate. In our scenario, thread donation for long-running calls is required, because the individual tasks are not independently asynchronous.

Figure 2. A WF workflow instance with parallel activity

We agree that, for the other patterns, it is easier to manage one thread per workflow instance. Also, from a developer perspective, the ability to launch multithreaded activity execution might create more problems than its benefits warrant; the authors themselves warn of “hard-to-find bugs that make programs defective.”

However, in the context of our existing e-commerce infrastructure, pattern 6 is required. Our typical synchronous back-end Web service calls take tens of milliseconds to hundreds of milliseconds, with the total user response expected to be within one second. The best way to achieve this is for these back-end calls, where appropriate, to be executed by using multithreaded parallelism. Examples of such back-end calls include checking a credit card with an external agency while determining whether the customer is a premier customer, or, when asked to show inventory, getting information concurrently from third-party book-sellers and its own inventory. In fact, many of our existing middle-tier implementations (written in C#) have hand-coded multithreaded calls to the back end; this required substantial programmer experience, repeated performance and stability testing, and high-maintenance expenses.

Our goal is to have a framework that would reduce this expertise and improve time-to-market; our intent is to leverage WF to accomplish that goal. We strongly believe that WF, with its services and its visual programming model, is an excellent choice of platform for us to use as a middle-tier programming model—eliminating the required hand-coding, and enabling its declarative workflow capabilities. At the same time, however, we feel that we could not sacrifice the multithreaded execution requirement for our solution. Our goal, therefore, is to develop a framework (a library of custom activities) that allows developers to do the following:

  • Leverage WF—Take full advantage of WF’s workflow capabilities, drag-drop functionality, monitoring abilities, compensation facilities, and so on.
  • Attain multithreaded parallelism with custom activities—Achieve multithreaded parallelism by implementing pattern 6, which is currently not available out of the box with WF. However, any known implementation of this pattern requires sophisticated use of CEMA (Call External Method Activity) and HEEA (Handle External Event Activity), and therefore advanced knowledge of WF.
  • Abstract away intricacies involved in obtaining multithreaded parallelism—We want developers to need only simple (introductory to intermediate) knowledge of WF, rather than the advanced knowledge that implementing multithreaded parallelism requires. That is, we do not want to require developers to combine call-external-method and handle-external-event activities with the CLR thread pool themselves, as will be described later. This means that we need to abstract away the implementation intricacies.
  • Require no change in threading model of existing back-end calls—We wanted to ensure that none of the existing synchronous calls to the back end has to change its threading model to take advantage of WF’s parallelism. Not only would such a change be difficult to implement, given the disparate ways in which we communicate with back ends, but testing so critical a change in programs that have been running for dozens of years would be extremely expensive.

In other words, we tread a path that is right in the middle: Expand WF with multithreaded execution, but abstract the complexities of multithreading away from the application developer. That is, we want to have the cake and eat it too!

Anatomy of the Interleaved Parallelism of WF

Before looking for multithreaded parallelism, we should first understand the out-of-the-box parallel-activity behavior in WF and the reasons for using a single thread.

As stated earlier, an episode of a workflow instance gets exactly one thread (see pp. 102–103 of Shukla and Schmidt). The impact of this decision is visible only when using activities that execute in parallel. The reality is that a single thread is allocated to the instance and will simply start the activities in sequence. It is up to each activity to work with the single thread efficiently, following the patterns that were described earlier in the section titled "Our Motivation for Multithreaded Parallelism." Where activities are written specifically for asynchronous operations, the behavior approximates parallel activities on individual threads, given WF’s unique state model. Where activities are not written specifically for asynchronous operations and keep the CPU thread busy, the behavior is no different from the sequence activity, which executes its children in sequence. This is because there is only one thread available to use.

There are several reasons for this behavior. One reason (as stated in a previous section) is to simplify workflow development. Deadlocks, thread context switching, and data marshalling must be dealt with whenever doing multithreaded development work. In addition, if the host application spools up thousands of workflow instances, each with its own undetermined number of worker threads, you could reach thread saturation on the machine. Another reason is that long-running workflows do not have any knowledge of the host environment into which they are plugged, and persisting long-running multithreaded workflows to a persistence store is much more complex than using the single-threaded model.
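The interleaving described in this section can be pictured with a toy model. The following C# sketch is not the WF scheduler; it is a hypothetical single-threaded work queue (the names InterleavingSketch, workQueue, and schedule are invented for this example) in which two branches, A and B, each consist of two steps. Each step runs to completion on the one available thread and then enqueues the next step of its own branch.

```csharp
using System;
using System.Collections.Generic;

public static class InterleavingSketch
{
    // Runs two "branches" on the calling thread only, one work item at a time,
    // and returns the order in which the steps executed.
    public static List<string> Run()
    {
        List<string> trace = new List<string>();
        Queue<Action> workQueue = new Queue<Action>();

        // Each step records itself, then schedules the next step of the same branch.
        Action<string, int> schedule = null;
        schedule = delegate(string branch, int step)
        {
            workQueue.Enqueue(delegate
            {
                trace.Add(branch + step);
                if (step < 2) schedule(branch, step + 1);
            });
        };

        // The "parallel activity" schedules the first step of each branch in order.
        schedule("A", 1);
        schedule("B", 1);

        // The single thread drains the queue; no two items ever run concurrently.
        while (workQueue.Count > 0)
            workQueue.Dequeue()();

        return trace;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Run().ToArray())); // prints "A1 B1 A2 B2"
    }
}
```

The branches take turns, but if any single step were to block (for example, on a synchronous back-end call), every other branch would wait behind it. That is the behavior the rest of this article sets out to change.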

Solution for Multithreaded Parallelism: The View at 30,000 Feet

This section describes how we can leverage two atomic activities, CallExternalMethodActivity (CEMA) and HandleExternalEventActivity (HEEA), and the concept of external services to accomplish multithreaded parallelism in WF. Our goal in this section is to keep the explanation at a fairly high level (30,000 feet). Later sections go to successively lower levels, until we land on the terra firma (the code itself).

Note  “External services,” “local services,” and “ExternalDataExchangeServices” are used interchangeably in this text. They are also used interchangeably in the Microsoft SDK WF documentation.

The idea behind using CEMA and HEEA for managing parallel execution is simple. CEMA is intended for calling external methods (as the name implies), and HEEA is used for performing activities when an external event results (as the name implies). In our implementation of multithreaded parallel activities, we use CEMA and HEEA as a complementary pair around the work that needs to be done in parallel, and get that work shifted to a separate thread using an external service as intermediary; this thread is separate from the workflow thread that is allocated to the instance.

Some technical challenges arise. Note that CEMA and HEEA are completely independent activities. A workflow can have many CEMAs but no HEEAs, or, conversely, no CEMAs but many HEEAs. Also, a CEMA knows nothing about any particular HEEA. Pairing them therefore requires some sophisticated knowledge of CEMA/HEEA and of how to hook them to external services. Another challenge is ensuring that the completed work is routed back to the right HEEA activity. All of this requires advanced knowledge of WF, and our goal is to abstract it away, so that a beginner can use the custom-developed parallelism. The solution has two steps:

  1. Hooking CEMA and HEEA to an external service:
    1. CEMA calls the external service—CEMA helps us in invoking a call to the external service that does the work (that is supposed to be executed in parallel).
    2. HEEA handles the completion event—HEEA receives the event indicating the completion of the work, and goes on to perform the next step in the original workflow.
  2. Hooking the external service to a thread pool—The external service allocates the work to a separate thread and, when completed, uses the WF runtime to notify HEEA that the work is completed.

These steps require the following key operations:

  1. Binding—This involves design-time (code-time) binding between CEMA and a “pointer” to the external service. Similarly, we must put in a design-time binding between HEEA and the same pointer to the external service. Then, we need run-time bindings to be established, which involves workflow instance, CEMA instance, and HEEA instance. This topic is discussed in depth in the section titled "Multithreaded Parallelism, Step 1: Hook CEMA, HEEA, and Services."
  2. External-service registration—We must let the WF runtime know about our specific external service. This requires some type of registry at run time. This topic is discussed in depth in the section titled "Multithreaded Parallelism, Step 1: Hook CEMA, HEEA, and Services."
  3. Multithreaded code—Next, someone has to write the multithreaded code that would accept different pieces of work and assign them to different threads. This is done in the external service. This topic is discussed in depth in the section titled "Multithreaded Parallelism, Step 2: Hook the Service to the Thread Pool."
  4. Least impact to developer—Finally, all of the preceding steps must be “wrapped up” in such a way that a casual developer can use this pattern without having to understand the depth of the sections titled "Multithreaded Parallelism, Step 1" and "Multithreaded Parallelism, Step 2." This topic is discussed in depth in the section titled "Abstracting CEMA/HEEA Up to the Service Provider."
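The two steps and their key operations can be sketched without any WF types. In the following C# toy model, ToyWorkflow stands in for a single-threaded workflow instance, ToyAdderService stands in for the external service, and the callback passed to Add plays the HEEA role. All of these names are assumptions made for illustration; the real mechanism uses CEMA, HEEA, and a registered local service, as later sections describe.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Toy stand-in for a workflow instance: one thread drains a queue of work items.
public class ToyWorkflow
{
    private readonly Queue<Action> queue = new Queue<Action>();
    private readonly object gate = new object();

    // Called from any thread to hand a work item to the workflow thread.
    public void Post(Action item)
    {
        lock (gate) { queue.Enqueue(item); Monitor.Pulse(gate); }
    }

    // Runs on the workflow thread until 'expected' items have been processed.
    public void Run(int expected)
    {
        for (int i = 0; i < expected; i++)
        {
            Action item;
            lock (gate)
            {
                while (queue.Count == 0) Monitor.Wait(gate);
                item = queue.Dequeue();
            }
            item();
        }
    }
}

// Toy stand-in for the external service: does the work on a thread-pool thread.
public class ToyAdderService
{
    public void Add(int a, int b, ToyWorkflow caller, Action<int> onCompleted)
    {
        ThreadPool.QueueUserWorkItem(delegate
        {
            Thread.Sleep(100); // simulates a blocking back-end call
            int sum = a + b;
            // Completion is posted back, so the handler runs on the workflow thread.
            caller.Post(delegate { onCompleted(sum); });
        });
    }
}

public static class HookSketch
{
    public static List<int> Demo()
    {
        ToyWorkflow workflow = new ToyWorkflow();
        ToyAdderService service = new ToyAdderService();
        List<int> results = new List<int>();

        // "CEMA" role: each branch calls the service and returns immediately.
        workflow.Post(delegate { service.Add(1, 2, workflow, results.Add); });
        workflow.Post(delegate { service.Add(3, 4, workflow, results.Add); });

        // Two call items plus two completion items ("HEEA" role) run on this thread.
        workflow.Run(4);
        results.Sort();
        return results;
    }

    public static void Main()
    {
        foreach (int r in Demo()) Console.WriteLine(r);
    }
}
```

The design point to notice is that the blocking work overlaps on pool threads, while the results list is touched only by the workflow thread, so the workflow side stays single-threaded.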

Multithreaded Parallelism, Step 1: Hook CEMA, HEEA, and Services

This section describes how CEMA and HEEA are used, and their internal workings. It describes, in detail, how CEMA and HEEA activities are configured in the WF designer, and how they are bound to custom local services. It also covers the details of CEMA/HEEA correlation and the runtime behavior of the CEMA, HEEA, and local-service interaction.

CEMA is used to invoke services that are not appropriate for use within the context of a running workflow instance. HEEA is used to receive events from an externally completed task. WF does not require these to be used in pairs; but, when they are used in pairs, they provide a powerful “hook” to call external programs. Specifically, they are allowed to call external services with well-defined interfaces. Any services that would block the calling thread for a significant period of time or that would spawn an operation on a separate thread are good candidates for external services.

External services are .NET Framework classes that implement a well-defined, published interface. We will illustrate with an integer-adding service, which implements the IAddIntegerService interface. At design time, a programmer binds external services to CEMA/HEEA activities by specifying their interfaces in the CEMA/HEEA activity properties. At run time, a service is registered with the WorkflowRuntime during initialization, so that the previously bound WF activities (in our example, CEMA/HEEA) have a service instance from which to consume the services. Thus, the service works like a singleton class with respect to all workflow instances that run within a particular WorkflowRuntime: it services the needs of every workflow instance in that runtime that contains activities bound to the service interface. This is depicted in Figure 3:

Figure 3. Singleton service servicing multiple activities concurrently

Design-Time Binding Between CEMA, HEEA, and the External Service

The CEMA activity is used to invoke a method on a particular service interface. External services advertise themselves by decorating their interfaces with the [ExternalDataExchange] attribute (described in detail later). This allows a method of the advertised interface and its associated parameters to be bound to a CEMA instance using the WF designer. Additional attributes are also provided for correlating CEMA/HEEA instances, so that parameters are correctly passed and returned from the correct CEMA instance to the correct HEEA instance (see the following subsection, "Run-Time Binding Between CEMA, HEEA, and External Service Through Correlation").

The HEEA activity is used to receive events from a particular service. Those events could result from the invocation of the service by a particular CEMA instance, or they could be invoked by external applications, like responding to an e-mail or something else. This article focuses on events that result from a CEMA invocation only because we are using CEMA and HEEA as complementary pairs to accomplish multithreaded parallelism.

A concrete implementation of the service interface must be registered with the WorkflowRuntime prior to workflow execution, in order for the bound CEMA and HEEA instances to actually invoke the service at run time. This is depicted at a high level (see Figure 4). The following sections will explore those steps in greater detail. As soon as the CEMA and HEEA are bound to a local-service interface and the service is registered, the workflow can be executed, allowing the CEMA and HEEA to consume the service at run time.

Figure 4. CEMA and HEEA interaction with local service
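As a minimal sketch of the registration step, the following C# host code adds an ExternalDataExchangeService to the WorkflowRuntime and then registers a local-service instance with it. The HostSketch class and the localService parameter are names invented for this example; the parameter is assumed to be an object that implements an interface decorated with [ExternalDataExchange], such as an implementation of IAddIntegerService. The code requires references to the .NET Framework 3.0 workflow assemblies.

```csharp
using System;
using System.Workflow.Activities;
using System.Workflow.Runtime;

public static class HostSketch
{
    // 'localService' is assumed to implement an [ExternalDataExchange] interface.
    public static WorkflowRuntime CreateRuntime(object localService)
    {
        WorkflowRuntime runtime = new WorkflowRuntime();

        // The data-exchange service is the container for local services;
        // it is added to the runtime first.
        ExternalDataExchangeService dataExchange = new ExternalDataExchangeService();
        runtime.AddService(dataExchange);

        // Register the concrete implementation before any workflow is started.
        dataExchange.AddService(localService);

        runtime.StartRuntime();
        return runtime;
    }
}
```

A host would typically call this once at startup and create every workflow instance from the returned runtime, which is what gives the service its singleton-like role.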

Run-Time Binding Between CEMA, HEEA, and External Service Through Correlation

Recall that we use CEMA and HEEA as complementary pairs to accomplish the process of “shifting work” to an external thread. To accomplish this, we need a way to tell which CEMA event’s work is correlated to which HEEA event, so that the routing of the completion event to the right HEEA is accomplished correctly. This section describes how this is accomplished.

At any given moment, there could be hundreds of workflow instances being executed by a particular workflow runtime. Some or all of those instances could contain activities that consume the services of a local service through CEMA/HEEA calls. Correlation is the facility that allows a CEMA service call to associate itself with a unique identifier that differentiates it from the hundreds of other running instances. That same unique identifier is, in turn, used to locate the HEEA activity to which the event is subsequently returned. Local services are managed by the WorkflowRuntime, independent of any specific workflow instances. Without correlation, there would be no way for the local service to route calls correctly from a particular CEMA back to events for the corresponding HEEA. The WF correlation facility, described later, provides support for mapping CEMA calls to a local service back to the exact HEEA instance that should receive the resulting event invocation.

CEMA instances are correlated to specific HEEA instances by setting their CorrelationToken property to the same value. That token value is scoped to a specific parent activity and, therefore, must be unique for all activities within that parent. Furthermore, at run time, a correlation identifier must be passed to the external service call by the CEMA and returned by the external event arguments from the ExternalDataExchangeService to HEEA.

As noted earlier, the local service exists outside of the workflow instance. This dictates that workflow-instance correlation is required, because there could be multiple instances of a workflow running simultaneously, and we need to identify the specific instance. Therefore, the workflow-instance ID must reach the local service on the outgoing CEMA call, and it must be supplied as part of the ExternalDataEventArgs that are passed with the incoming event to the HEEA. The workflow runtime provides support for obtaining the workflow-instance ID from the WorkflowRuntime environment, so explicitly passing the workflow-instance ID in the CEMA call is not required; however, it is required as a property value in the ExternalDataEventArgs that are associated with the returned HEEA event.
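To see why both the workflow-instance ID and a correlation value are needed, consider the following toy routing table in C#. It is only a model of the idea, not the data structure that the WF runtime uses, and every name in it (ToyCorrelationRouter, Register, Deliver) is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy model of correlation routing; not the WF runtime's actual implementation.
public class ToyCorrelationRouter
{
    // Key: workflow-instance ID plus correlation value; value: waiting handler ("HEEA" role).
    private readonly Dictionary<string, Action<int>> waiting = new Dictionary<string, Action<int>>();

    private static string Key(Guid instanceId, string correlationId)
    {
        return instanceId.ToString("N") + "/" + correlationId;
    }

    // Called when the outgoing call is made ("CEMA" role initializes the correlation).
    public void Register(Guid instanceId, string correlationId, Action<int> handler)
    {
        waiting.Add(Key(instanceId, correlationId), handler);
    }

    // Called when the service raises its completion event; returns false if nobody is waiting.
    public bool Deliver(Guid instanceId, string correlationId, int result)
    {
        string key = Key(instanceId, correlationId);
        Action<int> handler;
        if (!waiting.TryGetValue(key, out handler)) return false;
        waiting.Remove(key);
        handler(result);
        return true;
    }
}

public static class RouterDemo
{
    public static void Main()
    {
        ToyCorrelationRouter router = new ToyCorrelationRouter();
        Guid first = Guid.NewGuid();
        Guid second = Guid.NewGuid();

        // Two workflow instances use the same correlation value "123".
        router.Register(first, "123", delegate(int r) { Console.WriteLine("first instance got " + r); });
        router.Register(second, "123", delegate(int r) { Console.WriteLine("second instance got " + r); });

        // The instance ID disambiguates the two waiting handlers.
        router.Deliver(second, "123", 7);
        router.Deliver(first, "123", 3);
    }
}
```

In this model, the correlation value alone would be ambiguous across instances, and the instance ID alone would be ambiguous across parallel branches within one instance.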

The specific external data-service parameters that are used for correlation are designated by the CorrelationInitializer, CorrelationParameter, and CorrelationAlias attributes, which decorate the custom external data-exchange service interface that is described in the following subsection, "Service Configuration and Binding." A sample decorated interface is shown in Listing 1. The attributes in the example function as follows:

  • ExternalDataExchange—Tells the workflow runtime that this is an interface that can be used as an ExternalDataExchangeService. It will be described, in detail, in the sections that follow.
  • CorrelationParameter—Tells the workflow runtime that the id parameter is the outgoing correlation ID.
  • CorrelationAlias—Tells the workflow runtime that the e.Id property of the AddIntegerCompleteEventArgs is an alias for the id correlation ID. This allows the workflow runtime to locate the incoming correlation-ID value on its way back from the ExternalDataExchangeService.
  • CorrelationInitializer—Tells the workflow runtime that the Add() method initializes the correlation parameter id that is called out by the CorrelationParameter attribute.
    [ExternalDataExchange]
    [CorrelationParameter("id")]
    public interface IAddIntegerService
    {
        [CorrelationAlias("id", "e.Id")]
        event EventHandler<AddIntegerCompleteEventArgs> AddIntegerComplete;

        [CorrelationInitializer]
        void Add(string id, int a, int b);
    }

Listing 1. Sample ExternalDataExchangeService interface

Therefore, the CorrelationInitializer and CorrelationParameter attributes instruct the workflow runtime where to obtain the outgoing correlation-ID value from the CEMA. In this case, it uses the id parameter of the IAddIntegerService.Add() method. The workflow runtime still needs to know where it will get the correlation-ID value to correlate to the returning HEEA. That information is provided by the CorrelationAlias attribute, which instructs the workflow runtime that the AddIntegerCompleteEventArgs.Id property is an alias for the correlation-ID value.

The workflow runtime will now associate the CorrelationToken of the CEMA with the id parameter to the IAddIntegerService.Add() method call. The implementation of the adder service must then make sure to return that same correlation-ID value in the AddIntegerCompleteEventArgs.Id property when it invokes the event. This will allow the workflow runtime to associate it with the same CorrelationToken value on the HEEA, so that the workflow runtime knows which HEEA should receive the incoming event notification.

Note  Under the hood, the correlation allows the WorkflowRuntime to queue the serialized event to the correct WF HEEA Activity queue. Queuing the event allows the scheduler to schedule the HEEA for execution within the single-threaded workflow instance.

This correlation configuration is shown in Figure 5:

Figure 5. Correlation in external data exchange

The CallExternalMethodActivity (CEMA) and HandleExternalEventActivity (HEEA) have the same unique CorrelationToken, which is scoped to a specific activity—usually, the encompassing parent sequence activity. The CEMA invokes the Add() method with the id parameter set to 123. The workflow runtime records that the CorrelationToken “ABC” is associated with the correlation parameter "123". The ExternalDataExchangeService is then free to process the Add() method call as it sees fit, including delegating the work to a thread-pool thread.

As soon as the work is complete, the thread must invoke the AddIntegerComplete event of the IAddIntegerService ExternalDataExchangeService implementation. Furthermore, the AddIntegerCompleteEventArgs must contain the original workflow-instance ID of the workflow instance that invoked the Add() method, and the AddIntegerCompleteEventArgs.Id value must be set to 123. Those values allow the workflow runtime to look up the CorrelationToken value for the correct HEEA, so that it can pass the event back to that specific HEEA activity instance.

Note  Once again, the underlying implementation actually queues the event to the unique queue that is associated with the unique HEEA activity instance.
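To make the lookup concrete, the following is a minimal sketch of correlation-based routing in plain C#. It is a simplified stand-in and not the WF implementation: the class name CorrelationRouter and its Initialize and Deliver methods are hypothetical, and a composite string key stands in for the runtime's CorrelationToken bookkeeping.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the runtime's correlation bookkeeping (an assumption, not WF code).
public sealed class CorrelationRouter
{
    // (workflow instance ID + correlation value) -> queue of the waiting HEEA
    private readonly Dictionary<string, Queue<object>> queues =
        new Dictionary<string, Queue<object>>();
    private readonly object sync = new object();

    private static string Key(Guid instanceId, string correlationId)
    {
        return instanceId.ToString("N") + "/" + correlationId;
    }

    // Corresponds to the CEMA calling the initializer method: remember the outgoing value
    // and create the queue on which the matching HEEA will wait.
    public Queue<object> Initialize(Guid instanceId, string correlationId)
    {
        lock (sync)
        {
            Queue<object> queue = new Queue<object>();
            queues[Key(instanceId, correlationId)] = queue;
            return queue;
        }
    }

    // Corresponds to the service raising its event: locate the queue by the aliased value.
    public bool Deliver(Guid instanceId, string correlationId, object payload)
    {
        lock (sync)
        {
            Queue<object> queue;
            if (!queues.TryGetValue(Key(instanceId, correlationId), out queue))
            {
                return false; // no listener for this instance/correlation pair
            }
            queue.Enqueue(payload);
            return true;
        }
    }
}
```

With two listeners registered for the same instance under "123" and "456", a call to Deliver(instanceId, "123", result) reaches only the first queue, which is the behavior that the correlation attributes are meant to guarantee.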

Service Configuration and Binding

The preceding sections described how CEMA and HEEA activities interact with custom local-service implementations. This section describes the steps that are required to implement and initialize a custom local service. There are four steps to using a custom local service:

  1. Defining the interface
  2. Associating the interface with the CallExternalMethod and HandleExternalEvent activities (CEMA and HEEA)
  3. Creating a concrete implementation of the interface
  4. Registering an instance of the concrete implementation with the WF WorkflowRuntime
    [ExternalDataExchange]
    [CorrelationParameter("id")]
    public interface IAddIntegerService
    {
        [CorrelationAlias("id", "e.Id")]
        event EventHandler<AddIntegerCompleteEventArgs> AddIntegerComplete;

        [CorrelationInitializer]
        void Add(string id, int a, int b);
    }

Listing 2. Local-service interface

The first step is to define the service interface. The service interface must define a method to be called by the CEMA activity and an event to be handled by the HEEA activity. The interface must then be decorated with the [ExternalDataExchange] attribute for the WF designer to identify it as a service. This is shown in Listing 2.

The code listing also shows the use of the optional correlation attributes. Correlation assures that a CEMA call to an external service will be returned to the correct HEEA activity, in the case where more than one HEEA activity is listening to the same service event in the same workflow. The correlation attributes are used to instruct the WF event listener activities about which fields can be used as lookup keys for unique correlation tokens, which are used to route events back to the unique queue that is associated with a specific HEEA activity. In the preceding example, the [CorrelationParameter] and [CorrelationInitializer] attributes tell WF that the id parameter of the Add() method will be used as the correlation key. The [CorrelationAlias] attribute then tells WF how the id value is mapped into the event arguments that are associated with the returning AddIntegerComplete event. In this case, it says that id is mapped to the AddIntegerCompleteEventArgs.Id property.

The attributed interface can be associated with a workflow, as long as it is part of the assembly or referenced assemblies of the workflow project. The interface is associated with the CEMA activity in Microsoft Visual Studio 2005 by using the CEMA activity’s property settings, as shown in Figure 6:

Figure 6. Association of CEMA with service interface

The InterfaceType property allows the user to browse all service interfaces that are attributed with the [ExternalDataExchange] attribute. As soon as the interface is selected, the CEMA property pane also provides support for selecting the service method to call and then binding parameters to the method call, as shown in Figure 6.

The interface is associated with a HEEA activity by using the HEEA activity’s property settings, as shown in Figure 7:

Figure 7. Association of HEEA with service interface

The InterfaceType property allows the user to browse all service interfaces that are attributed with the [ExternalDataExchange] attribute. As soon as the interface is selected, the HEEA property pane also provides support for selecting the service event on which to wait and then binding event parameters to local properties.

A concrete implementation of the IAddIntegerService service interface is required in order to perform the actual service operation (in this case, Add) and to return the result to the caller. Listing 3 shows a trivial concrete implementation that performs the operation on the calling thread and then raises the completion event:

    public class AddIntegerService : WorkflowRuntimeService, IAddIntegerService
    {
        public event EventHandler<AddIntegerCompleteEventArgs> AddIntegerComplete;

        public void Add(string id, int a, int b)
        {
            int result = a + b;

            AddIntegerComplete(null, new AddIntegerCompleteEventArgs(
                WorkflowEnvironment.WorkflowInstanceId, id, a, b, result));
        }
    }


Listing 3. Concrete local-service implementation

The AddIntegerService in Listing 4 shows a more complete implementation that delegates the work to a separate thread-pool thread:

namespace MultithreadedParallelism
{
    [ExternalDataExchange]
    [CorrelationParameter("id")]
    public interface IAddIntegerService
    {
        [CorrelationAlias("id", "e.Id")]
        event EventHandler<AddIntegerCompleteEventArgs> AddIntegerComplete;

        [CorrelationInitializer]
        void Add(string id, int a, int b);
    }

    // Event arguments cross the workflow boundary, so they must be serializable.
    [Serializable]
    public class AddIntegerCompleteEventArgs : ExternalDataEventArgs
    {
        private string id;
        private int a;
        private int b;
        private int result;

        public AddIntegerCompleteEventArgs(Guid instanceId, string id, int a, int b, int result) : base(instanceId)
        {
            this.Id = id;
            this.A = a;
            this.B = b;
            this.Result = result;
        }

        // Public get/set properties for these member variables elided for brevity
    }

    public class AddIntegerService : WorkflowRuntimeService, IAddIntegerService
    {
        #region IAddIntegerService Members

        public event EventHandler<AddIntegerCompleteEventArgs> AddIntegerComplete;

        // Called by the CEMA on the workflow thread; queues the work and returns immediately.
        public void Add(string id, int a, int b)
        {
            ThreadPool.QueueUserWorkItem(Add, new AddArgs(WorkflowEnvironment.WorkflowInstanceId, id, a, b));
        }

        #endregion

        // Runs on a thread-pool thread and raises the completion event for the HEEA.
        private void Add(object state)
        {
            AddArgs args = state as AddArgs;
            // The last two arguments are truncated in the source; thread ID and time are assumed.
            Console.WriteLine("\tAdding {0} + {1} on thread {2} at {3}", args.A, args.B,
                Thread.CurrentThread.ManagedThreadId, DateTime.Now);
            AddIntegerComplete(null, new AddIntegerCompleteEventArgs(args.InstanceId, args.Id, args.A, args.B,
                args.A + args.B));
        }
    }

    // Simple container for passing several parameters to the worker thread.
    internal class AddArgs
    {
        private Guid instanceId;
        private string id;
        private int a;
        private int b;

        public AddArgs(Guid instanceId, string id, int a, int b)
        {
            InstanceId = instanceId;
            Id = id;
            A = a;
            B = b;
        }

        // Public get/set properties for these member variables elided for brevity
    }
}

Listing 4. A simple AddInteger service

The final step in service registration is to register the concrete implementation with the WF WorkflowRuntime in your workflow program. This is done during the initialization of the WorkflowRuntime instance, which you will use to run workflows that contain CEMA/HEEA activities. An example initialization, which uses our example AddIntegerService, is shown in Listing 5:

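class Program
{
    static void Main(string[] args)
    {
        WorkflowRuntime workflowRuntime = WorkflowManager.WorkflowRuntime;

        // Register the exchange service with the runtime, then add the local service to it.
        ExternalDataExchangeService edes = new ExternalDataExchangeService();
        workflowRuntime.AddService(edes);
        edes.AddService(new AddIntegerService());

        AutoResetEvent waitHandle = new AutoResetEvent(false);

        workflowRuntime.WorkflowCompleted +=
            delegate(object sender, WorkflowCompletedEventArgs e) { waitHandle.Set(); };
        workflowRuntime.WorkflowTerminated += delegate(object sender, WorkflowTerminatedEventArgs e)
        {
            // Handler body is missing from the source listing; the usual console-host pattern is assumed.
            Console.WriteLine(e.Exception.Message);
            waitHandle.Set();
        };

        WorkflowInstance instance =
            workflowRuntime.CreateWorkflow(typeof (MultithreadedParallelism.SimpleSleepParallel));

        // The start and wait calls are likewise assumed; they do not appear in the source listing.
        instance.Start();
        waitHandle.WaitOne();
    }
}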

Listing 5. Registering local service with WorkflowRuntime

The first step is to create a new instance of the ExternalDataExchangeService service and an instance of your concrete service, AddIntegerService. You then associate your concrete service implementation with the ExternalDataExchangeService by passing your concrete service to the AddService() method of the ExternalDataExchangeService. This registration allows the ExternalDataExchangeService to add the concrete implementation to a data structure that allows the concrete service to be retrieved by its interface type (in this case, IAddIntegerService) at run time. The ExternalDataExchangeService also must be associated with the WorkflowRuntime instance by passing the ExternalDataExchangeService to the AddService() method of the WorkflowRuntime. The listing shows an alternate order of initialization; either order is acceptable. You are ready to execute workflows that contain your CEMA/HEEA activities, as soon as the concrete implementation is registered with the ExternalDataExchangeService, which is, in turn, registered with the WorkflowRuntime.

CEMA and HEEA Under the Hood

Figure 8 depicts the actions that happen behind the scenes of a CEMA/HEEA invocation. The CEMA uses reflection to invoke the configured method of the local service. In our case, the local service then delegates its work to a thread from the pool. The thread then signals the event that is associated with the local service. This results in the invocation of the HEEA event handler, which had similarly used reflection to associate its invocation with a particular event that is exposed by the local service.

Figure 8. CEMA and HEEA under the hood

The pseudocode in Listing 6 and Listing 7 provides more details of the plumbing that the workflow library and runtime provide under the hood to execute the CEMA and the HEEA.

When the CEMA executes, it uses its InterfaceType property, which is set to IAddIntegerService, to obtain a reference to the associated concrete service. The concrete service is registered with the workflow runtime, so that the CEMA obtains a reference to it by using the WorkflowRuntime.GetService() method.

The CEMA then uses its MethodName property to obtain the MethodInfo information, via reflection, from the interface reference. As soon as the CEMA has the MethodInfo information, it can then invoke the method on the interface by using reflection. It passes the parameters that were configured in its property settings (in this case, id, a, and b).

// Workflow runtime executes CEMA
IAddIntegerService adder = WorkflowRuntime.GetService<IAddIntegerService>();
MethodInfo mi = (adder.GetType()).GetMethod(MethodName);
mi.Invoke(adder, new object[] { method_parameters });

    // Now within the scope of the AddIntegerService
    ThreadPool.QueueUserWorkItem(AddTwoIntegers, new AddIntegerState(instance id, correlation id, first_int, second_int));

// Now returned back to the calling CEMA
return ActivityExecutionStatus.Closed;

// WorkflowRuntime resumes execution: next activity in queue is HEEA
CreateWorkflowQueueForExpectedEvent();
ContinueAt(ReceivedExternalDataEvent);
return ActivityExecutionStatus.Executing;

Listing 6. Pseudocode for main thread CEMA invocation and HEEA setup
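The reflection step in Listing 6 can be tried out in plain C#. The sketch below is only an illustration under stated assumptions: IAdder, Adder, and ServiceRegistry are hypothetical stand-ins for the service interface, its implementation, and the runtime's service lookup, and the method returns its result directly instead of raising an event.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for IAddIntegerService (returns the sum directly for brevity).
public interface IAdder
{
    int Add(int a, int b);
}

public sealed class Adder : IAdder
{
    public int Add(int a, int b) { return a + b; }
}

// Minimal registry that hands out services by interface type (an assumption, not WF code).
public sealed class ServiceRegistry
{
    private readonly Dictionary<Type, object> services = new Dictionary<Type, object>();

    public void AddService(Type interfaceType, object implementation)
    {
        if (!interfaceType.IsInstanceOfType(implementation))
        {
            throw new ArgumentException("Implementation does not implement " + interfaceType.Name);
        }
        services[interfaceType] = implementation;
    }

    // Mirrors the steps in the text: interface type + method name + bound parameter values.
    public object Invoke(Type interfaceType, string methodName, params object[] arguments)
    {
        object service = services[interfaceType];                 // look up by InterfaceType
        MethodInfo method = interfaceType.GetMethod(methodName);  // resolve by MethodName
        if (method == null)
        {
            throw new MissingMethodException(interfaceType.Name, methodName);
        }
        return method.Invoke(service, arguments);                 // late-bound call
    }
}
```

After registering an Adder under typeof(IAdder), calling Invoke(typeof(IAdder), "Add", 10, 2) performs the same lookup-then-invoke sequence that the pseudocode attributes to the CEMA.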

The AddIntegerService simply delegates all work to a separate thread-pool thread by passing the method parameters and a delegate to the ThreadPool.QueueUserWorkItem() method, which queues the work for a separate thread and returns immediately to the AddIntegerService, which returns immediately back to the CEMA.

The CEMA has completed its work, having invoked the external service, so now it returns “Closed” to the workflow runtime to report that it has completed.

In Listing 7, we follow the path of execution of the worker thread for which we previously queued work:

// ThreadPool assigns the next free thread to our queued task
result_int = first_int + second_int;
FireExternalDataEvent(new AddIntegerCompleteEventArgs(instance id, correlation id, first_int, second_int, result_int));

// WorkflowRuntime resumes the HEEA after queuing the event for its consumption: next activity in queue is HEEA
int result = (e as AddIntegerCompleteEventArgs).Result;

Listing 7. Pseudocode for HEEA invocation from separate worker thread

The thread-pool thread invokes the queued delegate, which performs the actual work—in this case, adding two integers together.

As soon as the work is completed, we have to inform the HEEA. We do this by raising our “complete event,” which is picked up by the HEEA correlation code. That code uses the correlation-ID information in the event to place the event on the correct HEEA queue. The HEEA is also scheduled for execution with the workflow runtime.

The HEEA resumes execution, at which time it removes the event from its queue and stores it in its event property. The HEEA then calls its configured “Invoked” delegate, which is free to process the event that is stored in the properties. This entire process is depicted in Figure 8.

Multithreaded Parallelism, Step 2: Hook the Service to the Thread Pool

This section describes how the custom local service, which interacts with the CEMA and the HEEA, can be used to provide multithreaded parallelism by delegating its work to a separate pooled thread. An overview of the code that performs the delegation is provided, along with a detailed description of the thread interaction; finally, the performance improvements are demonstrated, using output from the application.

The CEMA/HEEA combination approximates the asynchronous programming model that .NET Framework developers are accustomed to using: delegates and IAsyncResult. In short, CallExternalMethod allows the workflow author to “construct” a call to a local service that has been exposed via a given interface. In this way, a user can bind to the method to be executed, and further bind the parameters for that method, by using the properties of the CEMA. On the HEEA side, the workflow author configures the activity to handle an event from the specified interface (again, fired by the local service). Armed with these two tools, we can achieve near-parallelism by having the CEMA call a local service, having that local service place the task that is configured in the schema onto the .NET Framework ThreadPool (through ThreadPool.QueueUserWorkItem()), and immediately returning control to the workflow runtime. This is shown in the multithreaded local service in Listing 8:

    public class AddIntegerService : WorkflowRuntimeService, IAddIntegerService
    {
        public event EventHandler<AddIntegerCompleteEventArgs> AddIntegerComplete;

        public void Add(string id, int a, int b)
        {
            ThreadPool.QueueUserWorkItem(Add, new AddArgs(WorkflowEnvironment.WorkflowInstanceId, id, a, b));
        }

        private void Add(object state)
        {
            AddArgs args = state as AddArgs;

            AddIntegerComplete(null, new AddIntegerCompleteEventArgs(args.InstanceId, args.Id, args.A, args.B,
                                  args.A + args.B));
        }
    }


Listing 8. Multithreaded local service

The code in Listing 8 is the multithreaded version of the local service that was presented in Listing 3 (in the subsection titled "Service Configuration and Binding"). Both are functional services; however, the service in Listing 8 is parallel across multiple threads. The CEMA will invoke the Add(string id, int a, int b) method of the AddIntegerService, which queues the parameters to the worker thread pool via ThreadPool.QueueUserWorkItem and then returns immediately. The queue call passes all of the necessary information in the AddArgs class, which is essentially a structure for passing multiple parameters. The worker thread then executes the Add(object state) method in parallel. As soon as it has completed, it invokes the AddIntegerComplete event to send notification back to the HEEA. If the ManualWorkflowSchedulerService is used, it must be invoked to start the corresponding workflow (discussed in more detail in the subsection titled "Side Note: Schedulers").

Notice that just using the CEMA doesn’t imply asynchronous behavior automatically: The local service could perform all of the work that it needs to on the thread that calls it (from the CEMA, as in Listing 3)—in other words, effectively performing the same work as a simple CodeActivity. It is entirely up to the author of the local service to take advantage of the asynchronous constructs that are provided in .NET Framework to achieve asynchronous behavior; otherwise, the end result will be synchronous. Upon completion of that task, we raise the event for which the HEEA is waiting, and resume operations. In this way, the single thread that is allocated to the workflow instance is never tasked with executing the contents of the parallel children, and is free to start them all in a near-parallel manner (it still has to queue up each child activity, but now those activities immediately return “Executing” or “Closed,” which frees up the thread to move on to the next queued activity).

It should be clear that the decision to place the task on the ThreadPool is made within the local service. The local service could have simply performed the operation inline (without handing the task to the ThreadPool); such behavior would tie up the single thread, and we would be back to “interleaved” behavior. However, by delegating the task to the ThreadPool, we free up the first thread to continue to work through the workflow queue: selecting the next activity in the queue, calling its Execute(), and so on.
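The difference between the two choices can be reproduced outside WF. The following is a minimal sketch, assuming .NET Framework 4 or later (for CountdownEvent); OffloadDemo, SlowAdd, RunInline, and RunOnThreadPool are hypothetical names, and the 100ms sleep simply imitates a slow service call.

```csharp
using System;
using System.Threading;

public static class OffloadDemo
{
    // Stand-in for a slow service call (the sleep imitates I/O latency).
    private static int SlowAdd(int a, int b)
    {
        Thread.Sleep(100);
        return a + b;
    }

    // Inline: the calling thread performs every addition itself, one after another.
    public static int[] RunInline(int[][] pairs)
    {
        int[] results = new int[pairs.Length];
        for (int i = 0; i < pairs.Length; i++)
        {
            results[i] = SlowAdd(pairs[i][0], pairs[i][1]);
        }
        return results;
    }

    // Offloaded: the calling thread only queues the work, then waits for all completions.
    public static int[] RunOnThreadPool(int[][] pairs)
    {
        int[] results = new int[pairs.Length];
        using (CountdownEvent done = new CountdownEvent(pairs.Length))
        {
            for (int i = 0; i < pairs.Length; i++)
            {
                int index = i; // per-iteration copy so each work item sees its own index
                ThreadPool.QueueUserWorkItem(delegate
                {
                    results[index] = SlowAdd(pairs[index][0], pairs[index][1]);
                    done.Signal();
                });
            }
            done.Wait(); // returns once every queued addition has signaled
        }
        return results;
    }
}
```

Both methods return the same sums. Wrapping each call in a System.Diagnostics.Stopwatch should show the inline version growing roughly linearly with the number of pairs, while the offloaded version depends on how many pool threads are available.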

Another way to look at this process is to do a side-by-side comparison, as shown in the following table:

First thread | Second thread
1. WorkflowRuntime calls CEMA.Execute(). |
2. CEMA.Execute() invokes AddInteger.Add(). |
3. AddInteger.Add() puts task on thread pool. | 1. ThreadPool assigns task to Second Thread.
4. Return to WorkflowRuntime for next activity in queue. | 2. Add two integers, assign to result.
5. WorkflowRuntime calls HEEA.Execute(). |
6. HEEA.Execute() creates queue for event and returns Executing. |
7. WorkflowRuntime sees no more events in queue and idles, returning thread to caller. | 3. Thread raises event to WorkflowRuntime, correlates to instance, calls HEEA.Invoked(), and marks HEEA Closed. (See following Note.)
 | 4. WorkflowRuntime proceeds to next Activity in queue.

Note  This operation actually queues an event for the workflow runtime scheduler—which might or might not use the same thread—so that the operation could be continued by the Second thread, the First thread, or a completely different thread, depending on the scheduler.

To see the actual behavior of the multithreading CEMA/HEEA model, let’s look at a simple local service that simulates a long-running process by sleeping for 100ms. The code for this service is included in the attached download. Notice in Figure 9 that we’re able to use more than just the one thread that is assigned to us by the workflow runtime, and that our time is now significantly better than our original time. With this approach, we’re better able to approximate multithreaded parallel behavior. At this point, we’re consuming three separate threads—the first thread that is assigned to the workflow instance by the runtime, and the second and third threads that are assigned from the .NET Framework ThreadPool.

Of course, this behavior does not come without a cost: Obviously, the more threads there are working in parallel, the more context switching that is required by the CPU. Without careful attention to the behavior of the local service, and an eye to freeing threads as often and as early as possible (instead of waiting through heavy I/O), we would quickly reach thread saturation (a situation in which the CPU spends more time context switching between threads than executing instructions) and reduce the responsiveness of the host. Threads that are blocked waiting for I/O do not contribute to thread saturation, because they are not switched to until the I/O is complete.

Figure 9. A simple Sleep service in parallel

Notice the use of the ManualWorkflowSchedulerService in Listing 8. The ManualWorkflowSchedulerService was chosen because the hosting application runs in ASP.NET; this is a common recommendation, as the ManualWorkflowSchedulerService often has advantages when running in ASP.NET. It should be noted, however, that the DefaultWorkflowSchedulerService could have also been used, and only testing in the final solution would tell which performs best.

Note that the Manual scheduler uses the caller’s thread to execute a workflow. Control of that thread is returned to the user the moment the workflow terminates, aborts, completes, or suspends. While this is typically a useful scenario for simplifying the wiring up of workflow runtime events, it poses a problem where CEMA and HEEA are concerned: It is the behavior of the workflow runtime to suspend a workflow instance immediately after the HEEA is invoked. In that case, control is returned to the caller immediately. Unless the service that raises the event calls RunWorkflow() on the Manual scheduler, the workflow will never resume, and it will remain suspended indefinitely.

Notice from the console output in Figure 10 that the “caller” thread is always used by the Manual scheduler. While the workflow instance starts off by using thread “1”, it is resumed at two different points on two different threads: Upon firing the first WaitCompletedEventArgs on thread “6”, where the first Thread.Sleep() occurred, the workflow is restarted by the Manual scheduler and resumes processing the activity queue on thread “6”. It immediately suspends again (following the execution of EndSleep2) and resumes for the final time on thread “7”, the same thread that just finished the Thread.Sleep(). It is on that final thread that control is returned to the original caller.

Figure 10. A simple AddInteger service in parallel

Side Note: Schedulers

There are two out-of-the-box schedulers that are provided for WF in .NET Framework 3.0: the DefaultWorkflowSchedulerService (“Default”), which is used by the workflow runtime to execute workflow instances on new threads (that is, not on the caller’s thread), and the ManualWorkflowSchedulerService (“Manual”), which is used by the workflow runtime to execute workflow instances on the same thread that the caller is on.

That is the main distinction between the two schedulers. The Default scheduler was the original scheduler; however, it was felt that the use of a second thread (while blocking the first, or caller, thread) would be an unnecessary overhead, especially when the runtime is hosted in ASP.NET. That is why the Manual scheduler was developed; by using the same thread that is processing the HTTP request, the runtime manages more efficiently the limited resources that are provided to ASP.NET.

There are two other distinctions. The first is the way in which code must be written to use the Manual scheduler. After each call to WorkflowInstance.Start(), or any event firing (for instance, in our ExternalDataExchange service, where we fire the completed event), the developer must explicitly call the scheduler’s RunWorkflow() method.

The second distinction is how the Manual scheduler manages timer events: for instance, those that are used by the DelayActivity. Where the Default scheduler needs no special changes, the Manual scheduler must be configured to work with timers by using a special background thread to handle timer events. To deal correctly with these events, the scheduler must be configured with UseActiveTimers = true (also available as an overload to the scheduler constructor).
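Both distinctions can be seen in a small host helper. This is a sketch under stated assumptions rather than code from the sample download: ManualSchedulerHost and its method names are hypothetical, and the code compiles only on .NET Framework 3.0 or later with a reference to System.Workflow.Runtime.

```csharp
using System;
using System.Workflow.Runtime;
using System.Workflow.Runtime.Hosting;

// Hypothetical helper; requires a reference to System.Workflow.Runtime.
public static class ManualSchedulerHost
{
    // Creates a runtime whose workflows execute on the caller's thread.
    public static WorkflowRuntime CreateRuntime()
    {
        WorkflowRuntime runtime = new WorkflowRuntime();
        // Passing true enables active timers, so DelayActivity timers
        // are serviced by a background thread.
        runtime.AddService(new ManualWorkflowSchedulerService(true));
        return runtime;
    }

    // Starts an instance and donates the calling thread to it.
    public static Guid StartAndRun(WorkflowRuntime runtime, Type workflowType)
    {
        WorkflowInstance instance = runtime.CreateWorkflow(workflowType);
        instance.Start(); // only schedules the instance; nothing has executed yet
        Resume(runtime, instance.InstanceId);
        return instance.InstanceId;
    }

    // Call this after raising an ExternalDataExchange event for the instance;
    // otherwise the idle workflow is never given a thread on which to continue.
    public static bool Resume(WorkflowRuntime runtime, Guid instanceId)
    {
        ManualWorkflowSchedulerService scheduler =
            runtime.GetService<ManualWorkflowSchedulerService>();
        // A null scheduler means that the Default scheduler is in use
        // and no explicit call is needed.
        return scheduler != null && scheduler.RunWorkflow(instanceId);
    }
}
```

In a local service such as the one in Listing 8, the worker method would call a helper like Resume() immediately after raising AddIntegerComplete, using the instance ID that it received in its arguments.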

It should be noted that the reasons that were given previously for selecting one scheduler over the other do not always hold true: In other words, just because an application uses WF hosted in ASP.NET doesn’t mean that the default scheduler might not be a good option for it, especially if the developer intends to use asynchronous pages (or any form of asynchronous programming) to use WF. In this case, using the caller’s thread to block unnecessarily on I/O or any other long-running operation simply reduces the responsiveness of Microsoft Internet Information Services (IIS), instead of managing the limited resources more efficiently, as is the intention of the Manual scheduler.

Abstracting CEMA/HEEA Up to the Service Provider

This section describes how the complexities of the development and configuration of the CEMA, HEEA, and local service can be encapsulated in a simple component that is called the Service-Provider Wrapper, which includes WF designer support. The Service-Provider Model, which derives from the Factory pattern, is also introduced.

Let’s reexamine the entire development process that is required to create a CEMA/HEEA application that delegates useful work to a thread-pool worker thread. The preceding sections describe the many intricacies of interfacing WF with external resources—in particular, interfacing CEMA/HEEA activities with the external worker thread pool. We saw that a developer had to perform the following tasks, to interface with an external resource:

  • Define a service interface and its associated correlation parameters
  • Implement the service interface, including the multithreaded code that interacts with the thread pool
  • Bind CEMA and HEEA instances to the service with unique correlation parameters, to assure proper delivery of events
  • Register the service implementation with the WorkflowRuntime for consumption by CEMA/HEEA activities at run time

Our objective for this example is to provide an implementation of the integer adder service (as an ExternalDataExchange, serving at once as the “glue” and the physical adding service; for more information, see the downloadable code listings), and to have it perform the addition in parallel by using separate threads. While this is a trivial operation, it is a placeholder for operations that involve real I/O—for instance, retrieving records from a database or manipulating files. To simulate those long-running operations, we’re not only adding the two integers, but also sleeping for 100ms. In Figure 10, we’ve repeated the original constructs, but we’re now using a local service that adds the two numbers together. The numbers are configured in the Property pane of the CEMA shape (see Figure 11) and passed as arguments to the local service. In this particular case, parameters a and b represent two different integers to be added together: We’ve configured 10 and 2. When our service is completed, it fires the event: The payload contains the result of the addition. The handling of the returned event is configured in the Property pane of the HEEA shape (see Figure 12) by telling the HEEA the service interface event that generates the event and providing a delegate to process the resulting event. For more information about this approach, see the downloadable code listings.

Figure 11. The CEMA Property pane

Figure 12. The HEEA Property pane

What stands out from this approach is its inflexibility: If we need to add a new parameter, modify the functionality, or extend the behavior in any way, we have several layers of code through which to sift, from the workflow and activity tier (with its configuration of the binding parameters for the call) to the code in the local service, the event-arguments payload of the event, and the processing of the event in the HEEA. Not only is the approach brittle, but simply introducing a new service takes a significant amount of work, which places a heavy burden on the developer to get a complex process right. There are many pitfalls; also, the CEMA/HEEA construct is not easy to debug, as there are numerous ways to cause the HEEA to fail silently or with misleading exceptions. For example, should an item in the event arguments not be serializable, the error message indicates that there is no listener for the event (a confusing error message). It isn’t until the developer digs deeper that the serialization error comes to light. The CEMA/HEEA approach that we took is essential to implementing parallel activities in our business operations, and it was designed to be readily available to all, as well as easy to implement for developers of all skill levels.

The solution to our problem was to abstract the specifics of a call into a service-provider model. The addition of a factory to produce these service providers allowed us to abstract completely the development of new features and functionality for workflow into external services that are delivered through classes that only had to implement the interfaces that we created. Combined with the ease of configuring the call via properties in the pane, we created a “wrapper” around these external packages. The objective was to create a drag-and-drop container with a great deal of simplicity for both the developer of the workflow and the developer of the services that are consumed by that workflow.

As you can see, the steps that are required to create a custom CEMA/HEEA solution that leverages the thread pool require a considerable amount of custom development, which is complex, time consuming, and error-prone. Additionally, ExternalDataExchangeServices cannot be registered declaratively, which requires the runtime initialization code to be updated and redistributed every time that a new service is created. Because of these issues, a generic abstract service model is more appropriate to reduce the complexity and increase developer productivity.

Note  An alternative would be to create a custom ExternalDataExchangeService loader, which would dynamically load configured services at run time, as instructed from a configuration file.
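The core of such a loader is small. The sketch below assumes that the configured values are type names with public parameterless constructors; ServiceLoader and Load are hypothetical names, and reading the names from an actual configuration file is left out.

```csharp
using System;
using System.Collections.Generic;

public static class ServiceLoader
{
    // Instantiates each named type. In a real host, the names would come
    // from a configuration file instead of being passed in directly.
    public static List<object> Load(IEnumerable<string> typeNames)
    {
        List<object> services = new List<object>();
        foreach (string typeName in typeNames)
        {
            // Returns null (instead of throwing) when the type cannot be resolved.
            Type type = Type.GetType(typeName, false);
            if (type == null)
            {
                throw new InvalidOperationException(
                    "Cannot resolve service type '" + typeName + "'.");
            }
            // Requires a public parameterless constructor on the service type.
            services.Add(Activator.CreateInstance(type));
        }
        return services;
    }
}
```

Each loaded object would then be passed to ExternalDataExchangeService.AddService(), as in Listing 5. Types that live outside the loader's own assembly generally need assembly-qualified names.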

Service-Provider Model

The abstracting of a service-provider model began by defining the means of communicating with external services. Our objective was to force rigidity into the CEMA/HEEA construct, so that the construct itself never has to change when new functionality is introduced. To compensate for that rigidity, we gave complete flexibility to the service providers themselves by using high-level objects with broad context to communicate with them. This means that there is a single interface for service-provider authors to develop against, and that same single interface for configuring the CEMA/HEEA properties. A workflow developer only has to modify the values of the method parameters on the CEMA (discussed later) to take advantage of new functionality that is provided in external libraries. We broke down this communication into a Request and a Response: classes with sufficient abstraction to serve as containers for any form of method argument list and return values. The payloads of each could be bound (using activity binding) to any source necessary, but the more typical approach has been to pass in the Request and Response as arguments to the workflow that contains the CEMA/HEEA combination, and bind to those root-level properties.

What then became clear was that we were simply constructing an adapter activity for these external services, a Service-Provider Wrapper, and that this adapter activity would have to create any number of service providers, depending on how the shape was configured in a workflow. Using the Factory pattern (Gamma, et al.), we established a pair of dependency properties for the Service-Provider Wrapper activity that could be used as the key for service-provider creation: These dependency properties must be set at design time, and they are used by the Service-Provider Wrapper activity to produce an instance of a specific service provider. That service provider implements the single Command interface:

void Execute( IRequest request, IResponse response )

Now, our CEMA and HEEA have an ExternalDataExchange interface with which to be configured and an implementation to invoke (our Service-Provider Wrapper activity); and the invocation, in turn, hands over responsibility to external services for operation. In this way, we provide a standardized, simple approach for developers to drag and drop shapes that are capable of parallel operations without worrying about the internals of workflow. They simply provide an implementation of the Service-Provider interface, configure the Service-Provider Wrapper activity, and let the workflow runtime take care of the rest.
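The shape of this model can be sketched without the workflow runtime. The following is an illustration under stated assumptions, not the DSF implementation: only the Execute signature comes from the text above. The members of IRequest and IResponse, the ICommandProvider and ProviderFactory names, and the two string keys (standing in for the pair of dependency properties) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Assumed minimal shapes for the request and response containers.
public interface IRequest { object Payload { get; } }
public interface IResponse { object Result { get; set; } }

// The single Command interface that every service provider implements.
public interface ICommandProvider
{
    void Execute(IRequest request, IResponse response);
}

public sealed class SimpleRequest : IRequest
{
    public SimpleRequest(object payload) { Payload = payload; }
    public object Payload { get; private set; }
}

public sealed class SimpleResponse : IResponse
{
    public object Result { get; set; }
}

// Example provider: sums the integers carried in the request payload.
public sealed class AddIntegerProvider : ICommandProvider
{
    public void Execute(IRequest request, IResponse response)
    {
        int sum = 0;
        foreach (int operand in (int[])request.Payload) sum += operand;
        response.Result = sum;
    }
}

// Factory keyed by a pair of values, mirroring the pair of dependency
// properties that the wrapper activity uses to pick a provider.
public static class ProviderFactory
{
    private static readonly Dictionary<string, Func<ICommandProvider>> Creators =
        new Dictionary<string, Func<ICommandProvider>>();

    public static void Register(string category, string name, Func<ICommandProvider> creator)
    {
        Creators[category + "/" + name] = creator;
    }

    public static ICommandProvider Create(string category, string name)
    {
        Func<ICommandProvider> creator;
        if (!Creators.TryGetValue(category + "/" + name, out creator))
            throw new InvalidOperationException("No provider registered for " + category + "/" + name);
        return creator();
    }
}
```

Under these assumptions, a caller registers a provider once, for example ProviderFactory.Register("Math", "AddInteger", () => new AddIntegerProvider()), and the wrapper later resolves it from the two configured key values and calls Execute with the bound Request and Response.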

Hiding Complexity with the Designer

While this approach worked, it was non-intuitive: configuring the CEMA/HEEA combination involved an unacceptable amount of repetition, and there was significant room for error. The next step was clear: We needed a way to hide the specifics of the CEMA/HEEA combination, while retrieving and presenting enough information through the dependency properties to make the approach work.

Figure 13. Wrapping it all up

For this, we turned to the Activity Designer classes that are built into WF. By default, a sequential activity (the activity that encapsulates the CEMA/HEEA combination) uses a SequentialActivityDesigner, a special type of activity designer that is capable of displaying the children of a sequential activity in sequence, along with drag-and-drop, add/remove, and other capabilities. This default designer was specifically what we did not want to use: instead, we wanted to hide the two children of the sequential activity. This was adequately handled by the ActivityDesigner, a designer that was meant for non-composite activities (activities with no children). We used this designer by adding an attribute to our wrapper activity:

public partial class DSFServiceProviderWrapper : SequenceActivity

Our custom designer inherits from the ActivityDesigner, and it adds some additional features like custom validation and custom themes.
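In .NET, a designer is usually associated with a component through the DesignerAttribute class in System.ComponentModel. The following sketch shows only that association mechanism. It uses stand-in types so that it compiles without the WF assemblies, and every type name in it is hypothetical; in a real project, the designer class would derive from ActivityDesigner and the wrapper from SequenceActivity.

```csharp
using System;
using System.ComponentModel;

// Stand-ins (assumptions) for the WF base classes.
public class SequenceActivityStandIn { }
public class HiddenChildrenDesigner { }

// The attribute tells design-time tooling which designer to use for the
// activity, replacing the default designer of the base class.
[Designer(typeof(HiddenChildrenDesigner))]
public class ServiceProviderWrapperSketch : SequenceActivityStandIn { }

public static class DesignerLookup
{
    // Returns the assembly-qualified name of the configured designer type,
    // or null when the type carries no Designer attribute.
    public static string DesignerTypeNameOf(Type activityType)
    {
        object[] attributes = activityType.GetCustomAttributes(typeof(DesignerAttribute), true);
        return attributes.Length == 0 ? null : ((DesignerAttribute)attributes[0]).DesignerTypeName;
    }
}
```

Because the association is declarative, the activity class itself needs no code to hide its children; the designer that the attribute names decides what is drawn.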

An alternative to the designer-hiding approach would have been to create a single activity in code that implements all of the functionality that was required, including that from the CEMA and the HEEA.

Now, with the designer hiding the internals, it is necessary to add all of the dependency properties that the hidden CEMA/HEEA combination needs, and wire them up correctly. After we added the appropriate dependencies for factory initialization, and Request-Response binding, we wired up the internals, so that modifications to the property pane for our wrapper would internally update the appropriate value on either the CEMA or the HEEA. (See the code in the downloadable code listings for specifics.) The end result is a simple shape with a number of dependency properties that can be set in the Property pane of Visual Studio 2005—in appearance, no different from any other non-composite shape, but capable of making a call to external code, and executing that code effectively in parallel. In other words, two such activities dropped in a parallel container would execute simultaneously on separate threads, regardless of the nature of the code to be executed.

Problems and a Solution: The Cancelling Event

Use of the CEMA and the HEEA can lead to problems in workflow-cancellation scenarios. This section describes those problems and a solution that makes the CEMA/HEEA interaction an atomic operation, with respect to workflow cancellation.

While we could now successfully execute operations in parallel by using multiple threads from the thread pool, some issues started to crop up. For instance, given two operations in a parallel shape, if one failed, the other would inexplicably throw an exception. In Figure 14, we construct a workflow with two parallel activities that will work (using our AddInteger service) and a third branch with the ThrowActivity: It will throw an ordinary System.Exception. To ensure that the throw will occur while we are waiting for our parallel activities to work, we have added a sleep to the AddInteger service. The result is an EventDeliveryFailedException, which indicates that there is no queue that is configured to receive our event.

Figure 14. Throwing an exception with Cancel

This strange error was caused by the fact that the workflow runtime will attempt to cancel “Executing” activities in the event of a failure. Should an operation fail, its unhandled exception is caught by the parent. If that parent is a parallel activity, it goes through any of its children, checks their ActivityStatus for “Executing,” and instructs the workflow runtime to cancel those activities.

In most cases, this is desirable behavior: A waiting operation should not block the workflow from terminating the current instance. However, in our situation, in which an operation on a separate thread raises the event for which our HEEA is waiting, it causes a small disaster: The workflow runtime receives the Cancel instruction, cancels the HEEA, and then terminates the workflow. At some point, the thread that executes our service provider receives its results and attempts to fire the event. However, the workflow runtime has already removed our listener, so event delivery fails with the EventDeliveryFailedException described earlier. Clearly, we had to prevent this from happening: We had to ensure that the CEMA call and the HEEA event reception functioned as an atomic unit.

What became clear was that we needed a way to block the Cancel operation if we had already executed our CEMA: That meant that a thread was now executing our service provider and would eventually return with results. Because none of our operations is long-running, it was acceptable that we might block for the short period of time that it took for the service provider to return results. For this, we created yet another wrapper around the CEMA/HEEA—another sequential activity called a “CancelBlockingActivity” (also hidden by the designer) that was capable of intercepting the Cancel request, and determining whether it was necessary to pass on to the children, or “block” it by simply not cancelling its children, and returning a status of “Cancelling” back to the parent. This is analogous to the new “CancelBlockingActivity” telling its parent, “Don’t bother me, I’m busy right now.” The workflow now appears in Figure 15, with the CEMA/HEEA construct surrounded by a simple activity that inherits from SequenceActivity. Notice in the console output that we’ve detected a request from the runtime to cancel the currently running operation. In both cases, that would have meant cancelling the HEEA (moving from Initialized to Canceled), which would have given us the error that was mentioned previously.

Figure 15. Blocking the Cancel to save the HEEA

The approach that is taken to blocking a Cancel is to check whether or not two conditions are present:

  • Firstly, is there a CEMA, and has it started or already completed?
  • Secondly, is there an HEEA, and is it only in the initialized state?

If both of these are true, we must not allow the Cancel instruction to go through. In this case, the "CancelBlockingActivity" tells the workflow runtime that it is “Cancelling”; but, in reality, it is allowing our Service-Provider Wrapper CEMA/HEEA pair to run to completion. The code that we used for the cancel-blocking activity is shown in Listing 9.

protected override ActivityExecutionStatus Cancel(ActivityExecutionContext executionContext)
{
    if (this.Activities.Count > 0)
    {
        bool started = false;        // the CEMA has begun or finished executing
        bool mustNotCancel = false;  // the HEEA has not yet closed
        foreach (Activity activity in this.Activities)
        {
            if (activity is CallExternalMethodActivity)
            {
                if (activity.ExecutionStatus == ActivityExecutionStatus.Executing ||
                    activity.ExecutionStatus == ActivityExecutionStatus.Closed)
                    started = true;
            }

            if (activity is HandleExternalEventActivity)
            {
                if (activity.ExecutionStatus != ActivityExecutionStatus.Closed)
                    mustNotCancel = true;
            }
        }

        // Now check
        if (started && mustNotCancel)
        {
            // Report "Canceling" without cancelling the children
            Console.WriteLine("\tReceived a Cancel: blocking while the CEMA/HEEA completes");
            return ActivityExecutionStatus.Canceling;
        }
        else
        {
            Console.WriteLine("\tReceived a Cancel: no need to block");
            return base.Cancel(executionContext);
        }
    }
    else
    {
        Console.WriteLine("\tReceived a Cancel: no need to block");
        return base.Cancel(executionContext);
    }
}

Listing 9. Blocking a Cancel in an activity
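The decision at the heart of Listing 9 can be isolated as a pure function, which makes it easy to check without hosting the workflow runtime. The sketch below is an illustration only: Status is a stand-in for ActivityExecutionStatus, and the CancelDecision and ShouldBlockCancel names are hypothetical. It mirrors the listing, which treats any HEEA that has not yet closed as one that must not be cancelled; this is slightly broader than the "initialized" condition stated in the bullet list.

```csharp
using System;

// Stand-in for the subset of ActivityExecutionStatus that Listing 9 inspects.
public enum Status { Initialized, Executing, Canceling, Closed }

public static class CancelDecision
{
    // Returns true when the Cancel must be blocked: the CEMA has already
    // handed work to a thread, and the HEEA has not yet received its event.
    public static bool ShouldBlockCancel(Status cema, Status heea)
    {
        bool started = cema == Status.Executing || cema == Status.Closed;
        bool mustNotCancel = heea != Status.Closed;
        return started && mustNotCancel;
    }
}
```

For example, a closed CEMA with an initialized HEEA yields true (block the Cancel), whereas a CEMA that has not started yields false, so the Cancel is passed on to the base class.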


Conclusion

Windows Workflow Foundation in .NET Framework 3.0 provides interleaved parallel behavior on a single thread. For our business needs, we require multithreaded parallel behavior to support multiple synchronous calls to back-end machines. We used the ActivityBinding and Designer capabilities of WF, along with the service-provider model, to create a single elegant Service-Provider Wrapper shape that supports multithreaded parallel execution. The result extends the familiar and easy-to-use designer experience for WF, to allow us to drag and drop multithreaded parallel capabilities anywhere on the canvas. At run time, the multithreaded parallel workflows deliver the performance and efficiency that we need in high-scale e-commerce applications.

For More Information


Acknowledgements

The original motivation for this problem came from a large project within Merrill Lynch and thanks go to Suresh Nair for directing me to that effort, and providing the right guiding principles; this paper describes a part of that effort, which is collectively called Data Services Framework (DSF), for which we received the Developers’ Award from Windows in Financial Services. Without Suresh’s guidance, technical and managerial, there would be no DSF. Additionally, I would like to thank our MD Alok Kapoor whose encouragement, support and infectious enthusiasm made working on this topic fun.

A number of high-end developers/designers have poured their soul and sweat to make this effort happen: Ganesh Krishnamoorthy, John Malek, Chuck Tichnor, Ravi Okade, Chandra Busireddy and Jerry Hawk were instrumental in implementation, whereas Aruna Devineni and Vimalanth Umapathy were responsible for stretching the usage of DSF and coming up with more interesting requirements on the framework. Thanks also go to reviewers from Microsoft: Paul Andrew, Joel West & Steve Danielson.

Then, there was Joe Rubino from Microsoft. Joe Rubino can be likened to a guardian angel, when it came to providing support from the Redmond campus. All it took was a call to him to set things straight, or get help. Finally, Raman Kohli did everything possible to cajole, wheedle and coax me into sticking with timelines for writing this out, and getting it reviewed. Without Raman’s constant push, I would not have written a massive tome like this.


References

  1. Box, Don, and Dharma Shukla. "WinFX Workflow: Simplify Development with the Declarative Model of Windows Workflow Foundation" (section titled "Inside a Workflow"). MSDN Magazine, January 2006.
  2. Wirth, Niklaus. Algorithms + Data Structures = Programs. Englewood Cliffs, NJ: Prentice-Hall, 1976. [ISBN 0-13-022418-9]
  3. Dijkstra, Edsger Wybe. "Cooperating Sequential Processes," in Programming Languages (F. Genuys, editor). New York, NY: Academic Press, 1968.
  4. Shukla, Dharma, and Bob Schmidt. Essential Windows Workflow Foundation. Upper Saddle River, NJ: Addison-Wesley, 2007. [ISBN 0-32-139983-8]
  5. Hansen, Per Brinch (editor). The Origin of Concurrent Programming: From Semaphores to Remote Procedure Calls. New York: Springer, 2002. [ISBN 0-38-795401-5]
  6. Knuth, Donald Ervin. The Art of Computer Programming. Volume 1. Reading, MA: Addison-Wesley Publishing Company, 1968.
  7. Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Reading, MA: Addison-Wesley, 1995. [ISBN 0-20-163361-2]