Autonomous Services and Enterprise Entity Aggregation
by Udi Dahan
Summary: Enterprises today depend on heterogeneous systems and applications to function. Each of these systems manages its own data and often doesn't explicitly expose it for external consumption. Many of these systems depend on the same basic concepts like customer and employee and, as a result, these entities have been defined in multiple places in slightly different ways. Entity aggregation embodies the business need to get a 360-degree view of those entities in one place. However, this business need is only one symptom of the larger issue: business/IT alignment. Service-oriented architectures (SOAs) have been hailed as the glue that would bring IT closer to business, yet the hype is already fading. We'll take a look at concrete ways that autonomous services can be used to transform the way we develop systems to more closely match business processes and solve immediate entity aggregation needs.
Technical Boundaries and Data Replication
Common Boundary Pitfalls
Business-Level Entity Aggregation Requirements
Entity Aggregation for Business Intelligence
About the Author
The term "SOA" has gained popularity over the past year and has become the buzzword du jour. Everything these days is service-oriented, SOA-enabled, or "the key to your SOA success." The industry continues to struggle with what defines a service; however, various properties of services do appear to be well accepted. Microsoft's tenets of service orientation define four such properties: services are autonomous; services have explicit boundaries; services share contract and schema, not class or type; and service compatibility is based on policy.
While these tenets act as important architectural guidelines for developing complex software, there is obviously more to be said. Many run-time aspects of services like scalability, availability, and robustness are not addressed by service orientation. It is exactly these run-time aspects that are the focus of autonomous services.
The phrase "autonomous services" seems to be a simple rewording of the first tenet, and yet the two have very different meanings. "Services are autonomous" means that teams developing cooperating services could operate independently—to a degree, of course. When taken together with the tenet about contract and schema, it is clear that there need be no binary dependencies among those teams. The development of each service could be done on a different platform, using different languages and tools. An autonomous service, on the other hand, is a service whose ability to function is not controlled or inhibited by other services.
The word autonomous has many definitions including: self-governing, self-controlling, independent, self-contained, and free from external control and constraint. In the light of these definitions of autonomy, we will examine two kinds of service interaction: synchronous and asynchronous communication.
In Figure 1, we can see that Service A needs to actively hold run-time resources (the calling thread) until Service B replies. The time it takes Service A to respond to a single request depends on its interaction with Service B. Service A is affected if the network is slow or if Service B is unavailable or slow. Therefore, it does not appear that Service A is "free from external constraint" in this case.
Another issue to consider here is coupling. While Service A and Service B may be loosely coupled in that they were developed separately by different teams on different platforms sharing only a WSDL file, we can see that they are tightly coupled in time. This temporal coupling can be seen in that the time it takes Service A to respond to a request includes Service B's processing time. It is exactly this coupling that causes undesired failures in Service B to ripple into Service A.
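The temporal coupling can be made concrete with a minimal Python sketch. It assumes two in-process stand-ins: a hypothetical `ServiceB.get_price` that simulates network and processing time with `time.sleep`, and a hypothetical `ServiceA.handle_request` that blocks its calling thread until Service B replies. It is an illustration only, not a real Web service call.

```python
import time


class ServiceB:
    """Hypothetical remote service; `delay` stands in for network and processing time."""

    def __init__(self, delay: float, available: bool = True) -> None:
        self.delay = delay
        self.available = available

    def get_price(self, product_id: str) -> float:
        if not self.available:
            raise ConnectionError("Service B is unavailable")
        time.sleep(self.delay)  # the caller's thread is held for this whole time
        return 100.0


class ServiceA:
    """Handles each request by calling Service B synchronously."""

    def __init__(self, b: ServiceB) -> None:
        self.b = b

    def handle_request(self, product_id: str) -> float:
        # Blocks until B replies; B's latency and failures become A's.
        price = self.b.get_price(product_id)
        return price + 5.0  # hypothetical local processing, e.g. a surcharge
```

Timing `handle_request` shows that Service A's response time includes Service B's delay, and constructing Service B with `available=False` shows B's failure surfacing as A's failure.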
There are two ways that we can break this temporal coupling. One way is for Service A to poll Service B for the result. Unfortunately, polling places undue load on Service B (as a function of the number of consumers and the polling interval), and consumers receive the requested information later than when it actually became available. Figure 2 shows the complexity of this solution.
Figure 1. Synchronous communication
If we continue our analysis, we'll find that spawning a new thread, or even using the thread pool, to handle the polling for each request we send is going to drain Service A of its resources fairly quickly. A more performant (and even more complex) solution would involve a single thread that manages polling for all the requests sent. As each result became available, that thread would marshal it to a different thread, which would finish processing the original request. We would do well to heed Occam's razor before continuing down this path.
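The single-polling-thread design can be sketched in Python to show how much machinery it needs. The sketch assumes a hypothetical `PollableServiceB` with `submit` and `poll` methods; a hypothetical `Poller` thread polls every outstanding request once per interval and hands each finished result to another thread through a `queue.Queue`.

```python
import queue
import threading
import time
from typing import Dict, Optional, Set


class PollableServiceB:
    """Hypothetical Service B: accepts a request, answers later polls."""

    def __init__(self, work_time: float) -> None:
        self.work_time = work_time
        self._started: Dict[str, float] = {}
        self._lock = threading.Lock()

    def submit(self, request_id: str) -> None:
        with self._lock:
            self._started[request_id] = time.monotonic()

    def poll(self, request_id: str) -> Optional[str]:
        # Every poll costs Service B work, whether or not the result is ready.
        with self._lock:
            started = self._started[request_id]
        if time.monotonic() - started >= self.work_time:
            return f"result-for-{request_id}"
        return None


class Poller:
    """One thread polls for all outstanding requests and hands results onward."""

    def __init__(self, b: PollableServiceB, interval: float) -> None:
        self.b = b
        self.interval = interval
        self.polls = 0
        self.results: queue.Queue = queue.Queue()
        self._pending: Set[str] = set()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def send(self, request_id: str) -> None:
        self.b.submit(request_id)
        with self._lock:
            self._pending.add(request_id)

    def _run(self) -> None:
        while not self._stop.is_set():
            with self._lock:
                outstanding = list(self._pending)
            for request_id in outstanding:
                self.polls += 1
                result = self.b.poll(request_id)
                if result is not None:
                    with self._lock:
                        self._pending.discard(request_id)
                    # Marshal the result to whichever thread finishes the request.
                    self.results.put((request_id, result))
            # A shorter interval means fresher results but more load on B.
            self._stop.wait(self.interval)
```

Every call to `poll` is load on Service B, and shortening `interval` to get fresher results multiplies that load. That is the trade-off the asynchronous approach removes.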
A different solution to the problem of temporal coupling, one that avoids the issues stated previously, is to use asynchronous communication between these services (see Figure 3). In this case, Service A subscribes to events published by Service B about changes to its internal state. When these events occur, Service A stores the data that it considers relevant. Thus, when Service A receives a request, it no longer depends on external parties to process it, so its availability is unaffected. Notice that the load on Service B is even lower than in the original synchronous communication example, since it no longer receives those requests. Modern publish/subscribe and messaging infrastructure can keep the load nearly constant no matter how many consumers there are. Data freshness also improves with asynchronous communication: Service A receives the data much closer to the time that Service B made it available. We no longer have to trade load (driven by the polling interval) against data freshness.
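A minimal sketch of the publish/subscribe alternative, assuming a hypothetical in-process `Bus` as a stand-in for real messaging infrastructure. A real bus would deliver messages asynchronously and durably; this one calls handlers directly for brevity. Service A keeps its own copy of the data it needs from Service B's hypothetical `PriceChanged` events and answers requests without calling Service B.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PriceChanged:
    """Hypothetical event published by Service B when its internal state changes."""
    product_id: str
    price: float


class Bus:
    """Minimal in-process stand-in for publish/subscribe infrastructure."""

    def __init__(self) -> None:
        self._subscribers: Dict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event) -> None:
        for handler in self._subscribers[type(event)]:
            handler(event)


class ServiceB:
    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.prices: Dict[str, float] = {}

    def change_price(self, product_id: str, price: float) -> None:
        self.prices[product_id] = price
        # Published once, regardless of how many consumers subscribe.
        self.bus.publish(PriceChanged(product_id, price))


class ServiceA:
    def __init__(self, bus: Bus) -> None:
        self.local_prices: Dict[str, float] = {}  # A's own copy of what it needs
        bus.subscribe(PriceChanged, self.on_price_changed)

    def on_price_changed(self, event: PriceChanged) -> None:
        self.local_prices[event.product_id] = event.price

    def handle_request(self, product_id: str) -> float:
        # No call to Service B: the request is served from local state.
        return self.local_prices[product_id]
```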
Technical Boundaries and Data Replication
Although the second tenet seems clear at first glance, the nature of a boundary isn't at all obvious. Are the boundaries of Service A and Service B in the previous examples any different in the synchronous and asynchronous cases? It does not appear so, but there is one major technical difference: transactions.
Figure 2. Synchronous communication with polling
To handle a single request properly, in cases where that request causes data to be changed, the handling of the request should be done within the context of a transaction so that the state of the service stays consistent. If that service has to interact with other services to handle the request, and as a result those services change their data also, should those changes occur within the original transaction context?
When service interaction is synchronous, the division of responsibility between services may often require that changes across services be performed within a single transaction. When service interaction is asynchronous (or synchronous with polling), we avoid changing data in other services altogether, so there is no need for transactions to cross service boundaries. Obviously, if a transaction starting in one service were to lock resources in other services, this would require high levels of trust between all involved services and blur the distinction of where one service ended and another began.
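The rule that a transaction never crosses a service boundary can be sketched with Python's built-in `sqlite3` module. The hypothetical `OrderService` below changes only its own database inside the transaction and announces the change through a `publish` callback after the commit. As written, a crash between the commit and the publish would lose the event; production systems typically close that gap with a transactional queue or an outbox table, which this sketch omits.

```python
import sqlite3
from typing import Callable


class OrderService:
    """Hypothetical service whose transactions never span another service."""

    def __init__(self, publish: Callable[[dict], None]) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT, total REAL)"
        )
        self.publish = publish

    def handle_place_order(self, order_id: str, customer_id: str, total: float) -> None:
        # `with conn` commits on success and rolls back if an exception is raised.
        # Only this service's own database takes part in the transaction.
        with self.conn:
            self.conn.execute(
                "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
                (order_id, customer_id, total),
            )
        # Other services learn of the change afterwards, through an event,
        # instead of having their data changed inside this transaction.
        self.publish({"type": "OrderPlaced", "order_id": order_id,
                      "customer_id": customer_id, "total": total})
```

If the insert fails, the transaction rolls back and no event is published, so other services never see a change that did not happen.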
Figure 3. Asynchronous communication
Let us define autonomous services. It is clear that autonomous services span much more than low-level communications, encompassing many aspects including trust and reliability. However, we have seen that constraining interactions between services to asynchronous messaging has guided our architecture in such a way that autonomous, loosely coupled services have crystallized. In fact, autonomous services appear to expand on service orientation (or constrain its degrees of freedom) by adding a new tenet: a service interacts with other services using asynchronous communication patterns.
Note that while a service may consume other services asynchronously, this does not mean that it cannot expose a synchronous interface. Google and Amazon do exactly that. The Web services that they expose are synchronous in nature, but this has no effect on their autonomy.
At first glance it appears that the use of publish/subscribe communications leads to data duplication between services similar to what happens when doing data replication. Data-replication techniques such as extract, transform, and load (ETL) or file transfer handle transferring data between low-level data sources like databases and directories. This transfer bypasses higher-level logical constructs for managing that data coherently, which often leads to duplicating those same logical constructs at the consumer end of the transfer.
Figure 4. Interactions between process, activity, and entity services
The data that flows through a publish/subscribe interaction behaves differently. When a service publishes a message, that message must be part of the service contract; that contract is independent of the underlying data store's schema. The process of building the message—retrieving the appropriate data from the data store and transforming it to the message schema—goes through all the service layers. Services that consume these messages do not need to implement the same logic. Furthermore, the decision of when to publish the message is a logical decision for which code has been written; it is not a low-level detail of when the ETL script was scheduled to run.
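The difference from data replication can be sketched in Python: the published message is a contract type, built by the service's own code from its internal schema, and the decision to publish is ordinary logic. The table, its column names, and the `CustomerRelocated` message below are all hypothetical.

```python
import sqlite3
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class CustomerRelocated:
    """Hypothetical message contract, independent of the table layout below."""
    customer_id: str
    country: str


class CustomerService:
    def __init__(self, publish: Callable[[dict], None]) -> None:
        self.publish = publish
        self.db = sqlite3.connect(":memory:")
        # Internal schema with internal column names; consumers never see it.
        self.db.execute(
            "CREATE TABLE cust (cust_no TEXT PRIMARY KEY, ctry_cd TEXT, credit_notes TEXT)"
        )

    def add_customer(self, customer_id: str, country: str) -> None:
        with self.db:
            self.db.execute("INSERT INTO cust VALUES (?, ?, '')", (customer_id, country))

    def relocate(self, customer_id: str, country: str) -> None:
        with self.db:
            cur = self.db.execute(
                "SELECT ctry_cd FROM cust WHERE cust_no = ?", (customer_id,)
            )
            (old_country,) = cur.fetchone()
            self.db.execute(
                "UPDATE cust SET ctry_cd = ? WHERE cust_no = ?", (country, customer_id)
            )
        # Publishing is an explicit decision in code: only a real change is announced.
        if old_country != country:
            # Transform from the storage schema to the contract schema.
            self.publish(asdict(CustomerRelocated(customer_id=customer_id, country=country)))
```

Renaming `ctry_cd` or dropping `credit_notes` would not affect any consumer, because only `CustomerRelocated` crosses the boundary.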
The most important difference between simple data replication and autonomous service interaction is that the consumer service decides to save only the data that it needs. There is no longer a "back door" into the service database. When the consumer service receives the message that was published, the data within the message does not bypass any of the layers in the service until it reaches the database (see Resources).
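On the consuming side, here is a sketch of a hypothetical sales service that receives a hypothetical `CustomerUpdated` message, passes it through its own validation, and keeps only the field it needs. The dictionary standing in for the sales database and the two-letter country check are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SalesCustomer:
    """The only customer data this hypothetical sales service keeps."""
    country: str


class SalesService:
    def __init__(self) -> None:
        self._customers: Dict[str, SalesCustomer] = {}  # stands in for the sales database

    def on_customer_updated(self, message: dict) -> None:
        # The message enters through the same layers as any other input:
        # handler, then validation, then persistence. Nothing writes to the store directly.
        customer = SalesCustomer(country=message["country"])
        self._validate(customer)
        self._save(message["customer_id"], customer)

    def _validate(self, customer: SalesCustomer) -> None:
        if len(customer.country) != 2:
            raise ValueError("country must be a two-letter code")

    def _save(self, customer_id: str, customer: SalesCustomer) -> None:
        # Fields such as name or email in the message are simply not stored.
        self._customers[customer_id] = customer

    def country_of(self, customer_id: str) -> str:
        return self._customers[customer_id].country
```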
When using these kinds of asynchronous service interactions, we often find that our services tend to be larger and coarser grained, often containing databases of their own and hosted on their own servers or datacenters. By keeping transaction contexts constrained to the scope of a single service, the responsibilities of that service tend to expand to the level of a business function or department, which is the natural boundary found in the business domain. This effect is quite understandable when we consider how tightly coupled the different levels of the business are. Departments are loosely coupled to each other, collaborating without intimate knowledge of each other's inner workings. Groups within a department, by contrast, often require a much deeper understanding of the workings of parallel groups to get the job done.
We can see that this architectural style in no way contradicts any of the four tenets, yet familiar service types found when using service orientation no longer fit.
Common Boundary Pitfalls
Process services, activity services, and entity services were once proposed as the way to do service orientation. Process services manage long-running business processes. Activity services manage atomic operations that encapsulate interaction with more than one entity service. Entity services manage interaction with a single business entity. In this model entity aggregation occurs at the entity service level. The problem with this choice of service boundaries manifests itself in the synchronous interservice flows (see Figure 4).
Although it is quite possible to model the same interaction with asynchronous messaging (specifically with respect to an external credit check service), the value in separating order processing into three services is unclear. It is unlikely that a different team would be working on each of the order services or that they would run on different platforms. The tight coupling between these services is inherent: all of them work with order processing. Sharing a common Order class among them would go against the third tenet of service orientation, yet not sharing it would probably cause code duplication. There is only one conclusion: we cannot separate our services along the process/activity/entity boundaries. Consider the result of modeling this business process using autonomous services (see Figure 5).
Figure 5. Interactions between autonomous services
The autonomous services present in Figure 5 represent one possible division correlating to a given organization; different enterprises would have different groups responsible for different business processes. There are two important differences to note here. The first is that the sales service stores the customer and product data it needs internally, so that when it needs to process an order, it already has all the data it needs. The second difference is that the division of order handling between separate services does not occur here. Although it is likely that the sales service has different layers and components for handling the various steps of order processing, and possibly some kind of workflow engine to manage those steps, these are implementation details of the service.
In the previous case, the three order services all deal with the same entity (Order), probably with the same fields. Before the entity service saves the order, it too must perform validation: code that has already been written for the activity service. That code duplication, which arises from following the schema/class tenet, does not occur here, simply because the tenet does not apply within a service boundary. If we can no longer use entity services, then how and where does entity aggregation occur when using autonomous services?
Business-Level Entity Aggregation Requirements
Before we can delve into the technical details of the solution, we must better define the problem. As stated at the outset, entity aggregation represents the business need to get a global view of data across the enterprise. Various business departments require different data elements of a given entity, but seldom require all the data for that entity. A typical case is one in which one business department requires data that is owned by another department. For instance, the marketing department needs to know total order value per customer per quarter, data that is managed by the sales department. In this case, we are aggregating data from two existing systems into one of those systems. This is a different style of entity aggregation from what the term usually brings to mind, yet it is the most common case seen in the field.
Figure 6. Aggregating data from different systems without autonomous services
Intersystem OLTP aggregation. We will start by examining a concrete example. In a typical order-processing scenario we have two systems: one accepts orders from our Web site; the other is a homegrown customer relationship management (CRM) system. Marketing has a new requirement that "preferred customers" should receive a 10 percent discount on all orders. A preferred customer is defined as a customer living in the United States who has done at least $50,000 of business with the company over the last quarter, but this definition is expected to change.
After analyzing this requirement, we can see that it has two parts: who is a preferred customer, and what do we do with that information? The first decision concerns boundaries and responsibilities: which system will be responsible for determining preferred customers, and which system will be responsible for what we do with them? In this case there is no reason for the CRM system not to take on the first responsibility, or for the order-processing system not to take on the second. The next decision is how these systems will interact: synchronously or asynchronously.
Synchronous OLTP aggregation. The first way to handle this requirement is to add to the order-processing system a query method that retrieves the customers whose orders totaled at least X over a period Y. This method gives us the ability to handle various sums and time periods as the marketing department changes its rules. The CRM system would answer requests about which customers are preferred by querying the order-processing system (see Figure 6).
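The synchronous approach can be sketched in Python. The method name `customers_with_total_at_least`, the `Order` record, and the in-memory order list are hypothetical; the point is that the CRM system must call into the order-processing system each time it evaluates the rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Order:
    customer_id: str
    placed_on: date
    total: float


class OrderProcessingSystem:
    def __init__(self, orders: List[Order]) -> None:
        self.orders = orders

    def customers_with_total_at_least(self, x: float, start: date, end: date) -> List[str]:
        """Hypothetical query: customers whose orders in [start, end] total at least x."""
        totals: Dict[str, float] = {}
        for o in self.orders:
            if start <= o.placed_on <= end:
                totals[o.customer_id] = totals.get(o.customer_id, 0.0) + o.total
        return sorted(c for c, t in totals.items() if t >= x)


class CrmSystem:
    def __init__(self, orders_service: OrderProcessingSystem, countries: Dict[str, str]) -> None:
        self.orders_service = orders_service
        self.countries = countries  # CRM's own customer data

    def preferred_customers(self, quarter_start: date, quarter_end: date) -> List[str]:
        # Every evaluation reaches into the order-processing system.
        candidates = self.orders_service.customers_with_total_at_least(
            50_000, quarter_start, quarter_end)
        return [c for c in candidates if self.countries.get(c) == "US"]
```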
Autonomous OLTP aggregation. In the second approach, we view each of the systems as an autonomous service that notifies external consumers of meaningful events. In this case, interactions between services are more event-driven. In Figure 7, when the order-processing system needs to know whether a given customer is a preferred customer, it does not have to communicate with the CRM system. Likewise, the CRM system does not have to query the order-processing system for order data. The data from these two systems has been aggregated by design.
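The event-driven alternative can be sketched with a hypothetical in-process `Bus` and two hypothetical message topics, `OrderPlaced` and `PreferredStatusChanged`. The CRM system builds its own view of orders from `OrderPlaced` events and announces status changes; the order-processing system keeps a local set of preferred customers and applies the 10 percent discount without calling the CRM system. Delivery here is synchronous for brevity, and "the last quarter" is simplified to a fixed `quarter_start` date.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, List, Set

Handler = Callable[[dict], None]


class Bus:
    """In-process stand-in for pub/sub; real infrastructure would deliver asynchronously."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self.handlers[topic]:
            handler(message)


class OrderProcessing:
    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.preferred: Set[str] = set()  # local copy, maintained from CRM events
        bus.subscribe("PreferredStatusChanged", self.on_status_changed)

    def on_status_changed(self, msg: dict) -> None:
        if msg["preferred"]:
            self.preferred.add(msg["customer_id"])
        else:
            self.preferred.discard(msg["customer_id"])

    def place_order(self, customer_id: str, amount: float, placed_on: date) -> float:
        # No call to the CRM system: preferred status is already known locally.
        charged = amount * 0.9 if customer_id in self.preferred else amount
        self.bus.publish("OrderPlaced",
                         {"customer_id": customer_id, "total": charged, "placed_on": placed_on})
        return charged


class Crm:
    def __init__(self, bus: Bus, countries: Dict[str, str], quarter_start: date) -> None:
        self.bus = bus
        self.countries = countries
        self.quarter_start = quarter_start
        self.orders: Dict[str, List[dict]] = defaultdict(list)  # CRM's own order data
        self.preferred: Set[str] = set()
        bus.subscribe("OrderPlaced", self.on_order_placed)

    def is_preferred(self, customer_id: str) -> bool:
        # Original rule: US customer with at least $50,000 of business this quarter.
        total = sum(o["total"] for o in self.orders[customer_id]
                    if o["placed_on"] >= self.quarter_start)
        return self.countries.get(customer_id) == "US" and total >= 50_000

    def on_order_placed(self, msg: dict) -> None:
        cid = msg["customer_id"]
        self.orders[cid].append(msg)
        now_preferred = self.is_preferred(cid)
        if now_preferred != (cid in self.preferred):
            if now_preferred:
                self.preferred.add(cid)
            else:
                self.preferred.discard(cid)
            self.bus.publish("PreferredStatusChanged",
                             {"customer_id": cid, "preferred": now_preferred})
```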
Figure 7. Aggregating data from different systems with autonomous services
Let us explore this example a bit further as the marketing department changes the rules that define preferred customers. Preferred customers are now those customers living in all of North America who have placed at least three orders in the last quarter, each totaling $15,000 or more.
In our first approach, the original query we added is no longer relevant. We now need to support a different kind of query (get customers with at least X orders over period Y, each order totaling Z or more, where X = 3, Y = last quarter, and Z = $15,000). This changes the order-processing system's interface, part of its implementation (to support the new interface), and the code in the CRM system that invokes it (see Figure 8). In the second approach, the only changes we need to make are internal to the CRM system. Neither the interface nor the implementation of the order-processing system needs to be modified (see Figure 9).
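To see why the rule change stays inside the CRM system, the preferred-customer rule can be written as a replaceable function over the CRM system's own order data. In this Python sketch, `rule_v1`, `rule_v2`, and `CrmRules` are hypothetical names; the country set is a deliberately abbreviated stand-in for North America, and the quarter is simplified to orders on or after `quarter_start`.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List


@dataclass(frozen=True)
class OrderRecord:
    """CRM's own copy of an order, built from order events."""
    placed_on: date
    total: float


@dataclass
class Customer:
    country: str
    orders: List[OrderRecord] = field(default_factory=list)


PreferredRule = Callable[[Customer, date], bool]

# Hypothetical, abbreviated stand-in for "North America".
NORTH_AMERICA = {"US", "CA", "MX"}


def rule_v1(customer: Customer, quarter_start: date) -> bool:
    """Original rule: US customer, at least $50,000 of business this quarter."""
    total = sum(o.total for o in customer.orders if o.placed_on >= quarter_start)
    return customer.country == "US" and total >= 50_000


def rule_v2(customer: Customer, quarter_start: date) -> bool:
    """Revised rule: North American customer, at least 3 orders of $15,000 or more."""
    large = [o for o in customer.orders
             if o.placed_on >= quarter_start and o.total >= 15_000]
    return customer.country in NORTH_AMERICA and len(large) >= 3


class CrmRules:
    """Only this object changes when marketing changes the definition."""

    def __init__(self, rule: PreferredRule) -> None:
        self.rule = rule

    def is_preferred(self, customer: Customer, quarter_start: date) -> bool:
        return self.rule(customer, quarter_start)
```

Swapping `rule_v1` for `rule_v2` changes nothing the order-processing system sees; it continues to receive only preferred-status events.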
Figure 8. Changes made when not using autonomous services
The term entity aggregation brings to mind an active process of collecting data from disparate sources and merging the data together to form a cohesive whole. While the dynamics of the first approach map very well to this process, the second approach does not. However, it is clear that the second approach maintains a lower level of coupling between these two services, even as requirements change. Keep this point in mind when designing services; loose coupling between services needs to occur from both the data and the logic perspective.
Entity Aggregation for Business Intelligence
In the previous case, the aggregated data was critical to the functioning of the business department(s) involved, participating in day-to-day transaction processing. However, entity aggregation is most often discussed in the management context; business wants to get a 360-degree view of the enterprise data. This context is quite different from the previous one in that the aggregated data is mainly used for decision support and business intelligence—in other words, primarily read-only usage scenarios.
Figure 9. Changes made when using autonomous services
A simple and effective way of modeling this context is by instituting a management service (see Figure 10). This service does not perform operations and provisioning management for other services, but rather pulls together the data published by all other services and stores it in a format optimized for its usage scenarios.
Another common business requirement that comes up in the context of entity aggregation is historical analysis. While it may not make sense for the marketing service to maintain historical data about past products that are no longer offered, users of the management service may find it invaluable to compare sales and profit figures of current and past products. It would be the responsibility of the management service to manage these historical trends.
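A management service of this kind can be sketched as a set of event handlers feeding a read-optimized store. In the Python sketch below, the handler names, message fields, and the revenue-per-product-per-quarter layout are hypothetical; discontinued products are marked rather than deleted so that historical comparisons remain possible.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Tuple


def quarter_of(d: date) -> str:
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


class ManagementService:
    """Hypothetical read-only aggregator fed by events from all other services."""

    def __init__(self) -> None:
        self.product_names: Dict[str, str] = {}
        self.discontinued: Dict[str, date] = {}
        # Read model shaped for reporting: revenue per (product, quarter).
        self.revenue: Dict[Tuple[str, str], float] = defaultdict(float)

    def on_product_added(self, msg: dict) -> None:  # e.g. from the marketing service
        self.product_names[msg["product_id"]] = msg["name"]

    def on_product_discontinued(self, msg: dict) -> None:
        # Mark, don't delete: history outlives the publisher's interest in the product.
        self.discontinued[msg["product_id"]] = msg["on"]

    def on_order_placed(self, msg: dict) -> None:  # e.g. from the sales service
        self.revenue[(msg["product_id"], quarter_of(msg["placed_on"]))] += msg["total"]

    def revenue_by_quarter(self, product_id: str) -> Dict[str, float]:
        return {q: v for (p, q), v in sorted(self.revenue.items()) if p == product_id}
```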
One benefit of using a management service as opposed to entity services is exactly in the operations management area. Managing a single service that handles enterprise-wide entity aggregation needs is much more cost-effective than managing multiple entity services—one for each entity that needs to be aggregated. Internal changes to any of these services that do not affect their contract may not even impact the management service. Changes to the contract that may affect numerous entities result in corresponding changes only in the management service, not in each one of the entity services that previously aggregated those entities.
Figure 10. Changes made when using autonomous services
Services may seem to be the new unit of reuse in the enterprise, but reusing a service in this way conflicts with the tenet of autonomy, not to mention that the constraints on how we interact with a service make such reuse unattractive. For instance, while it may seem that the management service could easily provide audit trail tracking and storage for all other services, this choice could break those services' autonomy. Should the management service go down for any reason, the other services could not continue processing, because they must not process without auditing. Furthermore, an autonomous service could not trust another service to provide this core capability. This is not to say that you cannot encapsulate audit trail tracking (or other specific functionality) in a component that all services reuse. Regulatory compliance issues must be taken care of within each service.
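The alternative to a central auditing service, a reusable component that each service embeds, can be sketched with `sqlite3`. The hypothetical `AuditTrail` class writes to the hosting service's own database inside that service's transaction, so auditing keeps working even if every other service is down. `BillingService` is likewise a hypothetical host.

```python
import sqlite3
from datetime import datetime, timezone


class AuditTrail:
    """Hypothetical reusable component: each service embeds it and writes to its own DB."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS audit (at TEXT, actor TEXT, action TEXT)")

    def record(self, actor: str, action: str) -> None:
        # Runs inside the caller's transaction; no remote service is involved.
        self.conn.execute(
            "INSERT INTO audit VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), actor, action),
        )


class BillingService:
    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE invoices (id TEXT PRIMARY KEY, amount REAL)")
        self.audit = AuditTrail(self.db)

    def issue_invoice(self, actor: str, invoice_id: str, amount: float) -> None:
        with self.db:  # the invoice and its audit entry commit or roll back together
            self.db.execute("INSERT INTO invoices VALUES (?, ?)", (invoice_id, amount))
            self.audit.record(actor, f"issued invoice {invoice_id}")
```

The component is reused as code, not as a running service, so its availability is tied to the host service alone.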
By recognizing that the requirements for OLTP and OLAP entity aggregation are different, we have been able to identify two separate yet simple solutions using a single communication paradigm. Asynchronous messaging patterns enable the creation of autonomous, loosely coupled services that more closely resemble the business processes they model. As a result, often all that is needed to respond to changing business requirements is a local change to a single service. These small-scale changes do not affect interservice contracts and can be performed with greater certainty that other systems will not be affected, and therefore in less time. Aligning IT with business has much more to do with interpersonal communications and understanding than technology, but that does not mean that technology cannot help.
About the Author
Udi Dahan is a Microsoft solutions architect MVP, a recognized .NET development expert, and the chief IT architect and C4ISR product-line manager at KorenTec. Udi is known as a primary authority on SOAs in Israel, and consults on the architecture and design of large-scale, mission-critical systems developed all over the country. His experience spans technologies related to command and control systems, real-time applications, and high-availability Internet services. For more information, please visit http://www.UdiDahan.com.
Resources
Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Gregor Hohpe and Bobby Woolf (Addison-Wesley Professional, 2003)
"Data on the Outside vs. Data on the Inside," Pat Helland (Microsoft Corporation)
"Dealing with Concurrency: Designing Interaction Between Services and Their Agents," Maarten Mullender (Microsoft Corporation, 2004)
"SOA Challenges: Entity Aggregation," Ramkumar Kothandaraman (Microsoft Corporation, 2004)
"MSDN Web cast: Why You Can't Do SOA Without Messaging (Level 300)," Udi Dahan (Microsoft Corporation, 2006)
MSDN: Channel 9 Forums
"ARCast: Autonomous Services," Udi Dahan (Microsoft Corporation, 2006)
"ARCast: Service Orientation and Workflow," Udi Dahan (Microsoft Corporation, 2006)
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal website.