Volume 31 Number 7
Leverage CQRS to Create Highly Responsive Systems
By Peter Vogel | July 2016
The Command Query Responsibility Segregation (CQRS) pattern has grown in popularity over the last three to four years. Certainly, it’s an essential tool in collaborative scenarios, where there’s a set of data updated by multiple processes (Dino Esposito makes a case for using CQRS even more broadly in his June 2015 Cutting Edge column, “CQRS for the Common Application,” at bit.ly/1OtQba3). I’d go further and claim that, in fact, CQRS is the default design pattern for ASP.NET MVC developers who query data to display in their views and then issue commands to update tables when that data is posted back to their MVC controllers.
However, CQRS is a tactic that should be applied as part of a larger strategy. The first step in that strategy is Domain-Driven Design (DDD), which is described in Julie Lerman’s June 2013 column, “Shrink EF Models with DDD Bounded Contexts,” at bit.ly/1TfF7dk. DDD leads to breaking your application into cooperating domains, each of which may even have its own database in addition, of course, to its own dedicated business model. DDD provides strategies and tactics that let domains be developed independently of each other while still working together.
Defining Inventory Domains
But DDD only potentially eliminates the need for collaboration. Consider, for example, an online sales application. Here, there is a critical shared piece of data: inventory levels. To ensure that the business doesn’t try to sell something it doesn’t have, the business can either keep accurate inventory counts of Quantity on Hand (QoH) for each Stock Keeping Unit (SKU) … or always have extra, “just-in-case” inventory on hand. In today’s lean world that second option isn’t considered: Companies don’t want to keep more inventory than they have to.
Updating QoH based on business transactions is more complicated than you might think because a real-world inventory system handles a wide variety of transactions. The obvious transactions are, of course, decrementing QoH when a SKU is sold and incrementing QoH when new SKUs are received. In addition, a company will periodically do an “inventory count” to determine the actual QoH for each SKU. Because even in a well-managed system inventory accuracy isn’t 100 percent, that count will require a change to the inventory levels. In addition, sometimes SKUs are discovered to be defective in some way and removed from inventory. Sometimes, after a SKU is sold and removed from inventory, the customer cancels the order and the SKU is returned to the shelf.
Companies want to track all of these different transactions, keeping the information specific to each transaction. When new SKUs are received, for example, companies want to know what invoice was used to buy the SKUs; when SKUs are discovered to be defective, companies want to know why; and, during inventory taking, companies want to know how big the discrepancy was. Accounting needs this information to accurately report the “state of the company”; the operations department needs this information to accurately plan for the future. Because of the need for this additional information, these transactions can’t be treated as just additions and removals from inventory.
However, not all of these transactions belong in the same domain. These transactions are divided up among domains called “sales,” “accounting,” “operations,” “receiving” and so on. Dividing transactions among domains reflects the reality that different domains have different demands.
Most of the domains, for example, don’t need up-to-the-minute data—if they were running even a business day behind the actual transactions, it wouldn’t be a problem. The accounting department, for example, only needs to know the financial state of the inventory at month’s end and, even then, might not expect to have that information until the first few days of the following month. While it might be possible to keep inventory data more current, it’d be hard to find a business justification for doing that. Those departments’ inventory information can be “eventually consistent.”
The sales system can’t have “eventually consistent” inventory information, though. The sales department needs to know what the QoH is right now so that it can decide whether a SKU can be shown to the customer (“Only two left! Order now!”). In fact, while most domains would have a single number for the QoH for a SKU, the sales system might keep the QoH as two numbers. One number is the “reserved” quantity (SKUs requested by a user who’s in the process of creating an order) and the second number is the “still available for sale” quantity. If a customer buys two items, the reserved number is increased by two and the available-for-sale number is reduced by two. At the end of the sale, the reserved quantity is reduced by two; if the user cancels the order instead, the two items are moved from the reserved quantity back to the available-for-sale number.
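The two-number bookkeeping can be sketched in a few lines. This is a minimal illustration in Python rather than a .NET implementation, and the names SalesStock, reserve, complete_sale and cancel are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class SalesStock:
    """Sales-domain view of one SKU: two numbers instead of a single QoH."""
    available: int      # still available for sale
    reserved: int = 0   # held by orders that are still being created

    def reserve(self, qty: int) -> bool:
        """A customer adds qty to an order: move units from available to reserved."""
        if qty <= 0 or qty > self.available:
            return False
        self.available -= qty
        self.reserved += qty
        return True

    def complete_sale(self, qty: int) -> None:
        """The order is finalized: the reserved units leave inventory."""
        if qty <= 0 or qty > self.reserved:
            raise ValueError("cannot complete more than is reserved")
        self.reserved -= qty

    def cancel(self, qty: int) -> None:
        """The order is cancelled: the reserved units go back on sale."""
        if qty <= 0 or qty > self.reserved:
            raise ValueError("cannot cancel more than is reserved")
        self.reserved -= qty
        self.available += qty
```

The sum of the two numbers is the QoH that the other domains would see as a single value; only the sales system cares about the split.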
Both accounting and operations will need the flexibility of a relational database to join together tables in a variety of ways. They also will need the ability to search that data, sometimes in ways that hadn’t been considered prior to the discovery of a particular problem. Given the amount of data involved and the need to research the history of transactions, paging will also be required.
The sales system doesn’t require as much flexibility. The relationships between entities are fixed with the design of the UI, as are the search requirements (though paging support is still required).
Response time demands also vary among domains. For most departments, a response time measured in seconds wouldn’t harm the company; for the sales system, response time must be measured in fractions of a second.
Building a single system to meet all of these needs would be difficult (I’d say impossible). Building an application for each domain is, at least, possible. For example, the product management department would have a product list that’s constantly being updated both with new products and information about existing products; the sales domain might, on the other hand, keep a read-only/query-only product list that’s regularly synchronized with the data in the product management domain.
Think of domains as the single responsibility principle applied at the enterprise level. Each domain handles one part of the business well. While the enterprise is complicated, each domain can be—relatively speaking—simple.
The CQRS Solution
All of these domains are still sharing the inventory levels, however. As transactions pass through domains such as accounting and receiving, they must notify the sales system of the changes to inventory levels. Even within the sales system, multiple customers might be attempting to purchase the same SKUs, individually driving stock levels up and down, and requiring some level of locking as those numbers are adjusted.
The CQRS pattern becomes useful here by going beyond what the typical ASP.NET MVC developer would consider. Within most domains, for example, the applications can query their own databases, which contain the information that the domain needs. Once it comes time to issue a command to adjust inventory levels, all domains must update the online sales domain’s data. And the obligation goes both ways: As items are sold, the sales system must notify accounting, operations and other domains about changes in QoH due to sales for each SKU.
Rather than update another domain’s data, however, each domain is only obliged to notify other domains about something in which those other domains are interested (in this case, QoH). Each domain must be responsible for updating its own data because each domain knows how to manage its data and no other domain does.
The operations domain, for example, is constantly exploring the relationships among its data to predict inventory demands and to determine what’s driving stock-level fluctuations. That domain must support the flexibility in querying data that a traditional relational database provides. The complexity in the operations domain is driven by the kind of analysis required in that domain.
The sales domain, on the other hand, needs something simpler. It needs to know what the QoH (reserved and available for sale) is for any SKU. It might even make sense for the sales system to just keep the ID for every SKU and its two QoH numbers constantly in memory. If that isn’t possible because of the number of inventory items, it might still make sense to keep in memory the 20 percent of the inventory that drives 80 percent of the company’s sales activity. The other inventory items could be held in some NoSQL database that’s designed to support sales transactions without needing to provide the flexibility that, for example, the operations domain requires. The complexity in the sales domain is driven by the need for low response times.
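As a rough sketch of that lookup strategy (in Python, with hypothetical names), the sales system could check an in-memory table of its best-selling SKUs first and only fall back to the slower store for everything else. The backing_store dictionary here is merely a stand-in for whatever NoSQL database is chosen, and the sketch covers reads only.

```python
from typing import Dict, Optional, Tuple

# (available for sale, reserved) for one SKU
Levels = Tuple[int, int]


class QohLookup:
    """Answers QoH queries from memory when possible, else from a backing store."""

    def __init__(self, hot: Dict[str, Levels], backing_store: Dict[str, Levels]):
        self._hot = dict(hot)             # the SKUs that drive most sales activity
        self._backing = backing_store     # stand-in for a NoSQL database
        self.backing_reads = 0            # counts how often the slow path is used

    def get(self, sku_id: str) -> Optional[Levels]:
        levels = self._hot.get(sku_id)
        if levels is not None:
            return levels
        self.backing_reads += 1
        return self._backing.get(sku_id)
```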
These differences mean that the operations domain can’t be expected to know how to update QoH numbers in the sales domain (and vice versa, of course).
Domains, therefore, might well be querying one database (their own) while sending commands to another database (everyone sends QoH updates to the sales domain, for example). While DDD provides a strategy for segmenting domains with different business requirements, CQRS provides one of the tactics for managing updates among those domains (for a more in-depth discussion of the query side of CQRS, see Esposito’s March 2016 column, “The Query Stack of a CQRS Architecture,” at bit.ly/1WzjvPi).
Handling Commands and Events
Of course, you don’t want to make the applications in these domains more complicated by having to deal with the diversity of domains that must be notified for each transaction. Rather than keep track of all of the domains that must be updated, each application will send transactions to a utility (typically called a “command bus”) that’s responsible for notifying the various domains. As new domains are defined (or existing domains change their demands), only the command bus within the domain that originates the transaction needs to be updated to reflect the new notifications that are required.
These transactions can be divided into categories: commands and events. The distinction between the two is more conceptual than technical. Effectively, both commands and events are messages that wrap up the key information about a transaction. For our inventory transactions that would be the Id for the SKU, the net change to the inventory level, and the additional data required by the transaction (when goods are received that additional data might be the vendor number and the invoice number; during stock taking the additional data might be the Id of the employee who actually counted the SKUs). These messages could be encoded as POCO objects or as XML/JSON documents (or both, depending on how the data is sent between domains).
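Here is one way such a message might look, sketched in Python as a plain data class (the nearest equivalent to a POCO) that can also be turned into a JSON document. The class name InventoryMessage and its field names are assumptions made for this example, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class InventoryMessage:
    """Wraps the key information about one inventory transaction."""
    name: str                 # e.g. "GoodsReceived" or "IncreaseInventory"
    sku_id: str               # Id of the SKU affected
    net_change: int           # net change to the inventory level
    details: Dict[str, Any] = field(default_factory=dict)  # transaction-specific data

    def to_json(self) -> str:
        """Encode the message as a JSON document for sending between domains."""
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "InventoryMessage":
        """Rebuild the message object on the receiving side."""
        return InventoryMessage(**json.loads(text))
```

A goods-received message would carry the vendor and invoice numbers in details; a stock-taking message would carry the Id of the employee who did the count.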
For me, the definition of a command is that it’s something directed to a single receiver in order to carry out a task. A command is usually a task that needs to be performed immediately and, obviously, is sent before the task is performed. A command can also be expected to return a success/failure response that the application can use to inform the user whether everything has worked (and, potentially, cause the application to perform a query to retrieve the data that shows what results were achieved). Most updates within the domain that originated the transaction are probably handled with commands.
Events, on the other hand, occur after the task is performed, can be processed by multiple receivers and, usually, aren’t required to be processed immediately. Events aren’t expected to return a result, at least not immediately. If something goes wrong with an event, the application will typically find out about it through some deferred return message (“We’re sorry, it turns out we can’t process your order because your credit card was declined”). Most, but not all, updates outside of the domain that originated the transaction are probably handled with events.
And, like most conceptual distinctions, this is probably a continuum; some messages are “obviously” commands, some messages are “obviously” events and there are some about which reasonable people could disagree.
A single transaction in one domain might generate some combination of commands and events. Consider new SKUs showing up at the receiving dock. Once the SKUs are properly received, the bus for that domain would send a command to the sales system to have the QoH for that SKU increased immediately; the bus would also post an event so that the accounting and operations systems can be notified that “something has happened” and should be taken into account at month’s end. Looking at the messages involved, it might be difficult to determine which one is the event and which one is the command—except, perhaps, by looking at the name of the message; events tend to have names in the past tense (GoodsReceived) while commands tend to have imperative names (IncreaseInventory).
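The receiving-dock scenario can be sketched with a very small in-process bus. This is a Python illustration under simplifying assumptions: the Bus class and the receive_goods function are hypothetical, and events are delivered synchronously here, whereas a real bus would more likely hand them to a queue.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]


class Bus:
    """Routes commands to exactly one handler and events to any number of subscribers."""

    def __init__(self) -> None:
        self._command_handlers: Dict[str, Callable[[Message], bool]] = {}
        self._event_subscribers: Dict[str, List[Callable[[Message], None]]] = {}

    def register_command(self, name: str, handler: Callable[[Message], bool]) -> None:
        if name in self._command_handlers:
            raise ValueError(f"{name} already has a receiver")
        self._command_handlers[name] = handler

    def subscribe(self, name: str, subscriber: Callable[[Message], None]) -> None:
        self._event_subscribers.setdefault(name, []).append(subscriber)

    def send(self, name: str, message: Message) -> bool:
        """Commands go to a single receiver and report success or failure."""
        handler = self._command_handlers.get(name)
        if handler is None:
            raise LookupError(f"no receiver registered for {name}")
        return handler(message)

    def publish(self, name: str, message: Message) -> None:
        """Events go to every interested domain and return nothing."""
        for subscriber in self._event_subscribers.get(name, []):
            subscriber(message)


def receive_goods(bus: Bus, sku_id: str, quantity: int, invoice_number: str) -> bool:
    """One receiving transaction produces a command and an event."""
    ok = bus.send("IncreaseInventory", {"sku_id": sku_id, "net_change": quantity})
    bus.publish("GoodsReceived", {"sku_id": sku_id, "net_change": quantity,
                                  "invoice_number": invoice_number})
    return ok
```

Adding a new interested domain means one more subscribe call when the bus is configured; receive_goods itself doesn’t change.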
The bus might send the command to the sales system by calling a RESTful service in that domain for immediate execution; the event might be written to some message queue to be processed by other domains at their convenience (I’ve discussed some of the options in an article I wrote for VisualStudioMagazine.com, “Simplifying Applications by Implementing Eventual Consistency with Domain Events,” at bit.ly/1qn1wwV).
Of course, even with the command sent to the Web service, who knows what happens behind that Web service? In order to handle large numbers of simultaneous requests, the Web service for the domain might just write the command message to a queue and return a “Thanks, got it” response, keeping response time short and improving scalability. In addition to improving scalability, writing commands to a queue lets the domain recover from what might otherwise be catastrophic problems. If the database or network is down, for example, the sales system can wait patiently for service to be restored and then process any commands sitting on its queue. So, even commands can end up on queues.
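A minimal sketch of that behavior, in Python with hypothetical names, might look like the following. The accept method plays the part of the Web service, the database_up flag simulates an outage, and an in-memory deque stands in for a durable message queue.

```python
from collections import deque
from typing import Deque, Dict


class SalesCommandEndpoint:
    """Accepts commands immediately and applies them whenever the database is available."""

    def __init__(self) -> None:
        self._pending: Deque[dict] = deque()   # stand-in for a durable queue
        self.qoh: Dict[str, int] = {}          # stand-in for the sales database
        self.database_up = True

    def accept(self, command: dict) -> str:
        """What the Web service does: queue the command and acknowledge it."""
        self._pending.append(command)
        return "accepted"

    def process_pending(self) -> int:
        """What a background worker does: drain the queue while the database is up."""
        processed = 0
        while self._pending and self.database_up:
            command = self._pending.popleft()
            sku_id = command["sku_id"]
            self.qoh[sku_id] = self.qoh.get(sku_id, 0) + command["net_change"]
            processed += 1
        return processed
```

If the database is down, accept still succeeds and the commands simply wait; once service is restored, the next call to process_pending catches up.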
As I said, the distinction between commands and events is conceptual, not technical.
Processing Commands and Events
Thanks to CQRS, applications can now be working with some combination of two kinds of databases: one for queries (probably local to the domain) and others that are targets for commands and events. For example, the sales system will be working with a data store that includes the NoSQL database that holds QoH data; the operations and accounting applications might be working with a data store organized around the history of the events/commands.
The difference between the two systems is that the sales system needs a snapshot of the current state of the inventory levels to meet response time demands; the operations and accounting domains need a history of what happened to each SKU to support analysis. The operations and accounting domains can work by using a different tactic: event sourcing. With event sourcing, a domain’s logic rolls through the audit log of the events the domain has been notified about to provide a final answer (for the accounting system that might be, “Based on the history of posted transactions, the current value of your inventory is X dollars”).
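To make the idea concrete, here is a small Python sketch of how an accounting domain might derive an inventory value purely from its event log. The InventoryEvent class, the function names and the per-SKU unit costs (held in cents) are all assumptions for the sake of illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class InventoryEvent:
    name: str        # e.g. "GoodsReceived", "SkuSold", "DefectiveRemoved"
    sku_id: str
    net_change: int  # positive for additions, negative for removals


def quantities_from(events: Iterable[InventoryEvent]) -> Dict[str, int]:
    """Roll through the log and total the net change for each SKU."""
    totals: Dict[str, int] = {}
    for event in events:
        totals[event.sku_id] = totals.get(event.sku_id, 0) + event.net_change
    return totals


def inventory_value_cents(events: Iterable[InventoryEvent],
                          unit_cost_cents: Dict[str, int]) -> int:
    """The 'final answer': current inventory value implied by the event history."""
    quantities = quantities_from(events)
    return sum(qty * unit_cost_cents[sku_id] for sku_id, qty in quantities.items())
```

Nothing here stores a current quantity; the answer is recomputed from the history each time it’s asked for.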
There are advantages and disadvantages to event sourcing. With event sourcing it’s always possible to recreate a snapshot of the current state of the data by reprocessing the list of transactions; accounting appreciates that feature as it adds adjustments to the list of events. With event sourcing, it’s also possible to describe the future by processing potential events (expected deliveries and sales); operations appreciates that feature when planning.
As the list of events increases, so does response time, however. Accounting can roll forward from its “last known good state” (probably the numbers from the last month’s-end closing) and generate a snapshot that represents the month’s-end numbers. That snapshot is saved as the current “last known good state” and published as month’s-end reports. Operations can roll forward from today to some indeterminate point in the future and, probably, never generate a snapshot; they’d recreate the future each time it was requested. Given the response time expectations for these domains, those are probably reasonable scenarios.
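Both uses of rolling forward can be sketched with a single function, shown here in Python. The names roll_forward, Snapshot and Event are hypothetical, and events are reduced to a SKU Id and a net change to keep the example short.

```python
from typing import Dict, Iterable, Tuple

Snapshot = Dict[str, int]   # QoH per SKU at some point in time
Event = Tuple[str, int]     # (sku_id, net_change)


def roll_forward(last_known_good: Snapshot, events: Iterable[Event]) -> Snapshot:
    """Apply events to a copy of the starting snapshot and return the new state."""
    state = dict(last_known_good)   # never modify the saved snapshot in place
    for sku_id, net_change in events:
        state[sku_id] = state.get(sku_id, 0) + net_change
    return state


# Accounting: roll this month's posted events onto last month's closing numbers,
# then save the result as the new "last known good state."
last_close = {"A": 10}
month_end = roll_forward(last_close, [("A", -3), ("B", 5)])

# Operations: roll potential events (expected deliveries and sales) onto today's
# numbers to describe the future, without ever saving the result.
forecast = roll_forward(month_end, [("A", 20), ("B", -1)])
```

Starting from a snapshot means only the events since that snapshot have to be processed, which is what keeps response time tolerable as the log grows.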
To determine the current QoH for the sales system using event sourcing, however, the sales system would have to roll forward through all transactions since the last inventory count. Because inventory counts are labor intensive, those counts don’t occur very often. As a result, processing all of those events since the last count would create unacceptable response times for the sales system. Instead, the sales system keeps its constantly updated QoH numbers in memory.
While querying requires various levels of support in the database (and, as a result, a variety of indexes and foreign/primary keys), updates do not. Virtually all updates are driven by the Ids of the entities involved. The list of inventory SKUs with their QoH numbers, for example, is driven entirely by the SKU Id. This can dramatically simplify the data model for the command side of a CQRS system. The ability of Entity Framework to generate a collection of SalesOrderItems for a SalesOrder is irrelevant if the command/event message simply includes the Ids for all of the SalesOrderItems that were changed in a transaction.
The impact on locking in the database as a result of this design is interesting. Updates to the QoH and reserved quantities in the sales system consist of changing one or both of the integer values; locking should be minimal. Locking in some of the other systems can disappear if those systems are event sourced; every transaction is recorded by inserting a new row into an event table, so existing rows are never updated.
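An insert-only event table is easy to sketch. The following Python example uses the standard sqlite3 module with an in-memory database purely for illustration; the table name inventory_event and the function names are assumptions, and a production system would use its own database and schema.

```python
import sqlite3


def open_event_store() -> sqlite3.Connection:
    """Create an in-memory database holding a single append-only event table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE inventory_event (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               name TEXT NOT NULL,
               sku_id TEXT NOT NULL,
               net_change INTEGER NOT NULL)"""
    )
    return conn


def record(conn: sqlite3.Connection, name: str, sku_id: str, net_change: int) -> None:
    """Every transaction is an INSERT; no existing row is ever updated."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO inventory_event (name, sku_id, net_change) VALUES (?, ?, ?)",
            (name, sku_id, net_change),
        )


def current_qoh(conn: sqlite3.Connection, sku_id: str) -> int:
    """Derive the quantity on hand by summing the history for one SKU."""
    row = conn.execute(
        "SELECT COALESCE(SUM(net_change), 0) FROM inventory_event WHERE sku_id = ?",
        (sku_id,),
    ).fetchone()
    return row[0]
```

Note that the only key the write path needs is the SKU Id; there are no foreign keys or joins involved in recording a transaction.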
Effectively, then, the business has multiple, independent processors updating data within their domains, processing commands and events. Without locking, this can create conflicts. For example, a command to purchase two items might appear at the same time as an event that reduces the QoH to zero (someone did an inventory count and noticed that there was nothing on the shelf). Interestingly, a queue-based, event-sourcing approach might resolve this problem; the QoH update processor in the sales system could work on an event-sourcing basis, rolling through all the recently received commands in a queue (within some limit) and updating QoH with the total of their results. Commands that show up simultaneously would be summarized into a single update. Alternatively, it might simply be necessary to recognize that, on occasion, the business is allowed to cancel an order just like a user can.
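The batching idea might be sketched like this in Python. The function names and the (sku_id, net_change) shape of a queued command are assumptions for the example, and the sketch leaves open what the business does when a batch would drive QoH below zero.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple

QueuedCommand = Tuple[str, int]   # (sku_id, net_change)


def drain_and_summarize(queue: Deque[QueuedCommand], limit: int) -> Dict[str, int]:
    """Take up to `limit` queued commands and total their net changes per SKU."""
    totals: Dict[str, int] = defaultdict(int)
    taken = 0
    while queue and taken < limit:
        sku_id, net_change = queue.popleft()
        totals[sku_id] += net_change
        taken += 1
    return dict(totals)


def apply_batch(qoh: Dict[str, int], totals: Dict[str, int]) -> None:
    """Apply one summarized update per SKU instead of one update per command."""
    for sku_id, net_change in totals.items():
        qoh[sku_id] = qoh.get(sku_id, 0) + net_change
```

Because a single processor applies each batch, commands that arrive at the same moment never compete for the same row; they’re simply folded into one change.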
CQRS is a powerful tool. It has the biggest payoff, however, when applied to shared data stores and collaborative processes, and when used within the strategy provided by DDD.
Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. You can contact him at email@example.com.
Thanks to the following Microsoft technical experts for reviewing this article: Dino Esposito and Julie Lerman.
Dino Esposito is the author of “Microsoft .NET: Architecting Applications for the Enterprise” (Microsoft Press, 2014) and “Modern Web Applications with ASP.NET” (Microsoft Press, 2016). A technical evangelist for the .NET and Android platforms at JetBrains, and frequent speaker at industry events worldwide, Esposito shares his vision of software at firstname.lastname@example.org and on Twitter: @despos.
Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other .NET topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman and see her Pluralsight courses at juliel.me/PS-Videos.