CRUD, only when you can afford it (Revisited)
So, I did a little more work on it and I hope I addressed the feedback in it. I don't think I'll be providing code anywhere soon though. I hope that the project that David Hill and I are doing together with msdn “Patterns and Practices”, we'll get to good examples for offlining and offline UI. The article doesn't really have a beginning, a middle and an end yet. A few more internal passes and I hope I can publish this officially.
After my article on Dealing with Concurrency: Designing Interaction Between Services and Their Agents (http://msdn.microsoft.com/library/en-us/dnbda/html/concurev4M.asp) and my talks at the TechEd in Amsterdam, I needed to express my thoughts on CRUD.
I concluded that article with the following recommendations:
Confine pessimistic resource usage
Make information predictable
Manage the time of validity
Personalize by assigning ownership
Limit optimistic updates
- E.g., Add rows instead of update fields
Design business actions
State your purpose
State your preconditions
Use the Journal pattern
When I wrote “Limit optimistic updates”, one of the things I meant to imply is to limit the use of CRUD. Don’t use CRUD if you don’t have to. In this article I try to point out in which cases you do have to use it because, I believe, that in those cases you should. Using replication is a great way to go if your problem allows it.
Why don’t I like CRUD?
Hearing people talking about CRUD makes me cringe; I feel the urge to react. So, let me get that off my chest first.
CRUD, Create, Read, Update and Delete is about working with entities and changing those entities by retrieving the data from a service, modifying the data by setting properties and then sending the data back to the service for update.
The reason for my allergic reaction may stem from the way I see developers use Datasets the wrong way. Such a developer will write a service that gets you a Dataset, the UI modifies the Dataset and sends it back. The service or even the stored-procedure checks the before and after image, the timestamp or the version … ouch!
Why not have a request specifically stating what the request entails and specifies when the update should be executed or refused? E.g. buy this when the price is at most x, rather than buy this when the version of the information retrieved in an earlier request was 30256917 or worse: the old address was …, the new address is … Did the customer move? Then we better start a complete process. Or did we just correct a typo in the contact information?
CRUD is data oriented. I don’t ask for a specific business action, I just ask the service to create an entity, or delete an entity, or I ask it to update the entity by giving the service the new data for that entity. The service derives from the data changes what action needs to be executed if more needs to be done.
Most order processing is not CRUD, or at least I don’t think it is. An order can be created offline and then sent (replicated if you will) to a service for processing. Processing of that order will affect many of the related entities. The customer information will be updated and more than only the year to date totals may change. The properties used for price and discount calculations may be updated, because the customer reached the critical order mass and is upgraded. Products may or may not be available; delivery dates may or may not have been realistic etc. CRUD to me means that the entities are read, created, updated and deleted. Submitting an order to an order system, regardless of the physical implementation has the semantics of requesting the service to accept and process a new order. Such an order is a complex request for a complex business transaction and I regard such a request as a “Journal”.
If you do not use a CRUD interface, but offer business actions such as upgrade the customer or set the new quantity for inventory, the service will execute the request and then, deep inside the bowels of that service, use CRUD on the database. If you offer a CRUD service interface, the service will in most cases, take the property changes and map them to logical requests that are executed and, again, deep down in the service use CRUD to do the database updates.
Problems with CRUD at the service interface level are that CRUD
Looses the context of the change (e.g. the contact info changed because the contact has another function in the organization or the contact info changed because the department was renamed)
Implies dependencies you may not want (e.g. the sales system refused the order because the price is not the same anymore, even though it has been reduced)
Omits dependencies you may need (e.g. only place the order if the time to deliver is less than 10 work days)
Yet a lot of solutions have been built using the principle of CRUD and at least some of them were very successful. CRUD has been used successfully for entity aggregation as well as for offline scenarios. Why did they choose to use CRUD?
Replication is the main answer; ease of interface design and implementation may be another. People want to use replication as offered by email systems and databases. Even though I will not go into the various ways that data replication can be configured in databases, I will look into how replication can be used.
Replication is about keeping two or more systems in sync while changes may occur on each of these. As we are all aware, in computer science there are only three numbers: zero, one and more. Hence, when I write about replication between two systems, feel free to extrapolate this to more.
When two systems, each with its own data store and each with its own business logic need to synchronize their information (merge replication or two way replication) I can either
- replicate at the data level, or I can
- replicate at the service level.
When replicating at the data level, the business logic in the services on either side have no easy way of protecting the consistency of the data; the data will just be replicated and the business logic will only execute when a conflict is detected, if at all. This also implies a high degree of trust and thus coupling between the systems; the data in the system is changed without much scrutiny.
Data level replication has the advantage that I can use standard mechanisms such as database replication. This immediately leads to a couple of other advantages. These mechanisms are typically fast, they sometimes support synchronization through firewalls and, the design and coding effort is greatly reduced.
The alternative is to provide some additional logic to walk through the changed data on both sides and use the service’s published interface to replicate the changes. Normally this is more development work and it may carry a performance penalty. But, when you choose for the latter, you are not constrained to using CRUD; you may use any mechanism you want.
As a side note, when designing for replication, it is wise to assign ownership of the data. We often talk about single master. The essence is that for each piece of data a single owner exists, so that conflict resolution is made easier. Note, that a single owner can be a system or a real person; important is that there is only one.
Even though service to service replication needn’t go through CRUD, often entity services will offer CRUD semantics and replication mechanisms are often easier to build using those semantics. When a request comes in, the before image of the entity is retrieved, the changes are being applied, the after image is retrieved and both before and after image are stored in a change log for later use during the replication. This looks like reusable code doesn’t it?
I can make this even more complex by adding a data store in the middle and using data replication between. This would be useful in a scenario in which I use Outlook as the front-end to a CRM system for instance. Outlook and Exchange would replicate the data using the available mechanisms and Exchange and the CRM service would replicate the entities through the business logic of the CRM system. I will get back to this scenario later.
Such solutions seem very reasonable. So why can’t we use CRUD or can we? Let’s compare the CRUD based service interfaces with a few other cases.
In object oriented design a class typically has a set of properties. These properties are accessors (get / set) to the private data. Properties are typically used to keep the internal state consistent; they are not used to trigger complex business logic. The set methods are typically held small and they typically have none or little effect on other classes and instances. If they were to change state in other classes as well, changing of a property would lead to a cascading effect of property changes and the system would become unmanageable. Properties are used by various levels in the design. It is important to note however, that not all of the set methods on these properties are declared as public. Only those properties that are without side effect when changed are typically made public. The other ones will be private if they were designed as properties in the first place.
For instance the Customer class in an insurance application may have an address property, but changing the address may result in an extremely complicated business process. Thus this property will typically be private, so the change can only be done after all the side effects have been dealt with. For an address change, the designer would typically offer a method, which internally will use the private property.
Similarly CRUD implies change of properties (Luckily it allows us to send all properties at once; otherwise performance alone would make CRUD unworkable). Property changes that have side effects, i.e. effects that transcend keeping internal state consistent should not be supported (be prohibited). For these, requests should be defined that convey the business meaning of those requests as well as the conditions under which they should be executed.
In working with databases CRUD is the normal way of dealing with data. But there are a few notable differences between talking to a database and talking to a service.
Data in the database is typically normalized. Properties aren’t replicated; a single update typically suffices to maintain consistency. But, I don’t think it’s a major difference if we look at a single service. Good entity design should help and a service could (and should) provide the logic to keep the data consistent. It becomes different if the same information is held in multiple services; then logic external to these services has to keep the services in sync.
Database updates are transactional. This means that multiple updates that are, at least from the database’s view, unrelated can be combined in a single all or nothing, or ACID, transaction. When one of them fails, all fail. In contrast, multiple updates to the service are not combined; logic external to the service has to maintain consistency. If the service only provides a CRUD interface, a business action that updates multiple entities must be split into multiple updates, each on a single entity. The service will not offer transactional consistency between them, simply because it cannot. The caller has that responsibility. But, in replication scenarios, the caller does not take that responsibility either.
Updates to the database are not protected by business logic. That protecting business logic does the database updates and doesn’t require additional logic to be executed, much like the properties of a class in object oriented design.
I think all three points lead to the same conclusion. Services are not databases, but by only providing CRUD interfaces, they offer a poorer form of database like behavior. If services should offer business actions and business logic, they should be given the information needed to execute such business actions and ensure that business logic. The service encapsulates entities and it protects consistency through business logic. If the original business request affects multiple entities, a service can only provide complete service if enough information about the original request is given to the service. Then it can execute business logic, apply that business logic even across multiple entities and maintain both information consistency and transactional consistency.
Now, this may all be true, but is it important? As is always the case, sometimes it is and sometimes it isn’t. Architecture and design is all about making the trade offs. So, when is it important to expose business actions instead of CRUD?
If it is important to maintain consistency or if there is a reasonable chance of concurrency issues and if it is not easy to resolve those concurrency issues try to use business actions and avoid CRUD. Booking a travel where it is important to have a flight, hotel and car or booking an order where both products and service are required at a specific time are examples of requests that require consistency. The simplest examples of concurrency issues are the general ledger and the stock on hand information; they are hardly ever, if at all, exposed through CRUD. Not because of dependencies, but because of the concurrent nature of updates.
If, on the other hand, updates are seldom or are only done by a single person, or, if the updates consist of adding information rather than changing existing information there is not much of a problem. The contact information in a CRM system will not be changed often. The appointments in my calendar are typically only changed by me. Even if our admin has complete change rights to my calendar, the chance of both of us changing the same appointment is slim, because of the common business practice that she just does not change my calendar unless she has to. In many cases, concurrency issues are guarded better by business practice or habit, policies and process than by the software. The software should support these and possibly enforce these, but without them, the software enforcement makes little sense.
To summarize this:
Only use CRUD when
concurrent updates can be avoided because
updates are seldom, or
updates have only one source (person or system)
Design changes as creation of entities rather than updates
Granularity or chattiness is neither an argument for nor against either approach. When the service exposes methods that accept complex requests and even combinations of requests (as in the Journal pattern in Dealing with Concurrency), requests may even be less chatty than when using CRUD. On the other hand, the synchronization mechanisms are often well tuned. And both approaches are obviously infinitely better than sending individual set-property requests.
At Microsoft, the people in the field use only a subset of what the CRM system has to offer for most of their work. They keep their contacts up to date. They add opportunities and activities and keep these up to date as well and they read the customer information. What if we were to provide this to them via Outlook? Outlook clearly only supports replication, so that all changes made to local data would be replicated to the server and into the CRM service. Every context other than the new entities and the changes to the existing entities would be lost. Still, I believe it would be a good, even recommended, solution.
Customer information is provided in Outlook but cannot be changed through Outlook.
Creating and updating of contacts is not critical. Contact information is shared, but when the contact information is changed, it is a minor change without much business logic. When an update fails the user may receive a synchronization error and deal with it is he or she deems appropriate. The chance of such a failure would typically be small.
Creation of an opportunity triggers a business process or workflow. But the creation of the opportunity in Outlook provides enough information to that process and this creation does not depend on any other changes. Even if these other changes are related, the opportunity does not have a “transactional” dependency.
Adding activities is uncritical. Changing activities might lead to conflicts, but much like appointments, activities can be though of as personal information. i.e. the owner of the information is known and that ownership is respected by common practice. The activity may be created by one person and then owned by another, but that is normally arranged outside of the software.
More complex cases don’t have to be offered through Outlook. The original solution provides these cases. Most users, and especially occasional users, will have sufficient functionality through Outlook and can hook up to the corporate network in the seldom case they need more; more demanding users can still use the original solution and have more overhead, but also more functionality.
Best of both worlds
Until here I’ve discussed CRUD as an all or nothing approach. It doesn’t have to be that way. That would only be the case if you were to rely entirely on standard replication mechanisms. A good way of combining approaches is to replicate information from the service and to queue requests to the service. (You can optionally change the local data to reflect those requests, but if those changes are ignored and overwritten after the requests have been processed by the service these local changes don’t affect the solution.) The business actions in the queue will then be processed and these actions can follow all recommendations made in “Dealing with Concurrency”. Instead of queuing the requests in a queuing system, I can store the requests in a database table and have them replicated. Another way of looking at this is that I may use replication as a messaging mechanism. Processing the replicated requests is exactly the same as processing requests that have been sent to the service using mechanisms such as SOAP.
A variation on this is to model the requests as entities. The order is a good example of this. The order is essentially a request to the service. Modeling the order as an entity has the additional advantage that it has become part of the business system and that it now can be tracked and audited. Another example would be the address change of the customer. Instead of designing a request for address change, I could design an address change form that has all the required information. The form would state the reason, such as correcting incorrect information, a new telephone number or moving to a new location; it would have the desired date of change etc. Now, if the customer calls me six months later, I have the original request and the activities that I executed that relate to that request. The address change form is now an entity in my system. I can use many of the standard replication mechanisms to replicate orders and address change forms. They are new instances in my system or database and they do not introduce concurrency issues. Note however, that neither the local order entry nor the local address changes need to have any effect on the other local data. Processing the request (or new entity) by the service may affect the other entities and those changes will then be replicated back to the local store.
Suppose you want to write a solution for a service technician. The technician needs to fill out hours on the road, hours spent servicing, diagnosing and repairing. He or she needs to specify the material used, specify the condition of the devices, etc. The information about the customers, the devices, the maintenance contract per device, the prices for the materials, all can be replicated to the notebook. However the per customer service information that is entered consists of more than just an invoice; it can be a complex set of requests to the service that may contain a few problems on synchronization that need to be resolved before the complete set is accepted.