Why are data-centric web services so hard anyway?

12/19/2007

Let’s say I’ve got a database and even some decent technology to help me handle persistence between that database and my business objects, which create a nice abstraction over the data and enforce validation. I’ve also got clients (maybe rich apps, maybe web apps, maybe some business-integration batch process) which live in a completely separate layer and have no line-of-sight to the database. Now I want to solve what seems to be a simple problem: I need to expose my data-centric business objects to those clients from a middle tier that has access to the DB.

Seems like this should be pretty easy, but in practice it’s not. It’s actually quite painful. Worse yet, it is easy to get 25-50% of the way there, so I’m lulled into walking a ways down the path before I realize just how hard it’s going to be to get all the way. Web services have been around in one form or another for what? 8 or 10 years now? Databases have been around for a heck of a lot longer than that. The Smalltalk guys were writing O/RMs 20 years ago. What’s the deal? Surely we should have figured out a good solution for these kinds of problems by now.

Well, I’d argue that these problems just aren’t as simple as they sound. We have made some progress, but there’s definitely still work to do. Further, when it comes to making this kind of thing just work out of the box, we may have some fundamental challenges—of course that’s what makes designing frameworks interesting: How can we make the simple things easy and the hard things possible while minimizing the number of things folks have to learn about and avoiding surprises along the way?

OK. Enough rambling. What I really want to do is begin a discussion about creating web services which are built on top of the Entity Framework and enable the exchange of entity data. I’ve been spending a lot of time thinking about this space lately, and I know a lot of our customers are doing the same because the questions keep coming up. The first step in the process is to identify what’s so difficult, and I contend that the difficulties come in two buckets: a) picking your compromise between some important competing requirements, and b) solving a few important tactical issues.

Competing requirements:

1) Finding the right balance of flexibility in the service operations you expose. At one end of this spectrum is the desire to limit the operations you expose in order to increase security and predictability and to preserve your options for the future. At the other end is the realization that the more flexible your operations are, the easier it is to write apps that consume them and, generally, the more broadly the operations can be reused. If you are too restrictive, your clients end up tightly bound to your services, and any time you want anything slightly new or different you probably have to adjust those services. If you make things too flexible, though, pretty soon you have to ask why you don’t just poke a hole through your firewall and let the clients access the database directly.

2) Deciding how important interoperability is to you. This was one of the original promises of web services, and in many cases interoperability is critical. If the purpose of my web service is to enable business partners to collaborate with me in an automated fashion, then quite likely I want to mandate as little as possible about the technology they use, and even if we all agree to use a common platform, I don’t really want to require that they have my business objects. On the other hand, interoperability always comes at a price, and that price is usually more work for the developers and fewer services supplied automatically by the platform. If the service is only going to be used by clients which are under my control, then I might be willing to trade away some interoperability to simplify things.

Side note: The real danger here is that this is a slippery slope. Once I start throwing out interoperability, some things get so much easier that I may end up entangling my clients and services so deeply that I have a terrible architectural mess on my hands. Good intentions could lead me to a very bad place. What happens when I need to upgrade the server and some clients but can’t roll out that upgrade to every client at once? In that case, what I thought of as just cross-platform interoperability might be more about loose coupling. This doesn’t change the fact, though, that interoperability is a continuum and you need to pick your point on it.
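For the first competing requirement, here is a minimal sketch of the two ends of the flexibility spectrum. Python is used purely for illustration (the post itself is about the EF and WCF), and the `Customer` type, the in-memory `CUSTOMERS` list standing in for the database, and both service classes are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class Customer:
    id: int
    name: str
    region: str


# Hypothetical in-memory stand-in for the database behind the middle tier.
CUSTOMERS = [
    Customer(1, "Contoso", "West"),
    Customer(2, "Fabrikam", "East"),
    Customer(3, "Northwind", "West"),
]


class NarrowCustomerService:
    """Restrictive end: a fixed set of operations the server fully controls.
    Any new client scenario means adding a new operation here."""

    def get_customer(self, customer_id: int) -> Optional[Customer]:
        return next((c for c in CUSTOMERS if c.id == customer_id), None)

    def get_customers_in_region(self, region: str) -> List[Customer]:
        return [c for c in CUSTOMERS if c.region == region]


class FlexibleCustomerService:
    """Flexible end: the client sends arbitrary field/value filters, so new
    scenarios need no server change, but the server can no longer predict
    (or easily secure and tune) the queries it will receive."""

    _FIELDS = {f.name for f in fields(Customer)}

    def query(self, filters: Dict[str, object]) -> List[Customer]:
        # Reject filters on fields the entity does not expose.
        unknown = set(filters) - self._FIELDS
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        return [c for c in CUSTOMERS
                if all(getattr(c, k) == v for k, v in filters.items())]
```

A common middle ground keeps the flexible shape but restricts which fields may be filtered on; the unknown-field check in `query` is the seed of such a whitelist.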
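For the interoperability requirement and the versioning question in the side note, a minimal sketch of the "more work for the developers" that interoperability tends to cost: hand-written mapping between a business object and a plain, platform-neutral message. JSON stands in for whatever interoperable wire format the service actually uses (XML/SOAP, for example), and the `Order` type, its field names, and both functions are hypothetical. The reader is deliberately tolerant, ignoring fields it does not know and defaulting optional ones, which is one way to let server and clients upgrade at different times.

```python
import json
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical server-side business object: owns its validation rules."""
    order_id: int
    customer_id: int
    quantity: int
    notes: str = ""  # optional field; older payloads may omit it

    def validate(self) -> None:
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")


def order_to_wire(order: Order) -> str:
    # Only plain JSON types cross the boundary, so a partner on any
    # platform can consume the message without having the Order class.
    return json.dumps({"orderId": order.order_id,
                       "customerId": order.customer_id,
                       "quantity": order.quantity,
                       "notes": order.notes})


def order_from_wire(payload: str) -> Order:
    data = json.loads(payload)
    # Tolerant reader: take only the fields this version knows about,
    # ignore anything extra a newer peer added, and default optional ones.
    order = Order(order_id=data["orderId"],
                  customer_id=data["customerId"],
                  quantity=data["quantity"],
                  notes=data.get("notes", ""))
    # Re-run validation on the server; the client's copy of the rules
    # (if it has one at all) is not trusted.
    order.validate()
    return order
```

If clients shared the `Order` class itself, none of this mapping would be needed, which is exactly the trade the side note warns can entangle clients and services.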

Tactical problems:

1) Graph serialization. The problem here is that, in practice, WCF doesn’t currently support serialization of graphs. Trees yes, graphs no. To be honest, you can serialize graphs with WCF, but not by default, and when you dig down and turn on the option, the whole thing becomes WCF-only and non-interoperable. This is why the current default story for entities is that, when using XML or DataContract serialization, they serialize shallowly (that is, serializing an entity will bring along all of its regular properties, but not related entities). We’ve been working with the WCF team to address this issue, and they recently came up with some great ideas for making graphs work in an interoperable way which may show up in a future release. The practical result right now, though, is that sending a graph of related entities over a web service doesn’t happen for free; it’s a tactical problem which must be addressed.

2) Change tracking. This one is a bit less obvious to most folks, and there are a few parts to it. First, there is the matter of concurrency. When using the EF directly, original values are automatically tracked for you, and updates perform automatic optimistic concurrency checks as specified in your model. Once you remote things to a disconnected client, though, when you want to save a change you have to send not only the updated state of the entities but also the original values of at least the concurrency tokens, or else the EF has no way to perform these checks. Second, there’s the smaller but still interesting problem of tracking which properties were actually changed. If you keep track of this information, then the updates to the database can be more efficient. Finally, if you address graph serialization, then you have the problem of tracking changes to the graph itself: which entities are new, which have been deleted, and also changes in relationships where the set of entities may not change but which entities are related to which might (this customer has a new salesperson: the customer and both salespeople existed before and after the operation, but a relationship change has happened).

Side note: An interesting observation around change tracking is that this is a synchronization / replication kind of problem. We usually try to treat it as something simpler, but in many cases this is hiding from the reality of the situation. In the worst case, operational semantics can even come into play, where the order of operations and whether or not operations are repeated are significant. When it comes to persistence to the database, the EF generally treats updates as though operational semantics aren’t an issue (usually they aren’t at the database), but when you build a web service over your business objects, the business logic may be more sensitive. For the purposes of the rest of this discussion, we’ll assume that objects are built in such a way that operational semantics are not a concern, but considering them helps to highlight the fact that change tracking is a bigger deal than you might at first expect. This seems to be one of the biggest hidden pitfalls. It seems like, if graph serialization were taken care of, I should be able to just retrieve a graph from the service, modify it on the client until I have a new graph that looks the way I want it, and then call a simple service method that takes the new graph and persists it to the database by comparing it against the way the database currently looks. In practice, though, this is an area where things are just much harder than they seem.
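For the graph serialization problem, here is a minimal sketch of the difference between the shallow default described above and a reference-preserving format. It is loosely modeled on the id/ref idea behind WCF's opt-in object-reference preservation, but the `$id`/`$ref` JSON layout, the `Entity` class, and both functions are hypothetical illustrations, not the WCF or EF wire format.

```python
from typing import Any, Dict, List


class Entity:
    """Hypothetical entity: scalar properties plus navigation properties."""

    def __init__(self, kind: str, **scalars: Any) -> None:
        self.kind = kind
        self.scalars = scalars
        self.related: Dict[str, List["Entity"]] = {}

    def relate(self, name: str, other: "Entity") -> None:
        self.related.setdefault(name, []).append(other)


def serialize_shallow(entity: Entity) -> Dict[str, Any]:
    # Regular properties only; related entities are left behind.
    return {"kind": entity.kind, **entity.scalars}


def serialize_graph(root: Entity) -> Dict[str, Any]:
    ids: Dict[int, int] = {}  # Python object identity -> assigned reference number

    def visit(e: Entity) -> Dict[str, Any]:
        if id(e) in ids:
            # Already emitted: write a reference instead of following the cycle.
            return {"$ref": ids[id(e)]}
        ids[id(e)] = len(ids) + 1
        node = {"$id": ids[id(e)], **serialize_shallow(e)}
        for name, others in e.related.items():
            node[name] = [visit(o) for o in others]
        return node

    return visit(root)
```

Serializing a customer whose order points back at it emits the customer once and a `{"$ref": 1}` marker inside the order, where a naive tree walker would keep following the cycle. The price is that every consumer must understand the reference convention, which is exactly where interoperability suffers.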
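For the concurrency and changed-properties parts of change tracking, a minimal sketch of what a disconnected client and service might have to do by hand once the EF's automatic tracking is out of reach. It assumes an integer `version` column as the concurrency token and a plain dict standing in for the database table; `TrackedEntity` and `apply_update` are hypothetical names. Against a real database the token comparison would belong in the UPDATE's WHERE clause so that the check and the write happen atomically.

```python
from typing import Any, Dict


class TrackedEntity:
    """Hypothetical client-side wrapper that remembers original values so a
    disconnected client can report what changed and what it started from."""

    def __init__(self, values: Dict[str, Any]) -> None:
        self.original = dict(values)
        self.current = dict(values)

    def set(self, prop: str, value: Any) -> None:
        self.current[prop] = value

    def changed_properties(self) -> Dict[str, Any]:
        return {k: v for k, v in self.current.items()
                if self.original.get(k) != v}

    def to_update_message(self, key_prop: str, token_prop: str) -> Dict[str, Any]:
        # Send the key, the ORIGINAL concurrency token, and only the changes.
        return {"key": self.original[key_prop],
                "originalToken": self.original[token_prop],
                "changes": self.changed_properties()}


def apply_update(table: Dict[int, Dict[str, Any]],
                 message: Dict[str, Any],
                 token_prop: str = "version") -> None:
    row = table[message["key"]]
    # Optimistic concurrency: refuse if someone else saved since the client read.
    if row[token_prop] != message["originalToken"]:
        raise RuntimeError("optimistic concurrency violation")
    row.update(message["changes"])  # only modified columns are written
    row[token_prop] += 1            # the server bumps the version
```

Without the original token in the message, the second of two overlapping saves would silently overwrite the first.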
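For the graph-level part of change tracking, a minimal sketch of the mechanical comparison, assuming the client kept a snapshot of the graph exactly as it received it and that entities are identified by hypothetical `(entity set, id)` keys. Diffing against that snapshot rather than against the database's current state matters: compared with the current database, a row someone else inserted in the meantime would look like a client deletion, and the original values needed for concurrency checks would be gone. Even so, this sketch ignores concurrency and operational semantics entirely, which is part of why the real problem is harder than it looks.

```python
from typing import Any, Dict, Set, Tuple

Key = Tuple[str, int]                # hypothetical (entity set, id) key
Relationship = Tuple[Key, str, Key]  # (source, role, target)


def diff_graph(orig_entities: Dict[Key, Dict[str, Any]],
               orig_rels: Set[Relationship],
               new_entities: Dict[Key, Dict[str, Any]],
               new_rels: Set[Relationship]) -> Dict[str, list]:
    """Compare the graph the client received with the graph it sends back."""
    added = sorted(new_entities.keys() - orig_entities.keys())
    deleted = sorted(orig_entities.keys() - new_entities.keys())
    modified = sorted(k for k in new_entities.keys() & orig_entities.keys()
                      if new_entities[k] != orig_entities[k])
    return {"added": added,
            "deleted": deleted,
            "modified": modified,
            # Relationship changes are tracked separately: the set of
            # entities can stay the same while who-is-related-to-whom changes.
            "relationshipsAdded": sorted(new_rels - orig_rels),
            "relationshipsRemoved": sorted(orig_rels - new_rels)}


# The customer-gets-a-new-salesperson case from the text: no entity is
# added, deleted, or modified, yet one relationship is swapped for another.
before = {("Customer", 1): {"name": "Contoso"},
          ("Salesperson", 10): {"name": "Ana"},
          ("Salesperson", 11): {"name": "Ben"}}
after = {k: dict(v) for k, v in before.items()}
changes = diff_graph(before, {(("Customer", 1), "salesperson", ("Salesperson", 10))},
                     after, {(("Customer", 1), "salesperson", ("Salesperson", 11))})
```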

So what’s the result of all this? Well, so far both the EF and LINQ to SQL have taken the approach of moving slowly into this space. The frameworks expose relatively low-level building blocks which make it possible to take any one of a number of approaches to building data-centric web services. The next step in our survey of the problem is to take a look at some approaches to addressing these problems, but I’ll have to save that for another post. Same bat time. Same bat channel.

- Danny
