Ramblings about POCO*, transparency and delayed load.

As someone pointed out, I have been referring to ObjectSpaces as supporting true POCO, which is technically incorrect. ObjectSpaces does prescribe the use of ObjectHolder and ObjectList for delayed load cases (more on the merits of delayed loading below). Beyond that, however, there is no prescribed type definition. So, for the sake of technical accuracy, I will start referring to ObjectSpaces as POCO*.

I think this leads to an interesting conversation about whether ObjectSpaces supports true transparency. In other words, can the domain model developer design types for a particular business problem without regard to the fact that the objects will be persisted to a permanent data store? To answer this question, I first want to look at one of the pillars of Indigo:

            Service Boundaries are Explicit and Costly to traverse

Think of the data store as one very complex and powerful service. There is no way to avoid this with current relational database technology unless the store 1) is local, 2) is accessed mostly read-only or exclusively, and 3) has huge amounts of disk space to support a large number of indexes. In other words, the boundary between the database and the database application is explicit, and therefore no current data access technology can support truly transparent access. It then follows that ObjectSpaces cannot support true transparency. At the very least, the developer who designs the domain types, the mapping and/or the database must be aware of this explicit boundary. I also doubt that a developer who merely uses the domain types can use them transparently. There are still runtime concerns such as transaction management, batching and handling concurrency errors. I suppose one could build a framework that abstracts these realities away from the domain model, effectively making physical persistence transparent to it, but I have yet to see a framework that does this completely, and I doubt it can be done for significantly complex architectures.

So, in general, POCO* minimizes how much the domain model developer needs to know about the details of the data store; however, one must still know that the data store exists and lives across a service boundary. In other words, it offers a nice abstraction but not complete transparency.

Matt Warren has some interesting points about the evils of loading on demand. For the most part, I can't say I disagree with him. I would suggest, though, that for value types, particularly very large ones, this requirement justifies the means. At the end of the day, I don't want to download the entire movie when all I want to see is the date it was created.

That said, I would still like to drill down into what I believe the true evil is: preserving graph fidelity. Suppose for a second that ObjectSpaces did not support delayed loading (then we would have true POCO instead of POCO*). ObjectSpaces would still support the concept of object identity across queries via the ObjectManager. That is, if the results of query 1 and query 2 intersect, ObjectSpaces materializes new objects only for the subset of query 2's results that does not intersect with query 1. The rest gets thrown away, since it is already in memory. So I am going to take the liberty of extrapolating Matt's claim that demand loading is evil to really mean that graph fidelity should not be maintained across query executions.
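To make the identity behavior concrete, here is a minimal Python sketch of an identity map in the spirit of the ObjectManager. `IdentityMap`, `Customer` and `merge` are hypothetical names, not the ObjectSpaces API, and rows are assumed to arrive as dictionaries keyed by an `id` column. It also shows the flip side of graph fidelity: because intersecting rows are thrown away, a value that changed in the database between the two queries never reaches the in-memory object.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Customer:
    # Hypothetical domain type; `id` is assumed to be the identity key.
    id: int
    name: str


class IdentityMap:
    """Hypothetical stand-in for an ObjectManager-style identity map.

    Keeps exactly one in-memory instance per key across query executions.
    """

    def __init__(self) -> None:
        self._instances: Dict[int, Customer] = {}

    def merge(self, rows: Iterable[dict]) -> List[Customer]:
        results: List[Customer] = []
        for row in rows:
            existing = self._instances.get(row["id"])
            if existing is not None:
                # Intersecting row: already in memory, so the fresh row is discarded.
                results.append(existing)
            else:
                # Non-intersecting row: materialize and remember a new instance.
                obj = Customer(id=row["id"], name=row["name"])
                self._instances[obj.id] = obj
                results.append(obj)
        return results


im = IdentityMap()
q1 = im.merge([{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}])
# Suppose customer 2 is renamed in the database before query 2 runs.
q2 = im.merge([{"id": 2, "name": "Robert"}, {"id": 3, "name": "Cy"}])
# q2[0] is the same instance as q1[1], and its name is still "Bob".
```

The design choice that buys identity (one instance per key) is the same one that lets the in-memory graph drift away from the store.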

Let's table that discussion for a moment and discuss a very specific evil of delayed loading when not using a prescriptive solution like ObjectHolder/ObjectList. Short of code injection, the only way to support delayed loading is through some sort of context that knows how to retrieve the related object(s) for a given object at runtime. The problem then becomes: what is the meaning of null?

Take, for example, the case where I have type A, which has a reference to type B called b. I run a query that loads A's but does not include B in the span. In a true POCO model, b is set to null by the engine materializing an instance of A. What does that null mean? That A does not have a corresponding b, or that b has not yet been materialized in the in-memory graph? It is quite ambiguous, unless of course one takes into account the original query. So if you buy that some sort of delayed loading support is desirable, then ObjectHolder/ObjectList are a necessary divergence from POCO, unless of course code injection is utilized. But that is another topic for another time.
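A minimal Python sketch of how a holder removes the ambiguity, assuming a loader callable supplied by some persistence context. `Holder`, `A`, `B` and `load_b` are hypothetical illustrations of the idea, not the ObjectSpaces `ObjectHolder` API. "Not loaded yet" is represented by a private sentinel, so `None` is reserved for "A has no corresponding b".

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")

_NOT_LOADED = object()  # private sentinel, distinct from None


class Holder(Generic[T]):
    """Hypothetical delayed-load holder in the spirit of ObjectHolder."""

    def __init__(self, loader: Callable[[], Optional[T]]) -> None:
        self._loader = loader
        self._value: object = _NOT_LOADED

    @property
    def is_loaded(self) -> bool:
        return self._value is not _NOT_LOADED

    @property
    def value(self) -> Optional[T]:
        if self._value is _NOT_LOADED:
            # First access: fetch through the context-supplied loader, then cache.
            self._value = self._loader()
        return self._value  # None now unambiguously means "no related B"


class B:
    def __init__(self, name: str) -> None:
        self.name = name


class A:
    def __init__(self, b: Holder[B]) -> None:
        self.b = b


calls: List[str] = []


def load_b() -> Optional[B]:
    calls.append("load")  # stands in for a round trip to the data store
    return B("related")


a_with_b = A(Holder(load_b))            # A whose b exists but is not loaded yet
a_without_b = A(Holder(lambda: None))   # A that genuinely has no b

print(a_with_b.b.is_loaded)   # False: "not loaded" is distinct from None
print(a_with_b.b.value.name)  # triggers the load once, prints "related"
```

Note that the domain type `A` has to declare its reference as a `Holder[B]` rather than a plain `B`, which is exactly the kind of divergence from POCO the paragraph above describes.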

So where are we now? A limited amount of delayed loading is OK, but completely preserving graph fidelity across query executions is a very dangerous game with very bad consequences for those who embrace its dark side. Should preserving graph fidelity really be the job of an O/R mapping framework? Well, I can probably argue either side of that discussion. Perhaps another topic for another time.

So, to conclude – my points:

1) ObjectSpaces does not support true transparency. Unless there is a huge jump in data store technology, this is not feasible. What it does support is a nice way to decouple the domain model from the data access layer, but it is not a transparent technology for persisting objects.

2) ObjectHolder/ObjectList are a necessary divergence from the true POCO model. They allow large values to be loaded on demand while disambiguating the meaning of null. They can, however, be abused (see point 3).

3) ObjectSpaces can preserve graph fidelity across query executions and therefore allow the user to materialize sections of their domain model on demand. This, however, can be abused and can lead to the in-memory representation falling out of sync with the database in significantly complex ways. Like most software, this feature is a tool that can either be used correctly or abused; it is up to developers to do what is right for their design. Further, using this feature correctly probably means even less transparency for ObjectSpaces (see point 1).