Nix the DataSet??????

Some interesting comments from my entry on new DataSet features…

Please nix the DataSet from the framework entirely and re-focus on the domain-oriented data access work you were doing prior to Tech-Ed 04. If that's not possible, please have the Visual Studio team desist from distracting .NET culture from Domain-Driven Design with that nasty, nasty typed DataSet RADule :)
I hope this helps, Scott Bellware

I don't advocate that for multiple reasons.

      Sahil Malik

Scott is an old friend from the ObjectSpaces days, so I think he is partially kidding. And I can’t say I totally disagree with him WRT the typed DataSet, because I think it has been shown that O/R solutions can do better than code generation. But that is a different topic for a different time.

What I do want to talk about is projections in terms of an O/R solution – which is an interesting topic when discussing the DataSet.

Object instances in an O/R solution can basically be in one of five states:

  1. Persisted in the data store (shredded or not).
  2. In memory, but not materialized.
  3. In memory, materialized as an object instance.
  4. Partially in memory, materialized as an object instance.
  5. Partially in memory, not materialized.

#1 is the traditional way of storing data in relational databases. Shredded means that the individual properties, including references, have been stored as column values and that the object identity is defined by a single row of storage (in fancier mappings, this row might itself be a projection of base tables and/or views). With advances in relational databases, there is also the ability to store the object instance wholly (e.g. SQL Server 2005 UDTs) or even to partially shred it and store the rest in an untyped bag (e.g. the XML data type).
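To make “shredded” concrete, here is a minimal Python sketch. The Customer and Address types and both functions are hypothetical, invented only for illustration: full shredding turns value properties into columns and the reference into a foreign-key column, while partial shredding keeps a few columns and pushes the rest into an untyped bag. JSON stands in for the bag here; in SQL Server 2005 that role would be played by the XML data type.

```python
import json
from dataclasses import dataclass

# Hypothetical domain types, used only for illustration.
@dataclass
class Address:
    id: int
    city: str

@dataclass
class Customer:
    id: int
    last_name: str
    address: Address  # an object reference

def shred(customer: Customer) -> dict:
    """Fully shred a Customer into one row of column values.

    Value properties become columns, the Address reference becomes a
    foreign-key column, and the row's key carries the object identity.
    """
    return {
        "id": customer.id,
        "last_name": customer.last_name,
        "address_id": customer.address.id,
    }

def shred_partially(customer: Customer) -> dict:
    """Shred only the key and last name; store the rest in an untyped bag.

    JSON is a stand-in for an untyped column such as an XML column.
    """
    bag = json.dumps({"address_id": customer.address.id,
                      "city": customer.address.city})
    return {"id": customer.id, "last_name": customer.last_name, "bag": bag}

ann = Customer(1, "Smith", Address(10, "Seattle"))
row = shred(ann)
partial = shred_partially(ann)
print(row)  # {'id': 1, 'last_name': 'Smith', 'address_id': 10}
```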

#2 When O/R frameworks pull data out of the data store, there is a transition between the rowsets returned from the data store and the object graph returned to the consumer. Basically, this means that the object instances live, at least briefly, in an un-materialized state. Since materialization is not cheap, many O/R solutions, particularly those with client-side caches, store the data in an “un-materialized” state until the consumer requires specific object instances. In fact, with some O/R frameworks, users can access this data.
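A rough sketch of this idea in Python follows. RowCache is a hypothetical class, not any particular framework’s API, and the row layout is assumed. The cache keeps the raw rows as they came back from the store, lets a consumer read them directly, and materializes a Customer only on first request. It keeps an identity map so that later requests return the same instance.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    id: int
    first_name: str
    last_name: str

class RowCache:
    """Hypothetical client-side cache holding un-materialized rows."""

    def __init__(self, rows):
        # Rows as returned by the data store: (id, first_name, last_name).
        self._rows = {row[0]: row for row in rows}
        self._materialized = {}  # identity map: id -> Customer instance
        self.materialize_count = 0

    def raw(self, key):
        # Direct access to the un-materialized data, as some frameworks allow.
        return self._rows[key]

    def get(self, key):
        # Materialize on first request; afterwards return the same instance.
        if key not in self._materialized:
            self._materialized[key] = Customer(*self._rows[key])
            self.materialize_count += 1
        return self._materialized[key]

cache = RowCache([(1, "Ann", "Smith"), (2, "Bob", "Smith"), (3, "Cy", "Jones")])
print(cache.raw(3))  # (3, 'Cy', 'Jones')
c = cache.get(1)
print(cache.get(1) is c, cache.materialize_count)  # True 1
```

Only one of the three cached rows ever pays the materialization cost here.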

#3 This is the in-memory materialized domain model, basically the cornerstone feature of all O/R frameworks.

#4 Same as #3, but value properties and/or references are delay-loaded. From the point of view of the domain model consumer, and for this discussion, #3 and #4 are the same. (I know there are some issues here; see my past comments on this.)
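As a sketch of state #4, here is a hypothetical Order type (invented for illustration) whose customer reference is delay-loaded through an injected loader function. To the consumer, `order.customer` looks like an ordinary property. The data store is only hit on first access, and a sentinel ensures a loader that legitimately returns None is not called again.

```python
_NOT_LOADED = object()  # sentinel: distinguishes "not loaded yet" from None

class Order:
    """Hypothetical domain object with a delay-loaded customer reference."""

    def __init__(self, order_id, customer_id, loader):
        self.order_id = order_id
        self._customer_id = customer_id
        self._loader = loader  # callable that fetches a customer by id
        self._customer = _NOT_LOADED

    @property
    def customer(self):
        # Load the referenced object only on first access, then reuse it.
        if self._customer is _NOT_LOADED:
            self._customer = self._loader(self._customer_id)
        return self._customer

calls = []

def load_customer(customer_id):
    calls.append(customer_id)  # records each round trip to the "data store"
    return {"id": customer_id, "last_name": "Smith"}

order = Order(100, 7, load_customer)
print(calls)  # []
name = order.customer["last_name"]
_ = order.customer  # second access does not call the loader again
print(name, calls)  # Smith [7]
```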

#5 covers projections of the domain model that have not been materialized as object instances and may never be. (Actually, #2 is probably a subcase of this.) In fact, depending on what the projection is, it might not be possible, or even desirable, to materialize the objects.

Take, for example, a scenario where a domain model consumer wants to search for all the customers with a given last name, display the results to the application user, and let the user select the specific customer. Now, let’s assume that the Customer type is fairly complex and has 30 or so properties, of which only a handful are useful for the user to visually select the “right” customer. Performance-wise, this means selecting only those few properties of the type (a projection of the domain model) in the initial query. This design issue is quite familiar to anyone who has ever written an app that binds query results to a grid. For O/R solutions, it presents an interesting issue in that the properties in the projection may not allow the object instances of the retrieved type to be materialized. For example, the type may have several properties that either don’t have meaningful defaults or, if not initialized to their persisted values, leave an object instance in an illegal state.
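The scenario can be sketched with Python’s built-in sqlite3 module. The schema, the sample rows, and the rule that credit_limit has no meaningful default are all assumptions made for illustration, and most of the 30 or so properties are elided. The projection query returns exactly what the user needs to pick a customer, but those rows cannot become valid Customer instances. The full-row query can be materialized, at the cost of fetching every column.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:
    """Hypothetical full domain type (most properties elided)."""
    id: int
    first_name: str
    last_name: str
    city: str
    credit_limit: float  # assumed to have no meaningful default

    def __post_init__(self):
        # An uninitialized credit limit would leave the object in an illegal state.
        if self.credit_limit is None:
            raise ValueError("credit_limit must be loaded from the store")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, "
             "last_name TEXT, city TEXT, credit_limit REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    (1, "Ann", "Smith", "Seattle", 5000.0),
    (2, "Bob", "Smith", "Portland", 2500.0),
    (3, "Cy", "Jones", "Boise", 1000.0),
])

# The projection: only the columns the user needs to pick the right customer.
rows = conn.execute(
    "SELECT id, first_name, city FROM customer WHERE last_name = ? ORDER BY id",
    ("Smith",)).fetchall()
print(rows)  # [(1, 'Ann', 'Seattle'), (2, 'Bob', 'Portland')]

# A projected row cannot become a valid Customer: credit_limit was never selected.
try:
    Customer(*rows[0][:2], "Smith", rows[0][2], None)
    error = None
except ValueError as err:
    error = str(err)
print(error)  # credit_limit must be loaded from the store

# Selecting every column does allow materialization.
full = Customer(*conn.execute("SELECT * FROM customer WHERE id = 1").fetchone())
```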

Further, for a common last name, it might not even be desirable performance-wise to materialize all the customers as object instances, even if one could. So the question is: if the objects are not materialized, how is the data stored and accessed in memory? Obviously, “rowsets” are a very good idea, since that is generally how relational data is exposed through data access APIs. So a relational, client-side cache (like the DataSet) would seem to be really useful. Interestingly, for this sole purpose, the DataSet is probably overkill. Really, all that is needed is some sort of client-side collection object; even arrays could work. However, for the scenario discussed above, one is going to need other features like data binding and sort/filter/find capabilities, which the DataSet is very good at. Ironically, these are features that most applications require independent of their data access model.
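To show how little is needed beyond a plain collection, here is a sketch of a minimal client-side collection over projected rows that offers sort, filter, and find. RowView is hypothetical and is not the DataSet API. Data binding, the other feature mentioned, is UI-framework specific and is left out.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Row = Dict[str, Any]

class RowView:
    """Hypothetical minimal collection for projection results."""

    def __init__(self, rows: List[Row]):
        self._rows = list(rows)

    def sort(self, column: str, descending: bool = False) -> "RowView":
        # Returns a new view; the original ordering is left untouched.
        return RowView(sorted(self._rows, key=lambda r: r[column],
                              reverse=descending))

    def filter(self, predicate: Callable[[Row], bool]) -> "RowView":
        return RowView([r for r in self._rows if predicate(r)])

    def find(self, column: str, value: Any) -> Optional[Row]:
        # First row whose column equals value, or None if there is no match.
        return next((r for r in self._rows if r[column] == value), None)

    def __iter__(self) -> Iterator[Row]:
        return iter(self._rows)

    def __len__(self) -> int:
        return len(self._rows)

view = RowView([
    {"id": 1, "first_name": "Ann", "city": "Seattle"},
    {"id": 2, "first_name": "Bob", "city": "Portland"},
    {"id": 4, "first_name": "Al", "city": "Seattle"},
])
seattle = view.filter(lambda r: r["city"] == "Seattle").sort("first_name")
print([r["first_name"] for r in seattle])  # ['Al', 'Ann']
print(view.find("id", 2))  # {'id': 2, 'first_name': 'Bob', 'city': 'Portland'}
```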

My belief has always been that, on average, roughly two-thirds of all queries executed by applications requiring database persistence are projections WRT the domain model. Unless an O/R framework has a solution for this, it forces the domain model designer to either include awkward “partial types” (e.g. PartialCustomer), use some sort of weak-typing hack in the domain model (which kind of goes against O/R in the first place), or translate projection queries into queries that always generate results that can be materialized (accepting the performance overhead).
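The first two workarounds might look roughly like this in Python. PartialCustomer echoes the name above, while PropertyBag is a hypothetical name for the weak-typing approach. The partial type gives compile-time-style structure but duplicates part of the Customer type. The bag accepts any projection, but a misspelled column name only fails at run time. The third workaround, selecting every column so that a full Customer can always be built, needs no extra type; it just pays the cost of the wider query.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Workaround 1: a "partial type" that mirrors one particular projection.
@dataclass(frozen=True)
class PartialCustomer:
    id: int
    first_name: str
    city: str

# Workaround 2: a weakly typed bag; any column name is accepted,
# and a typo in a name surfaces only as a KeyError at run time.
class PropertyBag:
    def __init__(self, values: Dict[str, Any]):
        self._values = dict(values)

    def get(self, name: str) -> Any:
        return self._values[name]

row = (1, "Ann", "Seattle")  # one projected row: id, first_name, city
partial = PartialCustomer(*row)
bag = PropertyBag(dict(zip(("id", "first_name", "city"), row)))
print(partial.first_name, bag.get("first_name"))  # Ann Ann
```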