The "a ha" moment.

Don Box is looking for an “a ha” moment on why he should use SqlXml. Unfortunately, he probably won't find it while building his new Blog engine. 

The reason for this is that the real usefulness for SqlXml shows itself in development scenarios where the developer does not own the database schema and/ or the shape (XSD) of the desired Xml. SqlXml is all about translating relational data to and from Xml and handling the translation of Xml features (e.g. hierarchical queries, open content, nested relationships) that are any where from moderately hard to nearly impossible to represent as Sql operations against a relational database. 

And typical example of this is the SqlXml Xml Bulkload feature. Numerous SqlXml users are told “our partners are going to send us a 100 MB Xml document representing new sales information on a daily basis. Here is an XSD that defines the feed. Figure out how to get the Xml into our reporting database“. The possible problems here:

  • The Xml feed may have attributes and elements which do not have appropriate columns or fields in the database.
  • The Xml feed may have open content that needs to be preserved and could change without warning.
  • The XSD is highly hierarchical, while my reporting database is fairly denormalized.
  • My database uses identity values. These identity values need to be generated upon inserting the data and propagated as primary keys based on the hierarchy of the Xml feed.

With SqlXml, I can easily solve that scenario with a few lines of code and annotating the XSD with mapping information. Sure I could write the code myself using SqlClient and an XmlReader - and if the Xml feed was very flat and with little open content, that is what I might be inclined to do. The same analogy exists for querying and generating Xml. If my database looked quite similar to my desired Xml, I would write the Sql queries myself and do my own markup. But it the two differed and I wanted to write some fairly sophisticated XPath statements - I would let SqlXml handle the translation. 

If I was writing a Blog Engine from scratch where I owned the schema of a single writer, multiple reader relational data store, I would probably develop a fairly denormalized schema. All access to the database would be via standard transact Sql queries with manually markup of the results. The real advantage here is that the database schema could closely match the desired Xml shape, so doing the translation would be fairly trivial and optimized for my needs. This is why I don't think Don is going to get any great epiphany if he tries to use SqlXml to develop his Blog engine - in fact, without knowing the details of design, I would still guess that SqlXml is probably overkill.

So, the real overriding question here is when to use SqlXml and when to use SqlClient. And with SqlXml becoming a first class data access framework in the Whidbey release of .net - the question becomes even more interesting. Mark Fussell has developed a nice document outlining our recommendations for data access in Whidbey. (It is worth reading, but keep in mind it is a work in progress and will be updated and expanded based on Whidbey feedback.) IMHO, the new data access options (first class SqlXml framework and ObjectSpaces) are making SqlClient (using transact Sql statements) a low level interface for cases where fine grained control and performance are required or the features of SqlXml or ObjectSpaces are not required. Are most users ready to accept that and have we made the higher level data access options powerful enough to enable user to solve real problems exclusvely using those abstractions? The answer to the first question is probably no - getting developers to give up power is not an easy task. I don't know the answer to the second question, but I worry about it all the time.