Thoughts about WinFS and related technologies

There's been a lot of discussion about the recent decision not to ship WinFS as a distinct product, but instead to incorporate its technologies into ADO.NET and SQL Server. I don't have much to contribute to the discussion about WinFS itself, since I didn't have anything to do with it. I do find some of the commentary thought-provoking with respect to some larger XML-related questions, however. OK, I guess I do feel the need to get a couple of things off my chest that I haven't seen beaten to death in other posts.

  • I liked Stephen O'Grady's dissection of the announcement itself: "There's very little contrition in the post, very little apology or reflection. ... When you're announcing the death of a project that has consumed thousands of man hours of time, and affected the future of other Microsoft projects, I would have expected the post to be more retrospective..."
  • Since WinFS was not a "filesystem" (I think "FS" has stood for Future Store for a while now), comparison with the new Solaris ZFS is a bit pointless. ZFS does have some promising-sounding features (256-bit checksums, orders of magnitude more capacity, a transactional approach that makes the order of low-level I/O operations irrelevant), but I don't think that even the most fervent hype for WinFS ever promised that kind of stuff.
  • The biggest reason for cancellation was probably simply that, after all these years, no MS products had taken a dependency on WinFS. All those wonderful things that it might be able to do ... if only someone would sign up to be the guinea pig. Bootstrapping that adoption was presumably why Project Orange was started.

This leads to the main point I'd make about the relevance of this announcement to what I actually do for a living: no matter how great your technology, it's not going to be a success until it solves more problems than it creates. It's hardly an original observation to compare WinFS and the Semantic Web, but proponents of the latter should now be pondering the lessons of the former. The main reason that WinFS became "the biggest example of scope creep," AFAIK, is that simpler text indexing and search technologies based on statistics rather than semantics have been plucking the low-hanging fruit faster than WinFS could mature; it had to get ever more ambitious to differentiate itself from Google Desktop Search, Apple Spotlight, etc. etc. etc. The same is probably true of the Semantic Web, which saw much of its vision implemented by Google and responded by raising its ambitions. So far, people trying to solve the problems that WinFS set out to solve have done so without having to build the conceptual infrastructure that it demanded.

But as much as I like to rub salt in their wounds, the people who ask us to start with an ontology or an entity-relationship model may well be able to keep going after the simpler search models hit the wall. That remains to be proven, of course, but it's interesting how the WinFS data model lives on (in an evolved form) in the ADO.NET Entity Framework. It still takes some thought and work to develop the entity data models that power this approach (or the ontologies that power the Semantic Web), but that up-front work allows more flexibility in object mapping, database evolution, etc. than the simpler, currently dominant approaches do.
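To make the trade-off between direct and conceptual mapping a bit more concrete, here is a toy sketch in Python. It is emphatically not the Entity Framework's API; every table, class, and function name in it is hypothetical. The storage layer splits customer data across two tables. The direct mapping mirrors each table with its own class, while the conceptual mapping exposes a single `Customer` entity and confines knowledge of the storage layout to one mapping function, so a schema change touches only that function.

```python
from dataclasses import dataclass

# Hypothetical storage schema: customer data split across two "tables",
# each modeled as a list of row dicts.
CUSTOMERS = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
ADDRESSES = [{"customer_id": 1, "city": "London"}]


# Direct mapping: one class per table, with fields mirroring the columns.
@dataclass
class CustomerRow:
    id: int
    name: str


@dataclass
class AddressRow:
    customer_id: int
    city: str


def load_rows(customers, addresses):
    """Return table rows as objects; callers must join them themselves."""
    return ([CustomerRow(**r) for r in customers],
            [AddressRow(**r) for r in addresses])


# Conceptual mapping: an entity shaped for the application, not the tables.
@dataclass
class Customer:
    id: int
    name: str
    city: str


def map_customers(customers, addresses):
    """Join the two storage tables into Customer entities (city may be missing)."""
    city_by_id = {a["customer_id"]: a["city"] for a in addresses}
    return [Customer(c["id"], c["name"], city_by_id.get(c["id"], ""))
            for c in customers]


# If the database later merges the tables, only the mapping changes.
CUSTOMERS_V2 = [{"id": 1, "name": "Ada", "city": "London"},
                {"id": 2, "name": "Grace", "city": ""}]


def map_customers_v2(rows):
    """Map the evolved single-table schema onto the same Customer entity."""
    return [Customer(r["id"], r["name"], r["city"]) for r in rows]


# Application code written against the entity is unaffected by the change.
def describe(c: Customer) -> str:
    return f"{c.name} ({c.city or 'unknown city'})"


if __name__ == "__main__":
    old = [describe(c) for c in map_customers(CUSTOMERS, ADDRESSES)]
    new = [describe(c) for c in map_customers_v2(CUSTOMERS_V2)]
    print(old)
    print(old == new)
```

The price of the second approach is the up-front modeling: someone has to define `Customer` and write and maintain the mapping functions. Whether that pays off depends on how often the storage schema shifts underneath the application.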

What I like best about the recent announcement is that it should offer many of the technologies developed for WinFS as evolutionary features on top of the existing ADO.NET and SQL Server products. What's more, since LINQ to Entities puts the Entity Framework within a continuum that starts from LINQ itself for objects and BLinq for simple websites, developers can go as far down the path to abstraction and conceptual complexity as they need to solve their problems. If the simplest thing that could possibly work is all that is needed, stick with LINQ or LINQ to XML and the filesystem; add support for ADO.NET web servers, LINQ to SQL, or LINQ to Entities as the problem demands ... or don't, if they turn out to create more problems for an application than they solve.

I think this is the basic reason why Microsoft is offering multiple LINQ-based technologies that do overlap somewhat and may fit into a single pigeonhole in other people's product categorizations. One can see them as different ways to do object-relational mapping, but I prefer to think of them as different stops along the continuum from a direct mapping of fairly concrete classes and tables to a flexible and abstract mapping. That additional complexity will offer some developers more pain than gain, but it will allow others to abstract away irrelevant details for the relatively small price of some up-front modeling. Customers should choose whichever meets their needs, and we'll all just have to see what actually works best for which scenarios.