August 2009

Volume 24 Number 08

Editor's Note - Editor's Note

By Howard Dierking | August 2009

.imgleft1 { float: left; margin: 5px 10px 10px 0px; width:100px; font-weight: normal; font-size: 14px; color: #003399; font-family: "Segoe UI", Arial; }

I recently had the opportunity to author the editor's note for TechNet Magazine. I have to say that, being a developer, addressing an audience of IT professionals was a bit daunting. Both disciplines are vital to any business, but many times their paths only cross when something is broken. However, I believe that when it comes to the management of data, both developers and IT professionals need to be involved up front in planning solutions. Given that the theme of that particular TechNet Magazine issue was business intelligence and that the theme of this issue of MSDN Magazine is data, I'll address some of the main points I made in that editor's note but more from the developer perspective.

When you get right down to it, the role that software generally plays in most businesses is to get, process, and store data. Therefore, while we may have all sorts of high-level discussions and debates around architectural patterns and object-oriented heuristics, the fact remains that the most elegantly designed application is still effectively taking data from somewhere, doing something to it and putting it somewhere else.

Now don't get me wrong—I'm not suggesting that design heuristics we argue so fervently over are immaterial. After all, a Ford Model T and a Lamborghini Diablo both accomplish the task of moving people from one place to another; but given a choice between the two, it's pretty clear in my mind which one I would choose. Instead, I'm suggesting that we put the same level of thinking and technology consideration into data structure and management that we do for our class models. Furthermore, I'm not just talking about relational schema design or whether or not to use stored procedures. I'm talking much more generally about understanding how the business needs to consume data in a way that provides meaningful value.

One prime example of poor application planning and design as it relates to data is found in reporting. Forget the object-relational impedance mismatch for a moment—the transactional-reporting impedance mismatch seems to be one of those problems that rears its head in just about every business support system that we touch. And in nearly every case, the transactional concerns win by a landslide. Reporting requirements become limited by the inherent complexity and performance limitations found in the highly normalized database schemas of well-designed transactional systems. Even when well-meaning system designers try to accommodate both sets of concerns in the application schema, the result generally does slightly better at meeting the reporting requirements, and it does so at the great expense of the transactional requirements.

So what's the solution? First, get comfortable with the reality that there is not, nor will there ever be, a relational database design that will successfully meet both transactional and reporting requirements—at least not in a sustained way. From there, start assuming that your system should have at least two database schemas—one highly normalized schema that is optimized for processing transactions and one denormalized schema that is optimized for reporting and for mining. What I'm describing is known as the difference between relational and dimensional data modeling. For a great resource on getting started with dimensional modeling, check out The Data Warehouse Toolkit by Ralph Kimball and Margy Ross (Wiley, 2002).

Freeing yourself from the burden of trying to build a single relational schema that takes into account both reporting and transactional concerns will enable you to truly optimize both new schemas according to how they are actually used. Put another way, you will have effectively shifted your problem from a design problem to an extract, transform and load (ETL) problem—and the latter is generally a much more straightforward type of problem to solve. Additionally, I think that once you dig into some of the technologies that support dimensional modeling, from online analytical processing to data mining, you may just find that implementing reporting requirements can actually become a great deal more fun.

At the very least, think of it as ensuring that your databases follow the single responsibility principle.

Visit us at Questions, comments, or suggestions for MSDN Magazine? Send them to the editor:

Thanks to the following Microsoft technical experts for their help with this issue: Mike Barnett, Bob Brumfield, Pablo Castro, Jeff Currier, Diego Dagum, Nelly Delgado, Jeff Derstadt, Manuel Fahndrich, Shawn Farkas, Elisa Flasko, Shawn Hernan, Tim Heuer, Carl Perry, Ayende Rahien, and Blaine Wastell.