CLR Inside Out
Ensuring .NET Framework 2.0 Compatibility
If we learned only one thing about compatibility in the past few years, it is that compatibility is much more than avoiding breaking changes. On the Microsoft® .NET Framework and Visual Studio® teams, we do our part to ensure that the products we build are stable platforms that developers can truly rely on. At the same time, developers working on these platforms need to do their part to ensure that their applications can withstand just a little shaking. I'll take you on a behind-the-scenes tour of our compatibility efforts and discuss what we learned in the process.
What is Compatibility?
So what exactly is compatibility, and how many ways can an application be compatible? Well, first there's platform side-by-side (SxS) compatibility. This refers to the ability to have multiple versions of a platform installed and running at the same time, allowing each application to run on the version of the platform on which it was built. For example, on a machine running Windows Vista™ with versions 1.1 and 2.0 of the .NET Framework installed, apps built against the .NET Framework 2.0 will run against the 2.0 version, while any older apps will still run against the 1.1 version.
Platform-backwards compatibility is the ability for an application built on one version of the platform to run properly on later versions of that platform, while platform-forward compatibility means that an application built on one version of the platform can run properly on earlier versions of that platform.
With the .NET Framework 2.0 release, both side-by-side execution and backwards compatibility are supported. The 1.1 release supported forward compatibility as well, but too many features were added and improved in version 2.0 for it to be likely that you could develop an application built on the .NET Framework 2.0 that didn't use them. While it may not be a surprise that we no longer support forward compatibility, we did want to continue to support both side-by-side and backwards compatibility.
Side-by-Side Execution and Backwards Compatibility
On the .NET Framework and Visual Studio teams, the most important of our compatibility goals mandated that installing the .NET Framework should not break existing applications, regardless of the fact that most applications would be relying on side-by-side execution in this scenario and simply run against the version they were built on. Next in order of importance was that we wanted to ensure that applications would run properly if only the latest runtime were installed. There was a bit more leeway here because we expected side-by-side to be the typical configuration, but we also knew that plenty of version 1.1-based applications would be running on version 2.0.
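As a minimal sketch of how an application author can state which runtimes an application is expected to run on, an application configuration file can list them with the `<supportedRuntime>` element, in order of preference. The file name and the choice to prefer version 1.1 are assumptions made for illustration; the version strings are those of the shipped 1.1 and 2.0 runtimes.

```xml
<?xml version="1.0"?>
<!-- MyApp.exe.config (hypothetical application name) -->
<configuration>
  <startup>
    <!-- Listed in order of preference: use 1.1 when it is installed... -->
    <supportedRuntime version="v1.1.4322" />
    <!-- ...otherwise run on 2.0 when only the latest runtime is present. -->
    <supportedRuntime version="v2.0.50727" />
  </startup>
</configuration>
```

With a file like this, the side-by-side configuration and the latest-runtime-only configuration are both declared explicitly rather than left to default runtime selection.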
Finally, we wanted applications built on Visual Studio .NET 2002 or Visual Studio .NET 2003 to be easily upgraded to use Visual Studio 2005. We decided that a developer should be able to convert a large, multiproject solution in a couple of hours.
Of course, no matter how careful we were going to be, there would always be some applications that would break when run on a later version of the Framework. When trying to determine if the value of a breaking change outweighs the impact of the break, we would look at these areas of impact, in order: security, standards compliance, reliability or determinism, and correctness.
At the same time, we looked at the following criteria for determining the impact of the break: Does it break side-by-side execution for some applications? Will many applications break in scenarios using only version 2.0? Will it be easy to determine the problem and fix the application?
As you might expect, sometimes applications break for reasons beyond our control. For example, the change that causes the most applications to break with each release is the increment in the version number. There are also many applications that will fail because they have race conditions that will be exposed once the performance characteristics of the runtime have changed. Still other applications will fail because they took dependencies on the runtime that are far beyond the scope of our API's public contracts (for example, using private reflection to read and write private members of Framework classes). These are the types of application breaks that make our guidance on writing compatible programs so vital for application developers and where it's their turn to do their part to ensure continued compatibility.
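The version-number break is easy to reproduce. The following is a minimal sketch, not code from any real application: the class and method names are hypothetical, and it assumes the application was tested on version 1.1. It contrasts a check that accepts only that exact version with one that treats it as a minimum.

```csharp
using System;

static class RuntimeVersionCheck
{
    // Fragile: accepts only the exact major.minor version the author tested on,
    // so the check fails as soon as the runtime's version number is incremented.
    public static bool IsSupportedFragile(Version runtime)
    {
        return runtime.Major == 1 && runtime.Minor == 1;
    }

    // More tolerant: treats the tested version as a minimum, so later
    // runtimes are still accepted.
    public static bool IsSupportedTolerant(Version runtime)
    {
        Version minimum = new Version(1, 1);
        return runtime >= minimum;
    }
}
```

An application would typically pass `Environment.Version` to such a check. Whether any version check is needed at all is a separate question; this sketch only shows why an equality test breaks on every new release.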
Case 1: Visual Studio Tools for Office 2003
Of all the compatibility issues discovered in the past few years, this one has far and away generated the most internal controversy. This was one of those cases where there was no actual breaking change in the .NET Framework and yet on the very first build of the 2.0 version, 100 percent of all applications built using Visual Studio Tools for the Microsoft Office System 2003 would be broken simply by installing the same bits with their new version number. We spent a long time discussing solutions, trying to find anything we could change in the .NET Framework to get these applications running, but in the end we had to get creative and at the same time make some very difficult decisions. I'll go into details about the solution and its impact in a bit, but first some details on the actual break.
The model used for Visual Studio Tools for Office 2003 applications is that they are included as part of Word or Excel documents: they get downloaded alongside the .doc or .xls file and are run automatically when the document is opened. To prevent this from becoming the path to a new generation of macro viruses, Visual Studio Tools for Office 2003 utilized Code Access Security (CAS) and ensured that simply downloading these documents would not give their associated applications permission to run; an administrator would have to explicitly grant them permissions on the .NET Framework CAS security store.
This approach left the machine in a very secure state but left these applications open to a subtle, but dangerous, versioning problem: the CAS security store, in the form of the security.config file, is tied to a particular version of the runtime and thus each version gets, and only reads from, its own. But because Visual Studio Tools for Office 2003 applications are loaded inside of Microsoft Word and Microsoft Excel®, they automatically get the latest version of the runtime currently installed on the machine. This collision between a dependency on a version-specific .config file and roll-forward runtime selection caused every Visual Studio Tools for Office 2003 solution to fail on the very first build with the new version number. The applications themselves didn't have any problems and the managed code in Office that hosted them worked as well, but the application had stored its security permissions in the version 1.1 security.config and the runtime itself was, by design, only loading the 2.0 version.
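The mismatch can be sketched in a few lines. This is an illustration only: it assumes the machine-level policy file lives at `<framework root>\<runtime version>\CONFIG\security.config`, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;

static class SecurityConfigSketch
{
    // Assumed layout: <frameworkRoot>\<runtimeVersion>\CONFIG\security.config
    public static string PolicyPath(string frameworkRoot, string runtimeVersion)
    {
        string versionDir = Path.Combine(frameworkRoot, runtimeVersion);
        string configDir = Path.Combine(versionDir, "CONFIG");
        return Path.Combine(configDir, "security.config");
    }

    // True only when the file the administrator edited is the same file
    // that the runtime loaded into the process will read.
    public static bool PolicyIsVisible(string frameworkRoot,
                                       string grantedUnderVersion,
                                       string loadedVersion)
    {
        return string.Equals(
            PolicyPath(frameworkRoot, grantedUnderVersion),
            PolicyPath(frameworkRoot, loadedVersion),
            StringComparison.OrdinalIgnoreCase);
    }
}
```

Permissions granted under `v1.1.4322` are invisible to a process that has rolled forward to `v2.0.50727`, because the two versions resolve to different files.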
We initially attempted to find a change that we could make to the .NET Framework 2.0 because fixing it there might have prevented end users and developers from ever noticing that there was a problem. In the end, though, we were unable to find a solution confined to the Framework.
The Fix: A Two-Pronged Solution
While we initially wanted to make the change directly in the runtime, the only viable solution was to fix the component in Office that was managing the Visual Studio Tools for Office 2003 applications and validating their security settings. This component, otkloader.dll, needed to be changed so that it would always look in the security.config file that corresponds to the version of the runtime the application was built against. This solution was easy, but it had two important drawbacks. The first is that a Visual Studio Tools for Office 2003 solution built against the .NET Framework 1.1 will always require the version 1.1 runtime to be installed on the machine even if the application is otherwise compatible with a later version. The same would hold true for Visual Studio Tools for Office 2003 applications based on the .NET Framework 2.0.
The second drawback to fixing the break directly in the otkloader leads us to the second prong in this fix. By placing the fix inside of an Office component, rather than the .NET Framework, it becomes possible, and even probable, that the .NET Framework 2.0 will get installed on a given machine before the Office Update does. This means that installing the .NET Framework 2.0 on a machine before the Office Update to the otkloader gets installed will cause all Visual Studio Tools for Office 2003 applications to fail, going against one of the central pillars of our release: "Installing the .NET Framework cannot break existing applications." So we took a step back, restated the problem, and found the solution.
We simply needed to prevent version 2.0 from being loaded until we detected that the otkloader update was present and only then allow the Word and Excel processes to float forward as they normally would. This led us to develop the "Lockback Shim" which forces a process that would normally get the latest runtime to run with a previous version instead. Internally, we affectionately call these Little Blue Switches in contrast to our Big Red Switch which, for testing purposes, forces all processes on the machine to the latest version installed.
Now, by adding Word and Excel to the lockback list, we are able to keep Word and Excel on version 1.1 until we detect that the version 2.0-compatible otkloader has been installed. This was a very hard choice for us to have to make. It means that even on machines that do not have Visual Studio Tools for Office 2003 applications we will refuse to load the .NET Framework 2.0 for Word or Excel add-ins that may require it until we detect the fix for Visual Studio Tools for Office 2003 applications. This might hamper the adoption of version 2.0 as a development framework for Office but we felt that the impact of breaking all Visual Studio Tools for Office 2003 applications would be too great. We have taken many steps to mitigate this—the fix is included as part of the Visual Studio Tools for Office 2003 2.0 runtime, it is available as an automatic update on Office Update, and it will be available for inclusion in Visual Studio setup projects, but it was still a tough call.
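The lockback decision described above can be sketched as a small selection function. This is a hypothetical illustration and not the shim's actual implementation: the type name, the process names in the list, and the `loaderUpdatePresent` flag (standing in for the detection of the updated otkloader) are all assumptions.

```csharp
using System;
using System.Collections.Generic;

static class LockbackSketch
{
    // Hypothetical lockback list: host processes held on an earlier runtime.
    static readonly HashSet<string> LockedBackHosts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "winword.exe", "excel.exe" };

    public static Version SelectRuntime(string hostProcess,
                                        IList<Version> installed,
                                        Version lockbackVersion,
                                        bool loaderUpdatePresent)
    {
        if (installed == null || installed.Count == 0)
            throw new ArgumentException("At least one runtime must be installed.", "installed");

        // Default behavior for these hosts: float forward to the latest runtime.
        Version latest = installed[0];
        foreach (Version candidate in installed)
        {
            if (candidate > latest) latest = candidate;
        }

        // Hold a listed host back only until the updated loader is detected,
        // and only if the earlier runtime is actually installed.
        bool heldBack = LockedBackHosts.Contains(hostProcess) && !loaderUpdatePresent;
        if (heldBack && installed.Contains(lockbackVersion))
            return lockbackVersion;

        return latest;
    }
}
```

In this sketch, a listed host stays on the earlier runtime until the update is detected and then floats forward as usual, while hosts that are not on the list are never affected.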
For the releases of the .NET Framework 2.0 and Visual Studio 2005 we decided that we needed to be 100 percent compatible with the previous releases. We assembled a set of applications of various types, sizes, and complexity to test against the latest builds. On day one, build one, our pass percentage was very high but as check-ins were made the number started to drop.
Initially, most of the failing applications were blocked by simple bugs—the type that tend to creep in at the beginning of a major development cycle. As these bugs were fixed and others opened, our passing percentages began to trend upwards. We soon reached some of the first intentional breaking changes of the product and it became clear that our goal of 100 percent compatibility was in direct conflict with some of our plans for new features. Thus the Developer Division Compatibility Council (DDCC) was formed.
DDCC was a central group of three senior members of teams from across the division whose job was to make the trade-offs between compatibility and other quality metrics. We retained our target compatibility goal of 100 percent but added a * next to the number with the footnote: "except for DDCC-approved breaking changes." After this, when teams wanted to make a breaking change or when one was discovered, the team would go to DDCC for a decision. The pass percentages for our applications continued to rise and were very high by the time we shipped Beta 1.
Soon after we shipped Beta 1, we received MSDN® Feedback reports indicating that the high passing percentages we saw in our test lab were not realized in the real world. So we released an instrumented build of Beta 1 in the hope that developers would use it and tell us which of the .NET Framework APIs they were depending on. This would help us determine the possible impact of our breaking changes. We did everything we could to publicize this release. Still, in the end we received approximately five uploads of code coverage information. It was clear we needed a new way to reach customers.
As a result, we held the first of many Compatibility Developer Labs. Here we discovered that when configured to run on the .NET Framework 2.0, 100 percent of the version 1.0 and version 1.1 applications tested failed. Except for two Microsoft Office Add-ins, every single application passed in the default side-by-side configuration (with both the .NET Framework versions 1.1 and 2.0 installed).
With the realization that our previous success metrics were insufficient, we took a step back and regrouped. We retained our goal of 100 percent passing except for DDCC-approved breaking changes. In addition, we added absolute metrics detailing what percentage of various types of apps must pass in each of the compatibility scenarios, regardless of reason for failure.
Next we started an extensive search for external applications from many different channels. In a couple of months we expanded our internal test suite from fewer than 50 applications to more than 250 that included everything from Microsoft server applications, to small Web downloads, to enterprise-level inventory and personnel software, and even two robotics applications. With this expanded suite we were able to validate our compatibility against a wide range of applications and could finally identify the issues that were impacting multiple applications.
In addition to expanding our in-house suite of applications, we expanded our first Compatibility Developer Lab and started bringing more developers on site so that we could assist them in testing their applications on the latest builds. We were able to test apps we would never be able to bring in-house and get results for scenarios that we usually can't test. But some apps couldn't be moved from one environment to another and some customers couldn't make it to Redmond. So we went to them.
Case 2: Enterprise Library
Enterprise Library was created by the Microsoft patterns & practices group as a set of sample libraries that provide functionality commonly required in enterprise applications. The libraries are released in source form and could be considered "functional guidance" that gives enterprise developers a solid base to build upon. Depending on how a developer chooses to use the Application Blocks, they can be compiled alongside the application, modified and extended to fit the needs of a particular application or environment, or just consulted as guidance in developing an independent solution. The most common use case is to simply compile it with the application—extensive customization and modification is much less common.
Enterprise Library and its application blocks are exceptionally popular. In our series of compatibility tests in India, we estimate that fully half of the line-of-business (enterprise) applications tested were built using Enterprise Library. This popularity means that any changes to the .NET Framework that impact Enterprise Library will be magnified as every application built with it will be impacted as well.
One of the goals of the Enterprise Library Application Blocks is to implement their functionality while following the best practices for developing on the .NET Framework. So, when developing the Configuration Application Block, the Enterprise Library team, in the January 2005 release, was diligent in not only properly reading the information from the configuration files, but also in verifying that the configuration files were properly formed.
In particular, the Configuration Application Block verified that all the attributes in the section declaration element were understood, that there were no unregistered configuration sections, and that no configuration sections used a reserved name (names starting with "config").
For the application's .config file these are good things to validate, but because of a subtlety in the way the configuration system works this verification code was running not just on app.config but on machine.config as well.
As it turns out, all three of these validation steps caused problems when running these applications on the .NET Framework 2.0.
We added a new feature, exposed as a new "requirePermission" attribute in the section declaration, that let partially trusted applications read information from certain sections of the configuration file. We added a new implicit section that had no explicit section handler but was understood by the version 2.0 configuration system. And finally, we used a reserved name (configProtectedData) for that section in order to avoid clashing with any user-defined sections.
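To make the three failures concrete, here is a simplified sketch of a strict validator of this kind. It is not the Configuration Application Block's code: the type name, the data model (section name mapped to the attribute names on its declaration), and the list of known attributes are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

static class StrictConfigValidator
{
    // Attributes a validator written against version 1.1 might expect
    // on a section declaration (assumed list).
    static readonly HashSet<string> KnownAttributes =
        new HashSet<string> { "name", "type", "allowDefinition", "allowLocation" };

    public static List<string> Validate(
        IDictionary<string, string[]> declaredSections, // section name -> declaration attributes
        IEnumerable<string> sectionsPresent)            // sections that appear in the file
    {
        List<string> errors = new List<string>();

        // Check 1: every attribute on a section declaration must be understood.
        foreach (KeyValuePair<string, string[]> declaration in declaredSections)
        {
            foreach (string attribute in declaration.Value)
            {
                if (!KnownAttributes.Contains(attribute))
                    errors.Add("Unknown attribute '" + attribute + "' on section '" + declaration.Key + "'");
            }
        }

        foreach (string name in sectionsPresent)
        {
            // Check 2: every section in the file must have been registered.
            if (!declaredSections.ContainsKey(name))
                errors.Add("Unregistered section '" + name + "'");

            // Check 3: no section may use a reserved name.
            if (name.StartsWith("config", StringComparison.Ordinal))
                errors.Add("Reserved section name '" + name + "'");
        }

        return errors;
    }
}
```

Fed a declaration that carries `requirePermission` and a file that contains an undeclared `configProtectedData` section, a validator like this reports one error for each of the three changes.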
These are changes that we would generally not consider breaking but were unfortunately added late in the product cycle (post Beta 2) and cause all applications built with the January release of Enterprise Library to fail when running on the latest version of the .NET Framework. The June release of Enterprise Library did not have these problems, but the ratio for downloads of the January versus June releases was more than 5 to 1.
The changes that exposed these problems in the Configuration Application Block were added so late that the July CTP was the first public release to contain them. When we discovered the problems in early August, we had only a few weeks to make any fixes to the .NET Framework before we entered the pre-release stabilization period. Complicating things further, the second failure (unregistered section) actually obscured the third failure (section using a reserved name), which gave us even less time to react to it.
If there were only a couple of applications out there that ran into this problem, we might have shrugged it off and worked with the application vendor to fix the problem. But because it was our guidance and sample code that caused applications to fail, and because so many applications had been built that included the failure, a fix was essential. In less than a week the System.Configuration team came up with a private build that contained the fixes for the two known breaks.
- They removed the requirePermission attribute from all elements in the default machine.config file and added it back programmatically at run time.
- They also added a section for configProtectedData and registered a fake section handler for it.
It was only once we had a build with these fixes that we were able to find out that the Configuration Application Block was validating that no reserved sections were used. Then we realized what trouble we were in. The only way we could fix this would be to rename this section so that it didn't start with "config". But we were only two weeks from entering the stabilization period and renaming the section would require finding all the places in Visual Studio and SQL Server™ that relied on this section (there were many) and giving them the new name. Then we would have had to update all of our documentation and samples (in all languages and localizations) to reflect this new name.
All the changes involved with renaming the section would have taken so long that we would have had to significantly slip the release date of Visual Studio 2005 and SQL Server 2005. Then one of our team members proposed an alternate solution that, if implemented, might have gone down in history as one of the ugliest hacks ever: rather than rename the section so that the validation code no longer failed on it, hack the String.StartsWith method so that in this specific situation it lied and returned false rather than true. We would detect that the application had just called "configProtectedData".StartsWith("config") and return false. We even designed a series of optimizations to make this check impact other applications as little as possible:
if (result == true && input.Length == 6 && this.Length == 19 &&
But making that change to the String class so late in the product cycle was extremely risky.
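For illustration only, the idea behind the proposed special case can be sketched as a wrapper around the real method. This is a hypothetical reconstruction based on the description above, not the code that was actually written (the original condition is cut off in this text). The wrapper name and the final string comparisons are assumptions.

```csharp
using System;

static class StartsWithHackSketch
{
    public static bool StartsWith(string self, string input)
    {
        bool result = self.StartsWith(input, StringComparison.Ordinal);

        // Cheap checks run first: a false result or a length mismatch
        // short-circuits the condition, so almost every caller pays for
        // only one or two comparisons beyond the real StartsWith call.
        if (result && input.Length == 6 && self.Length == 19 &&
            input == "config" && self == "configProtectedData")
        {
            return false; // deliberately wrong answer for this one call
        }

        return result;
    }
}
```

Even in this small form, the cost is visible: every caller of a core string API pays for extra comparisons, and one specific call returns an answer that contradicts the method's contract.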
Without fixing the reserved section name issue the other two fixes were useless. We were left with three options. We could rename the section and significantly delay the release of Visual Studio and SQL Server, hack String.StartsWith to return false when it should otherwise return true, or leave customers of the January 2005 Enterprise Library broken and work with the Enterprise Library team to distribute guidance on how to fix their applications.
In the end we went with the third option. We couldn't slip these products and we couldn't introduce such an ugly hack into such a low-level API as the String class without having sufficient time for testing its impact. Plus, we had serious misgivings about polluting the implementation of such an important API. The changes required to update an application depending on Enterprise Library are minimal and we've made sure to provide several good options for developers doing such an update. We have yet to see an application built with Enterprise Library fail simply by installing the newer framework, but this is the one break we left in the product that hurt the most.
In the end, these efforts had a profound impact. By expanding our internal coverage, bringing customers to us, and visiting them directly, we were able to find classes of compatibility issues that we had simply never encountered before. The result was that the vast majority of applications will not be impacted by the installation of the .NET Framework 2.0, most applications will work just fine on machines that only have the latest version, and, with the late-breaking changes to ASP.NET project migration due to feedback from our labs, most developers will have no problems migrating large, multiproject solutions from Visual Studio .NET 2003 to Visual Studio 2005 in the course of an afternoon.
For a look at two actual breaking changes and how we decided to handle them, see the case studies sidebars (Case 1: Visual Studio Tools for Office 2003 and Case 2: Enterprise Library). They were selected because we expect that they are really the only two compatibility issues that are going to impact more than a few isolated apps. They both also have interesting stories behind them and will impact developers and their applications in different ways.
Send your questions and comments to email@example.com.
Jesse Kaplan is a Program Manager on the CLR team at Microsoft and, among other things, is responsible for application compatibility. You can contact him at firstname.lastname@example.org.