What happened ? An Update on Exchange Server 2010 SP1 Rollup Update 4
The Exchange Sustained Engineering team recently made the decision to recall the June 22, 2011 release of Exchange 2010 SP1 Rollup 4. This was not an action we took lightly and we understand how disruptive this was to customers. We would like to provide you with some details that will give you a deeper understanding of what actually happened and, more importantly, what improvements we are making to prevent this in the future.
Q: What actually triggered the recall?
A: While fixing a bug that prevented deleted public folders from being recovered, we exposed an untested set of conditions with the Outlook client. When moving or copying a folder, Outlook passes a flag on a remote procedure call that instructs the Information Store to open deleted items which haven’t been purged. Our fix inadvertently caused the RPC to skip all content that wasn’t marked for deletion because we were not expecting this flag on the call from Outlook on the copy and move operations.
Q: Why didn’t you test this scenario?
A: The short answer is we thought we did. We didn’t realize we missed a key interaction between Exchange and Outlook. The Exchange team has well over 100,000 automated tests that we use to validate our product before we ship it. With the richness and number of scenarios and behaviors that Exchange supports, automated testing is the only scalable solution. We execute these tests in varying scenarios and conditions repeatedly before we release the software to our customers. We also supplement these tests with manual validation where necessary. The downside of our tests is that they primarily exercise the interfaces we expose and are designed around our specifications. They do test positive and negative conditions to catch unexpected behavior and we did execute numerous folder copy and move tests against the modified code which all passed. What we did not realize is that our tests were not emulating the procedure call as executed by Outlook.
Q: Exchange has been around a while, why did this happen now?
A: In Exchange 2010 we introduced a feature called RPC Client Access. This functionality is responsible for serving as the MAPI endpoint for Outlook clients. It allowed us to abstract client connections away from the Information Store (on Mailbox servers) and cause all Outlook clients to connect to the RPC Client Access service.
As part of our investigation, we discovered that there was some specific code added to the Exchange 2003 Information Store to handle the procedure call from Outlook using the extra flag. This code was also carried forward into Exchange 2007. But when the Exchange team added the RPC Client Access service to Exchange 2010, that code was not incorporated into the RPC Client Access service because it was mistakenly believed to be legacy Outlook behavior that was no longer required. That, unfortunately, turned out not to be the case. The fact that we were not allowing a deleted public folder to be recovered was masking this new bug completely.
Q: Are there other similar issues lurking in RPC Client Access?
A: We do not believe so. The RPC Client Access functionality has been well-tested at scale and proven to be reliable for the millions of mailboxes hosted in on-premises deployment and in our own Office 365 and Live@EDU services.
Q: What are you doing to prevent similar things from happening in the future?
A: We have conducted a top-to-bottom review of the process we use to triage, develop and validate changes for Rollups and Service Packs and are making several improvements. We have changed the way we evaluate a customer requested fix to ensure that we more accurately identify the risk and usage scenarios that must be validated for a given fix. Recognizing the diversity of clients used to connect to Exchange, we are increasing our client driven test coverage to broaden the usage patterns validated prior to release. Most notably, we are working even closer with our counterparts in Outlook to use their automated test coverage against each of our releases as well. We are also looking to increase coverage for other clients as well.
Exchange Customer Experience