My BFD duties

Recently it was my turn to perform BFD duties. What is a BFD, you ask? Well, let’s start from the beginning. The Visual Studio division has a large build lab that builds and services multiple branches of the code. There are large branches called "virtual build labs", or VBLs, as well as private team branches. VBLs build the entire product, and there are only a few of them. A VBL is typically used to branch the entire source tree for a major release like Beta 1 or a CTP, or to test a new compiler or CLR before introducing it to the entire division. A private team branch serves just one team (like VB, C# or Web Tools) and is maintained by that team.

Private branches improve productivity in large organizations because the code base each team works against becomes much more stable. Think about the scale: if every developer broke something just once a year, then in a division where several hundred developers are actively changing the code, the build, the tests, or both would be broken most of the time. Private branches create a stable environment, so developers are practically never blocked. A team picks up a good build, uses it for a couple of weeks, and then picks up a new one.

However, frequent integrations are required to and from the main product branch. Integration from the main branch into a private branch is called forward integration, and integration back into the main branch is called reverse integration. Integrations are usually performed by a dedicated build engineer. A full build of VS is performed and a large set of suites is run. The set usually consists of all the team's suites plus all the suites that the build lab runs on every build. This set is larger than the typical set developers run at checkin time, so it usually catches bugs that slipped through developer checkins. When everything succeeds, the changes are submitted to the main branch. There are additional steps that prevent collisions, so that two teams don’t submit at the same time, but that’s beyond the scope of this blog entry :-). You can read more about the VBL structure in the Release team blog. There is also an informative entry in Matt Pietrek’s blog.
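To put rough numbers on the scale argument above, here is a back-of-the-envelope sketch. The figures (500 developers, 250 working days a year, a 15-person team) are my own illustrative assumptions, not real division data, and the model naively treats breaks as independent events that each spoil exactly one day.

```python
# Back-of-the-envelope model of how often a shared branch is broken.
# All inputs are illustrative assumptions, not measured data.


def expected_breaks_per_day(developers: int,
                            breaks_per_dev_per_year: float,
                            working_days: int) -> float:
    """Average number of breaking checkins landing on one working day."""
    return developers * breaks_per_dev_per_year / working_days


def probability_broken_on_a_day(developers: int,
                                breaks_per_dev_per_year: float,
                                working_days: int) -> float:
    """Chance that at least one developer breaks the branch on a given day,
    assuming every developer breaks things independently of the others."""
    per_dev_daily_chance = breaks_per_dev_per_year / working_days
    return 1.0 - (1.0 - per_dev_daily_chance) ** developers


if __name__ == "__main__":
    # Hypothetical division-wide branch versus a hypothetical team branch.
    for label, devs in (("whole division", 500), ("one team", 15)):
        rate = expected_breaks_per_day(devs, 1, 250)
        chance = probability_broken_on_a_day(devs, 1, 250)
        print(f"{label}: {rate:.2f} breaks/day, broken on {chance:.0%} of days")
```

Under these assumptions the shared branch sees two breaks on an average day and is broken on roughly 86% of days, while the 15-person branch is broken on only a few percent of days. That is the whole case for private branches in one small calculation.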

At every checkin, a developer runs a set of automated tests (checkin suites) to verify that nothing is broken. These tests are similar to, but not the same as, the QA automation tests (see Scott's blog entry on the ASP.NET and VWD QA process). The reason is that it is physically impossible to run the entire QA automation stack at every checkin, since there are literally thousands of tests. It is equally impossible to run the entire stack of VS developer checkin suites from all teams, since each checkin would take too long. Therefore, during checkin developers run only a subset of tests that covers 100% of their primary area of responsibility, plus a few selected tests that cover other VS features. For example, we at Visual Web Developer run the full stack of our team suites plus a limited set of suites from the debugger, IDE and other teams that use parts of our code. We run tests from other teams because there is a chance that we can break, for example, client script debugging: if we introduce a bug in the code that verifies breakpoint locations, we will break debugger functionality. However, the chances that we break, say, the C# compiler are close to zero. On the other hand, changes in C# refactoring can break refactoring in Web projects, so the C# team runs our refactoring tests. If your checkin breaks another team, that team has every right to add their suite to your set and make you run it from then on.

Still, there is a certain probability that tests get broken. For example, developers and team integrators typically run suites on the complete VS build, which is roughly equivalent to Visual Studio Enterprise. The build lab, however, runs tests on every SKU that we ship, including localized versions. A test might succeed in the VS Enterprise setup but fail in one of the Express SKUs simply because not all components are present in all SKUs. Or a suite may succeed in the English version but fail in the Japanese build because some test code relies on English strings. The build lab also uses different machines with different CPU speeds and amounts of RAM, so a build lab suite run may expose latent timing bugs.

What happens if a test breaks? The build lab opens P0 or P1 bugs (depending on which suite is broken) and assigns them to the suite owners. Each team typically has a developer who takes the first look at a broken test. This developer is called the Build Facilitating Developer, or BFD. The BFD reduces noise and significantly reduces distraction for other team members. For example, multiple suites may fail for the same reason: say, an assert pops up somewhere and blocks further test execution. Or a test breaks because product setup failed for some reason. Instead of several people wasting time looking at the same issue, one developer quickly figures out the problem and either fixes it or assigns it to another team.
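The noise reduction is easy to picture as a grouping problem: many failing suites, few root causes. Here is a minimal sketch of that idea. The `SuiteFailure` record, the `signature` field and the sample suite names are all hypothetical and made up for illustration; this is not how the build lab's actual bug tooling works.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteFailure:
    suite: str       # hypothetical suite name
    signature: str   # hypothetical short description of the first error seen


def group_by_signature(failures: list[SuiteFailure]) -> dict[str, list[str]]:
    """Collect failing suites under their shared first error, so that each
    distinct problem is investigated once instead of once per suite."""
    groups: dict[str, list[str]] = defaultdict(list)
    for failure in failures:
        groups[failure.signature].append(failure.suite)
    return dict(groups)


if __name__ == "__main__":
    # Made-up overnight results: three bugs opened, but only two real problems.
    overnight = [
        SuiteFailure("refactoring", "assert dialog blocked the run"),
        SuiteFailure("script debugging", "assert dialog blocked the run"),
        SuiteFailure("toolbox", "product setup failed"),
    ]
    for signature, suites in group_by_signature(overnight).items():
        print(f"{signature}: {', '.join(suites)}")
```

In practice the grouping happens in the BFD's head rather than in a script, but the effect is the same: one person looks at each distinct problem once.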

On the Visual Web Developer team we have a rotating assignment: every developer performs BFD duties for a week. Some weeks are quiet, some are busy. Mine was relatively uneventful. I fixed one issue where a suite was failing because the toolbox started sliding away earlier than before and the suite could not find the window. Another day, a new STL library uncovered a latent bug in another suite's code written in C++ (that was test code, not product code). Generally, closer to the end of the product cycle BFD duties are not very taxing, since code churn is getting smaller and the wrinkles in the test code have been pretty much ironed out.