Developing the Compiler

Working on a compiler can be difficult: you can make what you think is a minor change to fix a bug and check it in to the source tree. If you get the fix wrong and you are lucky, your QA team will tell you that you just broke a whole series of tests. If you are unlucky, the bug will lurk for a few weeks or months, and the first you’ll hear of it is when some build-lab owner in NT phones you up demanding to know why you broke their build and could you get it fixed already. If you are really unlucky, their VP will call up your VP, and the next thing you know you are trying to explain why you broke the NT build to some really senior (and annoyed) people.

So to try to avoid problems like these, each developer has to run a series of test suites before they can make a check-in. When I first joined the Microsoft C++ compiler team there were 3 test suites that had to be run:

· Sniff – this took 20 minutes to run and just validated that the most obvious tests like “Hello World” and scribble.exe still built.

· 2Hr BVT & 6Hr BVT – these were larger Build Validation Test suites and, as the names suggest, they took 2 hours and 6 hours to run respectively.

Today we still have these 3 suites, and while the Sniff suite still takes around 20 minutes to run, both the 2Hr and 6Hr suites now finish in less than 30 minutes – not because we have reduced the number of tests but because machines are now so much faster.

Over time we came to realize that these 3 suites were not enough and so over the years we have added more suites: we now have test suites that target conformance to the C++ Standard, suites that target code that uses attributes, suites that build and execute 3rd Party Libraries like Boost, suites that test the parser we use to provide Intellisense and, most recently, suites that target managed code. Currently before any check-in is made to the compiler source tree a developer will run approximately 14 different test suites.

But even with all these suites we still found we were running into issues. A lot of these issues took the form of a change to the compiler parser breaking the linker, or a change to the optimizer breaking the parser, or a change to the C runtime breaking everyone (Sorry Martyn ☺). There were also issues where a change to the IA-32 compiler would break either or both of the IA-64 and the AMD-64 compilers (and vice versa). On top of these desktop platforms, the compilers are now used to target all the chips that are used by the Windows CE team. On all platforms there were issues where a retail build would work but a debug build would fail. So it was suggested that each time a change is ready to be checked in, each developer should run every other team’s suites as well as their own team’s, and that they should run these suites on all platforms and for all builds (retail/debug/test).

While this may sound like a great idea in theory, it is not remotely practical: the number of combinations of suites, builds and platforms is huge, and not every developer has access to all the different machines necessary to run all the tests. There was also always the problem of a developer “forgetting” to run a suite before they checked in – “I know this change cannot possibly break anything on IA-64” – wrong! So it was clear that we needed a process that was fast, required little or no developer intervention, and could handle running multiple test suites on multiple platforms: welcome to Gauntlet.

“Running the Gauntlet” is a term for a form of medieval punishment in which the miscreant would have to run between 2 lines of knights who would attempt to hit him (or her) with their gauntlets. There is an image of a rather tamer version in the Pieter Brueghel painting “Young Folks at Play” (Note: Pieter Brueghel is the eldest son of the famous Flemish painter Pieter Bruegel).

What is Gauntlet? It is a program that runs on a server and serializes all check-ins. It works as follows:

When a developer is ready to check in, they open up a web page on the Gauntlet machine and fill out some information about what tree they are checking into (Parser, Optimizer, Linker, Runtime) and what files they are changing. They then submit the check-in to Gauntlet. The Gauntlet machine will then take the diffs from the developer’s machine, apply them to its own copy of the source code, and then run a whole series of builds and tests on different platforms. Currently, for a check-in to the parser, the Gauntlet machine will build about 12 different variations of the compiler and then run about 35 suites from all areas and on all platforms.
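The flow described above can be sketched roughly as a queue that is drained one check-in at a time. This is only my illustration of the idea, not Gauntlet's actual code: the `CheckIn` class, the `process_queue` function, and the `build` and `run_suites` hooks are all hypothetical names, and I am assuming that a check-in is rejected if any build or suite fails.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CheckIn:
    """A submitted change (hypothetical shape, for illustration)."""
    developer: str
    tree: str     # e.g. "Parser", "Optimizer", "Linker", "Runtime"
    files: list


def process_queue(queue, build, run_suites):
    """Handle check-ins strictly one at a time, in submission order.

    `build` and `run_suites` are caller-supplied hooks (assumptions):
      build(checkin)              -> list of compiler flavors that built
      run_suites(checkin, flavor) -> True if every suite passed
    """
    accepted, rejected = [], []
    while queue:
        checkin = queue.popleft()
        flavors = build(checkin)
        # Assumed policy: accept only if something built and all suites pass.
        if flavors and all(run_suites(checkin, f) for f in flavors):
            accepted.append(checkin)
        else:
            rejected.append(checkin)
    return accepted, rejected


# Usage with stand-in hooks: pretend one change fails on the debug flavor.
queue = deque([
    CheckIn("alice", "Parser", ["parse.cpp"]),
    CheckIn("bob", "Optimizer", ["opt.cpp"]),
])
ok, bad = process_queue(
    queue,
    build=lambda c: ["retail", "debug"],
    run_suites=lambda c, f: not (c.developer == "bob" and f == "debug"),
)
print([c.developer for c in ok], [c.developer for c in bad])
```

The point of the queue is that each change is tested against a source tree that already contains every earlier accepted change, so two individually fine check-ins cannot quietly break each other.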

“Doesn’t this take forever?” I hear you ask. No: Gauntlet is not just one machine, it is a cluster of about 30 machines (most IA-32 but also some IA-64 and AMD-64). Once Gauntlet has built a particular flavor of the compiler, it farms out the suites for that flavor to other machines, so all the testing can be done in parallel. A check-in to the parser takes Gauntlet just over 1 hour – but if we serialized all the building and testing it would take closer to 12 hours. This means we get a maximum amount of testing in a minimum amount of time.
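As a back-of-the-envelope illustration of why farming the suites out helps so much, here is a tiny scheduling sketch. The workload (72 suite runs of 10 minutes each) is invented so that the serial total comes to 12 hours, and the greedy scheduler is my own simplification. It ignores build time and the fact that suites can only start once their flavor has been built, so it does not reproduce the real "just over 1 hour" figure.

```python
import heapq


def makespan(durations, machines):
    """Greedy schedule: hand each task to whichever machine frees up first.

    Returns the wall-clock time at which the last task finishes.
    """
    free_at = [0.0] * machines       # time at which each machine is idle
    heapq.heapify(free_at)
    for d in sorted(durations, reverse=True):   # longest tasks first
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical workload: 72 suite runs, 10 minutes each.
durations = [10] * 72
serial_minutes = sum(durations)              # one machine doing everything
parallel_minutes = makespan(durations, 30)   # spread over 30 machines
print(serial_minutes, parallel_minutes)
```

With these made-up numbers the serial run is 720 minutes while the 30-machine run finishes in three rounds of 10 minutes, which is the same kind of ratio as the 12 hours versus 1 hour mentioned above.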

Having Gauntlet has really helped us to improve the quality of the whole compiler toolset: it’s not perfect (it can take a while to get your turn) but it is much better than leaving all the testing up to individual developers.

I’ll probably come back to our development process again in the future, but if you have any questions or comments please feel free to leave them for me and I’ll try to address them in a future blog.


One question I have gotten is why doesn’t your blog have RSS – it’s a long story. Basically decided to stop accepting any more new bloggers (at least temporarily) so I decided to use (and both of which are now owned by Google: unfortunately to get RSS I need to upgrade to the professional version: but at the moment they are not accepting any more upgrades ☹ – so for now I am stuck without RSS. Sorry.