Stress testing Visual Studio 2010

In the past several months Visual Studio and I have been really busy stress testing each other. This post is a general overview of what we've been up to and what kind of testing we're doing. I've learned a lot about stress testing, and I have to say it's actually a lot of fun, so I guess it's worth sharing. I'll try to turn this into a series of posts, diving into more technical details in upcoming ones.

Background

During Beta 1 and Beta 2 it became painfully obvious that the new VS had an obesity problem: it was slow, it consumed a lot of memory and, worst of all, with enough modules loaded it no longer fit into the 2 GB address space on 32-bit machines. There were several reasons for this, which Rico Mariani, Brian Harry and others have blogged about extensively. In a nutshell, the new functionality meant that many more modules were loaded into memory. In addition, we now had to fully load the CLR and WPF at application startup. On top of that, there were memory leaks all over the place.

Making performance a top priority

Of course this wasn't good, so our management made the right decision to make performance our top priority. Jason took it very seriously, and we dedicated a lot of people to work full-time on making Visual Studio fast and lean. As part of this effort I became a member of a virtual team called "Perf SWAT". This team is responsible for essentially three things: performance, memory consumption and design-time stress.

Performance is clear: we need to be fast. Memory consumption is clear too: when we load, we need to take as little memory as possible, and avoid things such as double-loaded modules, loading both NGEN and IL versions of an assembly, and so on.

Design-time stress on the VSL team

As for design-time stress, the goal is that once we're loaded into memory, JIT-compiled and warmed up, and all the caches are filled, our memory consumption should stop growing. This means finding and eliminating all memory and resource leaks. Run-time stress means finding leaks in the CLR and BCL; design-time stress means finding leaks in VS and its tooling. I am responsible for design-time stress testing for the VSL team (managed languages). I need to make sure that there are no significant leaks in four areas:

  1. C# IDE and editor integration (C# code editor, navigation, refactorings and other core C# areas)
  2. VB IDE and editor integration
  3. F# IDE
  4. Hostable Editor (the Workflow Designer in VS 2010 essentially hosts a full-blown language service to show IntelliSense in the expression editor on the workflow diagram)

Progress

The good news is that we've made tremendous progress since Beta 2 and have brought the product into a much better state: it is much faster, more responsive and takes up much less memory, and we hope we have also eliminated all major known memory leaks. A common complaint was that VS kept growing in memory during use and had to be restarted every so often. Now we hope you can keep Visual Studio open for days (even weeks) without having to restart it.

8-hour stress tests

The official sign-off criterion is that end users need to be able to keep VS open for an entire work week without any noticeable performance degradation (5 days at 8 hours a day, or 40 hours). We've calculated that, on average, 40 hours of continuous human usage is equivalent to running our tests for 8 hours, because the tests do things much faster than a human.
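The sign-off arithmetic can be spelled out as a tiny Python sketch. The constant names are hypothetical; the only inputs taken from the post are the 5-day, 8-hours-a-day work week and the 8-hour test run.

```python
# Hypothetical names; the numbers are the ones quoted in the post.
WORK_DAYS_PER_WEEK = 5
HOURS_PER_WORK_DAY = 8
STRESS_TEST_HOURS = 8

# One work week of continuous human usage.
human_usage_hours = WORK_DAYS_PER_WEEK * HOURS_PER_WORK_DAY

# How many hours of human usage each hour of automated stress has to stand in for.
compression_factor = human_usage_hours / STRESS_TEST_HOURS

print(f"{human_usage_hours} h of usage ~ {STRESS_TEST_HOURS} h of stress "
      f"(each test hour stands in for {compression_factor:g} h of human usage)")
```

In other words, the tests have to drive VS roughly five times as densely as a person would for an 8-hour run to count as a work week.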

We have identified and implemented a series of 22 tests covering all four areas mentioned above. Each test covers one continuous kind of activity, e.g. CSharpStressEditing, CSharpStressNavigation, CSharpStressIntelliSense, CSharpStressDebugging, CSharpStressUI, VBStressEditing, VBStressProjectSystem, FSharpStressEditing, and so on.

Each test runs for 8 hours on a machine in the lab and VS memory usage details are automatically logged. We've also developed tools to automatically analyze the stress logs and produce Excel spreadsheets and charts for analysis and reporting.
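To make the per-iteration logging concrete, here is a minimal Python sketch of a stress loop. It is not the harness we actually use: `run_stress_loop`, `samples_to_csv` and the injected `sample_working_set` callback are hypothetical, and a real harness would read the VS process's working set (for example from a performance counter) instead of a stub. The loop repeats one activity until the time budget runs out, records the working set after every iteration, and renders the log as CSV that a spreadsheet such as Excel can open.

```python
import csv
import io
import time
from typing import Callable, List, Optional, Tuple


def run_stress_loop(
    action: Callable[[], None],
    sample_working_set: Callable[[], int],
    duration_seconds: float,
    max_iterations: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Repeat `action` until the time budget (or optional iteration cap) is spent,
    recording (iteration, working_set_bytes) after each iteration."""
    samples: List[Tuple[int, int]] = [(0, sample_working_set())]  # baseline before any work
    deadline = time.monotonic() + duration_seconds
    iteration = 0
    while time.monotonic() < deadline:
        if max_iterations is not None and iteration >= max_iterations:
            break
        action()  # one iteration of the stressed activity, e.g. a scripted edit
        iteration += 1
        samples.append((iteration, sample_working_set()))
    return samples


def samples_to_csv(samples: List[Tuple[int, int]]) -> str:
    """Render the samples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["iteration", "working_set_bytes"])
    writer.writerows(samples)
    return buffer.getvalue()
```

The iteration cap exists mainly so the loop can be exercised quickly with a fake action; an 8-hour run would rely on `duration_seconds` alone.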

Several months ago a typical test would start at about 300 MB of ProcessWorkingSet and crash after several hours with an OOM (out-of-memory exception). None of the tests could even run for 8 hours. After finding and fixing a lot (a lot!) of bugs, we were able to get them running for 8 hours, but VS memory usage grew from about 300-400 MB of working set to over 1 GB over those 8 hours (anywhere from 200 to 500 stress iterations).

Right now a typical test starts at about 150-200 MB and finishes 8 hours later at 200-300 MB. Also, instead of at most 500 iterations, it now completes 3000-5000 iterations in 8 hours on the same hardware. This means we made VS considerably faster and also reduced the leaks in major feature areas to a minimum (right now a feature is considered not leaking if its average memory increase is less than ~5 KB per iteration).
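The "~5 KB per iteration" rule can be expressed as a small check over a memory log. This is a sketch under stated assumptions: the function names are hypothetical, 1 KB is taken as 1024 bytes, "average increase" is read as the last sample minus the first sample divided by the number of iterations between them, and the first few iterations can be skipped as warm-up, matching the goal of measuring only once VS is loaded, JIT-compiled and its caches are filled.

```python
from typing import Sequence, Tuple

KB = 1024  # assumption: binary kilobytes
LEAK_THRESHOLD = 5 * KB  # "~5KB per iteration" from the post, in bytes


def average_growth_per_iteration(
    samples: Sequence[Tuple[int, int]], warmup_iterations: int = 0
) -> float:
    """Average working-set growth in bytes per iteration after warm-up.

    `samples` holds (iteration, working_set_bytes) pairs in iteration order.
    """
    measured = [s for s in samples if s[0] >= warmup_iterations]
    if len(measured) < 2:
        raise ValueError("need at least two samples after warm-up")
    first_iteration, first_bytes = measured[0]
    last_iteration, last_bytes = measured[-1]
    return (last_bytes - first_bytes) / (last_iteration - first_iteration)


def is_leaking(
    samples: Sequence[Tuple[int, int]],
    warmup_iterations: int = 0,
    threshold: float = LEAK_THRESHOLD,
) -> bool:
    """A feature counts as not leaking if its average growth stays below the threshold."""
    return average_growth_per_iteration(samples, warmup_iterations) >= threshold
```

For comparison, the earlier runs that grew from 300-400 MB to over 1 GB in at most 500 iterations work out to well over 1 MB per iteration, hundreds of times above this bar. Skipping warm-up matters because the initial jump from loading and cache filling would otherwise be averaged into the per-iteration figure.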

I'll try to keep blogging about our stress testing and dive deeper into the technical details: what we measure, how we measure it, how we find bugs and how we'll know when we're finally done.