They grow up so fast

So we were taking a look at the C# language service in VS 2002 and 2003 and comparing it to VS 2005, and we got quite a shock.  In 2002 the language service weighed in at exactly 700k (well, not exactly, but close enough for the purposes of this discussion).  Then in 2003 it came in at 772k.  OK, that seems somewhat reasonable.  Bug fixes would have churned up the source some, and maybe some new features would have accounted for the 10% increase in size.  So where are we at in 2005?



That's 4 times the size of 2002 and 3.6 times the size of 2003.  To give you a feeling for that, here's some bad ASCII art:


| 2002 |


| 2003 |


| 2005 |


So why am I so surprised by that?  Well, because a lot of the work done in 2005 was (IMO) to actually slim down the language service.  During all the work done on 2005 we took a long, hard look at all the problems we were having with 2003: architectural decisions that were holding us back, customer DTS and QFE problems, scalability issues, etc.  And we made a team decision that in order to really advance the product we actually needed to simplify a whole bunch of the codebase.

The code was written IMO with a micro-optimization flair.  And while this did mean that it was quite fast in certain cases, the extra complexity added in by that optimization kept on making certain things difficult.  For example, an optimization to save memory when parsing types with no nested types ended up causing a bug whereby we might lose information about all nested types in your project in certain circumstances (yikes!). 

Another perf optimization was the "ProjData" file.  For those who don't know about it, this file is basically a dump of all our internal data structures (i.e. information about all the types and members in your project).  It existed in 2002 and 2003 so that when loading a project we wouldn't need to recalculate all the information we'd already computed in the last run.  However, the design of that file was very much "in your face".  Rather than being abstracted away so that most parts of the language service wouldn't need to know about it, basically everything needed to know about it and handle it correctly.  If the file format changed you probably had to go fix up about 100 places, and there were serious multi-threaded access woes that went along with it.  At a certain point we looked at that and said "this is an enormous source of problems for us, and we really need to see what we can do about it".  So we ended up removing that file entirely, and we simplified the language service by an order of magnitude.

By doing these simplifications we found that while our perf might have degraded slightly compared to 2003 in small cases, it vastly outperformed it in large cases.  And, frankly, on the small cases we're talking about performance losses of microseconds, whereas on large projects we're talking about gains of minutes (if not more).

So many complex chunks of code were vastly simplified or ripped out entirely.  When we ran into other performance problems (like having a WinForms control with tens of thousands of elements on it), we attacked the problem from the ground up to make sure we could handle that well.  We also talked to some of our huge enterprise customers to get help with this.  We took projects with 25 MB of source (yes, that's right), and used them to ensure that the language service could really handle what would be thrown at it.

Do we fail at some things?  Yes.  If you have a project with 2 billion files in it, we'll probably not be able to scale to that level.  But, for the most part, we should be able to meet most people's needs.

So why did we grow so much?  It's still somewhat of a mystery to me.  We do have the refactoring code, the EnC (Edit and Continue) code, the new formatting engine, the new smart tags and whatnot, and all the extra work to support things like generics.  But those really didn't seem like *that* much code.  But I guess when I break down everything the language service does (like):

1) Code Generation – Handles any time we spit out code (like for generate-method-stub and implement-interface)

2) CodeModel – We implement the code model that is exposed to teams like Team System for introspection of your source code

3) IntelliSense support for completion lists, parameter help, quick info

4) Debugger Interaction – We figure out what should go in many of the debugger displays, and we help bind your breakpoints

5) Snippets support

6) Formatting

7) Code navigation support – Things like the navigation bars, goto definition, and metadata-as-source

8) The new smart tags and little productivity enhancements (like “add-using”)

9) Refactorings

10) Metadata reading

11) WinForms support – For analyzing your code and figuring out what it means

12) Web Development support – Using C# inside HTML code

13) All the massive infrastructure to just support understanding your code as you type it

Then I can see how things might have grown.  There wasn't a single area here where we didn't do quite a lot of work (or all the work, in the case of new features) in order to make things better than in 2003.  So maybe even though we cleaned up a lot of stuff and made things more scalable and stable, we still ended up increasing our size by a ton.

Hopefully if you use 2005 you'll see it like so:


2003 | |


2005 |2003 features|New 2005 yumminess that you absolutely love|