When capabilities don't meet requirements

So I’ve recently fixed two bugs that have come up that both shared a root cause.  I thought that they were interesting enough that it would be worth sharing with you guys.  In order to make them make sense I first need to give a little bit of an architectural background to it all. 

As you probably know, I’m responsible for the core IntelliSense™ architecture.  What does that mean?  Well,
I work on the language analysis component that watches what you type
and keeps our internal understanding of your source code up to date so
that we can proffer help in many forms (tooltips, parameter help,
completion lists, etc.).  This is a fairly complex task and there are many ways that one could architect a system like this.  In
my opinion the most ideal way would be to have a system that watched
what you did and always kept itself up to date after every action you
performed.  This would be a system that
would be very easy to reason about and which would always be able to
help you the most since there would be no disparity between what you
had written and what it knew about. 

Unfortunately, we do not have such a system.  Why not?  Well, first let me discuss the work that is done.  Consider an action like typing.  After a user has typed a character we might have to do all of the following steps:

  1. Update our token stream for that document.  Certain characters might require the entire document to be re-tokenized.  (imagine adding a /* )
  2. Update our parse tree for that document.  Depending on how much the token stream was churned will determine how much of the file needs to be reparsed and fixed up.  (imagine changing top level scoping by changing an open curly brace {
  3. Update our annotated expression graph.  A single token change could have an impact to the expression graph for your entire solution.  (Imagine changing a namespace name).

This is quite a bit
of work to do, and we would potentially have to do all of that in the
scant milliseconds between the characters a user has typed.  As it stands today with our current architecture, it’s too much work to do in too little time. 

So what did we decide on instead?  Well, the C# Language Service is divided into two threads: 

  1. The
    foreground thread which responds to user interaction events and
    presents the results of an request (like populating and bringing up a
    completion list after <dot> is hit)
  2. The background
    thread which is responsible for keeping our internal expression
    representation up to date after a user has performed an action.  i.e.
    after a character is typed it ensures that we now understand the user’s
    code given the massive amount of change in meaning that could have

By choosing this sort
of model though we have now made a tacit decision that the information
we present to you might not be completely up to date or accurate.  i.e.
when the foreground thread decides to do something like present a
tooltip when you hover over an identifier it is requesting the
information from an expression graph that could very well not be up to
date.  Now, you might be saying to yourself
“well, why don’t you just wait until the information is up to date and
then present it to the user?”.  Interestingly enough, this is exactly what some language services choose to do.  When you’re in VB and you move away from the current bit of code you’re typing you might notice a pause.  What’s going on at that point in time?  Well, the VB Language Analysis System is taking what you’ve just written and making its entire system up to date.  However, in C# land we made a conscious decision that that was not the model we wanted.  We
feel that pausing while a user types is something that our users will
absolutely hate and it’s imperative that we not block the user while
typing.  It should be noted that our architecture does exceedingly well on multi-proc and hyper-threaded machines.  In
those cases the system will be running both of these threads
simultaneously, and in effect we often appear to be up to date in
between your keystrokes.

Now, is this such a big deal?  In practice the answer is almost always “no”.  While
not being fast enough to perform in between your keystrokes, our system
is still extremely fast and uses many snazzy techniques to do the work
it does quickly.  So if you were to have the following code:

     public void Foo()

and you then added a new method after “Foo” like so:

     public void Foo()

    public void Bar()

and you went back up to “Foo” and requested a completion list after “this”, you would almost certainly see “Bar” in the list.  We’re
not necessarily fast enough to work within your keystrokes, but we are
fast enough to deal with keystrokes and a bit of navigation.

When are cases where this being out of date can cause an issue?  One case is WinForms.  In
order for them to accurately display what your form will look like they
need us to accurately examine, decompose, and report back the meaning
of your InitializeComponent method.  In the
past we would just depend on our potentially out of date symbol graph,
but it ended up being the source of many bugs and major headaches for
the user (ever had all your controls disappear?  There’s a good chance it was due to that).  So,
in VS2005, we changed our model so that if WinForms is asking us for
data we will block the foreground thread until the background thread is
finished working.  In effect, for this circumstance we’ve moved to the VB model.   Now,
in order to not give you a horrible experience where you’re asking
yourself “why the heck isn’t the system responding” we will pop up a
progress dialog to tell you what’s happening, why, and how long there
is left.  In that case we considered it
important enough because in the event of us sending bad information to
WinForms, we could end up corrupting your form. 

What we had here was a case where the requirements and capabilities of two different systems were not being met.  WinForms
has a requirement that the Language Service in question provide
high-fidelity analysis of the code in question, whereas the C# Language
Service was designed to be fast, but not necessarily provide high
fidelity information.  Unfortunately, this
realization and formalization wasn’t clear to either the WinForms team
or our team in the VS2003 timeframe which is why many of these bugs


Ok, so that was the background bit, now onto the bugs I was fixing recently.  It’s been known to me for a while that there are more areas where requirements and capabilities are also not in sync.  But it wasn’t clear to me how to best fix the problem.

The first is “Bring Up Completion List On Identifier” (BUCLOI).  If
you haven’t played with VS2005 yet, then here’s how it works: when you
start any C# identifier (or keyword) we’ll bring up the completion list
automatically populated with all the relevant identifiers for the that
location (FYI: if you don’t like this feature it is simple to disable
in the tools|options dialog).  i.e. if you’re starting to write a method and you type “pu” then we’ll show you:


and you can then select “public” without wasting time hitting <ctrl><space> to bring up the completion list.  Now,
in order for this feature to be useful it has to be the case that we
accurately provide you with the identifiers that would be valid in that
location.  This was a great way for us to find bugs in the language service when we introduced this feature more than a year ago.  If
the identifier wasn’t in the list then people would be frustrated and
send us nasty bug reports telling us that this feature was getting in
the way.  However, it allowed us to know
that we weren’t doing a good enough job analyzing your code, or we
weren’t getting all our internal information up to date fast enough.  By pushing this right up in peoples’ faces we were able to improve performance and accuracy by probably an order of magnitude.  However,
even with all the work we did, there was one place where we were still
running into problems and getting people frustrated.  Specifically, in this case:


As it turns out this is one of those areas where a single character
change ends up having a massive effect on our internal symbol graph.  By adding the return value in for this generic method we are changing the parse tree extensively forcing a lot of re-analysis.  On
top of that, generic methods are special in as far as how overloads and
constraints are determined, and so there’s a fair bit of work that
needs to be done to stitch all that information together.  Because
of this it’s possible that on some occasions when you bring up the
completion list for the return type it might not contain the generic
type parameter in it.  And, what’s worse is
that because it is highly likely that the type parameter is one
character long, there’s a very high chance that it will be the prefix
of some other item in the list.  If you then hit space, it will complete to that longer identifier and completely throw you off.  Ugh.  What an awful thing to do to you.  You’re trying to type completely legal code and we totally screw you over.  Definitely a bad thing and something that I think will leave some users cursing at us for.   Once
again, this is a case where the high-fidelity requirement of BUCLOI
contrasts with the lackadaisical nature of the analysis engine. 

Normally, we don’t run into these situations, but here was one case where it did become important.  In
fact, in all grammatical areas in the C# language this is one of the
rare ones where even though we’re so close to the definition of the
item the adding/removing of the reference ends up having very costly
effects on us.  In pretty much all other places it’s not a problem.  Now,
unlike the WinForms case we couldn’t just block in this situation
(since we’re dead set against that, and how would you feel about a
dialog coming up saying “please wait” while you were trying to type the
return type for a method!).  So what did we end up doing?  Well, something I really dislike, but which I think was an acceptable choice for VS2005.  We special cased.  Instead
of having pretty generic and consistent rules for how we process
changes in your code, we now special case what we do if you’re changing
the return type of a generic method.  In
that case we make darn sure that our understanding of the generic
method type parameters is up to date very quickly in the process.  I
would have preferred to not have to do this, but hopefully I can keep
this hack abstracted away while keeping the rest of the architecture
fairly clean.

The second place where this came up was in a great bug that was found by QA a while back.  The
basic gist of it was that after you perform an “extract method’
refactoring, then immediately after the refactoring the “generate
method stub” smart tag would appear to offer to generate the method
you’d just created!  After reading everything so far it’s probably pretty clear to you what had happened.  The smart tag checked to see if the method call it was on could bind to an existing method.  Of
course, because that method had *just* been created (probably <5
milliseconds prior), the IntelliSense™ background thread hadn’t
incorporated it into a symbol graph.  And so, the smart tag saw that it wasn’t there and said: “hey, perfect place to display myself!”  So how did this get missed by us devs?  I mean, you’d think we’d have seen that when we created these features.  Well, as it turns out I develop on a dual proc machine.  And,
on that machine this never repro’ed because on those machines the
internal state was up to date when the smart tag queried it.   (similarly, the first example with method return types never repro’ed on these machines either).  So
once again this was a case where a feature wanted up to date
information but was depending on a system that didn’t guarantee any
such thing.  Any guess as to how we eventually ended up solving this one?

As we move forward I think it’s very important to consider and document these design decisions very carefully.  The
have far reaching consequences, and need to be understood and planned
for when designing features that will end up being affected by them.