What exactly do we see in the crash dump we receive from users?
Cyrus blogged about this some time ago; here is some more information. First, a bug gets opened in our defect tracking database with a title such as
Watson bucket 64043890: Crash caused by HTMLED.DLL!COmTreeElement::RemoveInvalidItems
When I open the issue, I see
a) a call stack ending in code my team owns
b) a link to the internal Watson web site with additional information
On the web site I can retrieve a list of crashes with their number of hits (i.e., how many times the crash has been reported for a given module and method/address within it), the versions of the application and of the DLL that was loaded, and I can check whether any other bugs have been opened on the issue.
I can also check whether the user filled out a survey with probable repro steps. Often it is enough just to look at the offending code. However, if the code does not make it clear what exactly happened, I can download a post-mortem dump, open it in Visual Studio, and examine the call stack and some variables.
Unfortunately, post-mortem debugging is much more limited than normal day-to-day debugging. Moreover, production code is heavily optimized, so variable values often don't tell us much. Nevertheless, in the majority of cases we are able to figure out what happened. Even if we are completely baffled, we can usually still make architectural changes in the next release that eliminate the problem. For instance, in Whidbey we changed object references from instances created with new/delete and referenced through a raw pointer to reference-counted COM objects accessed through an interface. This completely eliminated crashes caused by references to deleted objects.
So next time you see a crash, please click 'send report to Microsoft'! :-)