Report exceptions, don't ignore them!
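Here is a great example of code that you should never write: a native __try/__except block wrapped around a call to RunCode, with a handler that catches every exception and then does nothing.
Or, here is the only somewhat less evil managed equivalent: a try/catch around the same call that catches every exception and has an empty catch block.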
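The pattern looks much the same in any language. As a minimal sketch in portable C++, assuming a hypothetical RunCode that always fails, a swallow-everything handler might look like this. It uses the standard catch (...) rather than the Windows-specific __try/__except, so in standard C++ it only sees C++ exceptions; all names here are illustrative.

```cpp
#include <stdexcept>

// Hypothetical worker; stands in for whatever 'RunCode' does in a real product.
void RunCode() {
    throw std::runtime_error("disk full while saving");
}

// The anti-pattern: every exception is caught and silently discarded.
// Returns true unconditionally, so the caller cannot tell that RunCode failed.
bool RunCodeIgnoringErrors() {
    try {
        RunCode();
    } catch (...) {
        // Nothing is logged, reported, or rethrown.
    }
    return true;
}
```

From the caller's point of view, RunCodeIgnoringErrors() always succeeds, even though the work it wraps never completed.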
I know why programmers do it:
- 'RunCode' isn't really that important. I don't care if it fails.
- I need the system to be really reliable
- I don't want the user to have data loss
- This code has always been this way. I don't want to change it now.
These are all really nice sounding reasons. However, there are several problems with this logic:
- It is impossible to write code that can take an exception at an arbitrary point. Even for managed code, cleanup is a hard problem. After an exception has been ignored, there is usually at least some variable that is in the wrong state. Locks will still be held, variables won't be cleared, persistent data won't be deleted, memory will be leaked.
- The user has no idea that an exception happened. Given the first point, when an exception happens, the user is probably going to run into problems. However, since the exception has been ignored, when they do run into problems, they will have no idea why.
- The product will not improve. These days, Microsoft is big on gathering data from customers on how to improve our products. However, because this code ignores the exception, no data about the failure is ever collected, so the product never gets better.
- Bugs cost more to fix. If the code had crashed or reported the exception, the developer would have an easier time identifying the problem. As it is, the tester is going to have to find a good repro, and the developer is going to need to run the repro under the debugger. You had better hope that this isn't some timing bug that goes away when the product is run under the debugger.
- The product is less secure. Why is the code crashing? Maybe it is crashing because someone malicious has found a buffer overrun. By continuing to execute after the overrun, the likelihood that this overrun can be exploited only increases.
- The native code does not properly deal with stack overflow. After a stack overflow occurs, you need to call _resetstkoflw() to reset the page protection attributes on the final stack page. Failure to do so will mean that if your code overflows a second time, the process will just vanish.
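The first point above, state left behind after an ignored exception, can be illustrated with a small portable C++ sketch. The Document type, its save_in_progress flag, and both functions are hypothetical and exist only for this example; the flag plays the role of a hand-rolled lock.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical document model, used only for illustration.
struct Document {
    std::vector<std::string> lines;
    bool save_in_progress = false;  // acts like a hand-rolled lock
};

// Sets the flag, then fails before it can clear the flag again.
void Save(Document& doc) {
    doc.save_in_progress = true;
    if (doc.lines.empty()) {
        throw std::runtime_error("nothing to save");
    }
    doc.save_in_progress = false;
}

// Swallows the failure; the caller sees a normal return.
void SaveIgnoringErrors(Document& doc) {
    try {
        Save(doc);
    } catch (...) {
        // Ignored: doc.save_in_progress is left set.
    }
}
```

After SaveIgnoringErrors returns for an empty document, save_in_progress is still true, so any later code that checks the flag will misbehave, and nothing records why.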
Okay, so hopefully by now I have convinced you that just ignoring all exceptions is the wrong thing to do. So, what should you be doing? You need to come up with a system for reporting unexpected exceptions. For client applications, this means notifying the user. For server applications, this means notifying the administrator. What should be included in this notification? Anything that _you_ will find useful. Include details that are meant not for the user who hits the problem but for the support person or developer who is called upon to solve it.
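As one possible shape for such a notification, here is a rough portable C++ sketch. The FailureReport struct, its fields, and the RunAndReport and FormatReport helpers are assumptions made for illustration; a real product would likely add a stack trace, version information, and whatever else its developers need.

```cpp
#include <exception>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical report record; the fields are examples of what a developer might want.
struct FailureReport {
    bool failed = false;
    std::string operation;  // what the program was doing
    std::string details;    // exception message, or a note that the type was unknown
};

// Runs 'work' and converts any escaping C++ exception into a report
// instead of discarding it.
FailureReport RunAndReport(const std::string& operation,
                           const std::function<void()>& work) {
    FailureReport report;
    report.operation = operation;
    try {
        work();
    } catch (const std::exception& e) {
        report.failed = true;
        report.details = e.what();
    } catch (...) {
        report.failed = true;
        report.details = "unknown exception type";
    }
    return report;
}

// Formats the report for a log file or the details pane of an error dialog.
std::string FormatReport(const FailureReport& report) {
    std::ostringstream out;
    out << "operation: " << report.operation << "\n"
        << "status: " << (report.failed ? "FAILED" : "ok") << "\n";
    if (report.failed) {
        out << "details: " << report.details << "\n";
    }
    return out.str();
}
```

Whether the program should keep running after producing a report is a separate decision; the sketch only makes sure the failure is recorded somewhere a person can see it.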
A few more suggestions:
- Call SetUnhandledExceptionFilter from your native application.
- Save a stack trace in your managed application.
- If you are going to continue past a stack overflow, call _resetstkoflw().
- Test your 'unexpected exception' code. This code is easy to break since you don't normally see it run. You should test it by injecting a fault and making sure that the fault is properly reported. I learned this lesson the hard way.
- If you have some evil code that is catching all exceptions and that you cannot change, consider using a vectored exception handler, or report exceptions from a try/catch or __try/__except.
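The fault-injection suggestion can be sketched in portable C++ as follows. Everything here is hypothetical: the in-memory ReportedFailures sink stands in for a real log or crash-report service, and g_inject_fault is a test-only switch invented for this example.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory sink; a real product would write to a log or report service.
std::vector<std::string>& ReportedFailures() {
    static std::vector<std::string> reports;
    return reports;
}

// Test-only switch that forces the worker to fail.
bool g_inject_fault = false;

void RunCode() {
    if (g_inject_fault) {
        throw std::runtime_error("injected fault");
    }
}

// Top-level wrapper: records the failure, then rethrows so it is not swallowed.
void RunCodeReported() {
    try {
        RunCode();
    } catch (const std::exception& e) {
        ReportedFailures().push_back(e.what());
        throw;
    }
}

// Returns true if an injected fault both propagates and shows up in the sink.
bool ReportingPathWorks() {
    ReportedFailures().clear();
    g_inject_fault = true;
    bool propagated = false;
    try {
        RunCodeReported();
    } catch (const std::runtime_error&) {
        propagated = true;
    }
    g_inject_fault = false;
    return propagated && ReportedFailures().size() == 1 &&
           ReportedFailures()[0] == "injected fault";
}
```

Running a check like ReportingPathWorks() as part of the regular test suite keeps the reporting path from rotting unnoticed, since it is otherwise exercised only when something has already gone wrong.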
In a future blog post, I will create some sample code for reporting exceptions.
In conclusion, plan for imperfect code. Everyone has bugs. By reporting them instead of ignoring them, you make it easier to find and fix these problems at all stages in the product's life cycle.