Garbage collection vs. expression evaluation

A few months ago, a customer contacted us with a problem that seemed to be related to WCF. As it turned out, its roots were deep within the CLR…

The error was indeed thrown by WCF: it was a System.ServiceModel.CommunicationObjectAbortedException: “The communication object, System.ServiceModel.Channels.HttpChannelFactory+HttpRequestChannel, cannot be used for communication because it has been Aborted.” There were two factors that made this error mysterious, right from the beginning:

1. The issue appeared only in the release build, where it was reproducible fairly consistently. The debug build didn’t exhibit this behavior.

2. The exception was thrown during the first call through a properly created WCF channel on the client-side. Why would its underlying channel be aborted?

Repro to the rescue

Fortunately, I could reproduce the problem on my machine with the customer’s application. At first glance, it wasn’t obvious what was happening: in one single (but complex) line of code, among many other things, this particular HttpRequestChannel was created somewhere deep in WCF, but by the time everything was ready to issue the call (still on the same line), the HttpRequestChannel had already been aborted.

It was a really long line: first, it called a function that created a WCF channel through a few nested function calls and put it into a generic wrapper type that implements IDisposable. Once this wrapper was returned, one of its member functions was called to obtain the ServiceContract’s interface for the channel. On this interface, an actual method was called with some parameters.

So I wanted to find out what was aborting the HttpRequestChannel. The abort came from System.ServiceModel.ICommunicationObject.Close, which in turn was called by the customer’s generic wrapper. This was where things got more interesting. Obviously, this call must have happened on a different thread than the one executing that single line of code. But which thread? It wasn’t hard to find out from the call stack: it was the finalizer thread.

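Now the picture became clear. The debugger showed that the wrapper object wasn’t rooted anywhere. So it was indeed a candidate for GC, and because WCF channel creation is a heavyweight operation, it did induce a GC. Once the wrapper was collected, its finalizer ran on the finalizer thread and closed the channel before the call on the original thread was issued. In the debug build, the compiled code appears to be more conservative and keeps such intermediate results in “temporary variables” on the stack, which is why the problem never showed up there.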

Lessons learnt

But why wasn’t this object rooted on the stack at least?

Because it was a temporary result of expression evaluation. This makes sense, since only parameters, locals and return addresses are stored on the stack, but no temporary objects (unless we’re in a debug build). Still, it surprised me that temporaries are not preserved at least until the evaluation of the current expression finishes. C++, for example, guarantees that the destructors of temporaries won’t be called until the evaluation of the full expression completes.
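To make the C++ side of that comparison concrete, here is a minimal sketch of the same shape of expression: a factory returns a wrapper, the wrapper hands out a service object, and a method is called on it, all in one statement. Every name in it (Service, Wrapper, make_wrapper, evaluate_expression) is hypothetical and invented for illustration; none of it is the customer’s code or a WCF API. It only records the order of events so the lifetime of the temporary becomes visible.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the service contract interface.
class Service {
public:
    explicit Service(std::vector<std::string>* log) : log_(log), closed_(false) {}
    void close() { closed_ = true; }
    void ping() {
        // In C++ the owning wrapper cannot have closed us yet.
        assert(!closed_);
        log_->push_back("method called");
    }
private:
    std::vector<std::string>* log_;
    bool closed_;
};

// Hypothetical stand-in for the disposable wrapper: closes the service on destruction.
class Wrapper {
public:
    explicit Wrapper(std::vector<std::string>& log) : log_(&log), service_(&log) {
        log_->push_back("wrapper created");
    }
    ~Wrapper() {
        service_.close();
        log_->push_back("wrapper destroyed");
    }
    Service& service() { return service_; }
private:
    std::vector<std::string>* log_;
    Service service_;
};

// Hypothetical factory, mirroring "a function that creates the wrapper".
std::unique_ptr<Wrapper> make_wrapper(std::vector<std::string>& log) {
    return std::unique_ptr<Wrapper>(new Wrapper(log));
}

// Evaluates the one-line expression and returns the recorded order of events.
std::vector<std::string> evaluate_expression() {
    std::vector<std::string> log;
    // The temporary returned by make_wrapper lives until the end of this
    // full expression, so the wrapper is destroyed only after ping() returns.
    make_wrapper(log)->service().ping();
    log.push_back("statement finished");
    return log;
}
```

Calling evaluate_expression() yields the events in the order created, called, destroyed, finished: destruction is tied to the end of the full expression, not to whether anything still references the temporary. That deterministic ordering is exactly what the managed version of the line could not rely on.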

However, managed code is different. I learnt from a developer on the CLR team that such temporaries are not guaranteed to remain alive. Most of the time, of course, you don’t need to reckon with such side effects. However, be aware that GC is non-deterministic and that anything that is not rooted can be collected at any time. Knowing this can help you track down insidious bugs like this one.