What's Wrong with this Code: The Answer

Bjarne Stroustrup is a brilliant man, which means he's very good at defining programming languages. It also means he's not very good at writing books about the programming languages he's defined. His writing goal is concise precision. Unfortunately, for those of us who have to read what he writes, that quest for concise precision often translates into some of the driest reading in existence.

Yesterday, I asked What's Wrong with this Code? Unfortunately, the answer to that question involves one of the driest sentences in Stroustrup's book on C++. In fact, it's the very first sentence of section 6.2.2 of The C++ Programming Language:

The order of evaluation of subexpressions within an expression is undefined.

Even as I write this, I can hear the collective, "Huh?" So, let's take a look at the key line of code in yesterday's example. It's in the function, Foo():

_val += Bar(val);

This is an expression with subexpressions. It's a contracted form of the expression:

_val = _val + Bar(val);

It's rather like the contraction "it's"--a way of saying "it is" using fewer letters and syllables. The subexpressions are the bits on either side of the + sign and the = sign.

Now, I'd like to translate that expression back into English for you, but I can't. At least I can't give you a definitive translation. The expression is ambiguous. We can read it as:

Evaluate the function, Bar(val), add the result to the value in _val, and place the sum back into _val.

Or, we can read it as:

Take the value that's in _val, add it to the result of the function, Bar(val), and place the sum back into _val.

Thing is, the actual result of the code example I gave will be different depending upon which interpretation we use. That's because Bar() changes _val (programmers call this a subexpression side-effect). If we choose the first interpretation, then the result of yesterday's program will leave _val with a value of 2. If we take the second interpretation, then the result of that code will leave _val with the value of 43.

Well, that's fine, Rick, but what does this have to do with what Stroustrup said? Stroustrup is saying that compiler writers are free to choose either of these interpretations that I've given above. In fact, compiler writers are free to choose one interpretation at one line of code, and another interpretation in a different instance of the same line of code. While this sometimes happens, compiler writers tend to choose one interpretation and stick to it.

And that's where the real problem lies. Remember, this is a contrived example. Given this kind of contrived coding example, most programmers worth half their paychecks can spot the problem a mile away. But, real-world software models real-world problems. And, real-world problems are complex. So, in the real world, someone writes Bar() without the line of code that modifies _val. Some time later, someone else adds Foo() which uses the computation done in Bar(). A little time after that, someone changes Bar() so that it now modifies _val.

All of these changes get checked into the code base, built into the product, and tested. They might even get tested over and over again. We might even ship several versions of the product, without ever running into any problem that any user will see. And, so long as the compiler we're using takes the "correct" interpretation, there won't be any problem in the code that we actually ship.

Then someone comes along and forces us to start using a new compiler, like when Apple announced the move to Intel processors. Should the new compiler interpret these expressions differently than the way the old compiler interprets these expressions, then we'll have a few subtle bugs in our code.

Now, I've read articles where people describe this programming style as "Microsoft's patchwork programming style," or something similar--as if the word "patchwork" accurately describes incremental development with modular programming and as if Microsoft is the only entity in existence that uses these programming techniques. Actually, everyone does it this way, and any group engaged in trying to write programs to solve complex problems will end up with some code like this in their products. It's inevitable. I'm quite certain that Adobe has encountered a handful of these kinds of issues in their code, and I'll bet that Apple had to deal with a few of them as well.

Worse yet, in all of the compiler errors and warnings that Erik Schwiebert mentioned in the post that prompted me to offer this little programming example, not one of them involved problems of this type. Those of you who have Xcode installed on your machines, copy and paste the code from my example into a test file, and compile it using GCC with every warning you can think of turned on. You'll get nary a peep out of GCC.

We know that many people are anxiously awaiting universal binaries for Office and Adobe's products. Please, be patient. You really do want us to find all of these subtle issues before we ship these products to you, and that takes time--mostly testing time even when you have a lab like this one where you can run those tests.

Kudos to Doug for being the first to say where the problem lies, and kudos to Joshua Ochs for being the first to quote the correct section number from Stroustrup. And, Joshua's guess was almost correct. CodeWarrior uses the first interpretation above. GCC uses the second.

Update: Upon reflection, I thought it worth adding, for some of the more experienced programmers out there, some further details that indicate just how insidious this can be. Imagine some code that began its life as Java. Later, it got ported to C++. The key part of the body of Foo() is a tight loop that terminates when _val reaches 0, and Bar() is really invoked through a pointer to member function in an array of pointers to member functions indexed by _val. This is the kind of stuff that keeps me awake at night.


