Larry's rules of software engineering, part 1: Every software engineer should know roughly what assembly language their code generates.

The first in an ongoing series (in other words, as soon as I figure out what more rules are, I’ll write more articles in the series).

This post was inspired by a comment in Raymond’s blog where a person asked “You mean you think I’m expected to know assembly language to do my job?  Yech”.

My answer to that poster was basically “Well, yes, I do expect that everyone know assembly language”.  If you don’t, you don’t really understand what your code is doing.

Here’s a simple quiz:  How many string objects are created in the following code?

int __cdecl main(int argc, char *argv[])


      std::string foo, bar, baz


      foo = bar + baz + “abc”;


The answer?  5.  Three of the strings are obvious – foo, bar, and baz.  The other two are hidden in the expression: foo = bar  + baz + “abc”.

The first of the hidden two is the temporary string object that’s created to encapsulate the “abc” string.  The second is one that’s used to hold the intermediate result of baz + “abc” which is then added to bar to get the resulting foo.  That one line of code generated 188 bytes of code.  Now that’s not a whole lot of code today, but it can add up.

I ran into this rule a long, long time ago, back in the DOS 4 days.  I was working on the DOS 4 BIOS, and one of the developers who was working on the BIOS before me had defined a couple of REALLY useful macros to manage critical sections.  You could say ENTER_CRITICAL_SECTION(criticalsectionvariable) and LEAVE_CRITICAL_SECTION(criticalsectionvariable) and it would do just what you wanted.

At one point, Gordon Letwin became concerned about the size of the BIOS, it was like 20K and he didn’t understand why it would be so large.  So he started looking.  And he noticed these two macros.  What wasn’t obvious from the macro usage was that each of those macros generated about 20 or 30 bytes of code.  He changed the macros from inline functions to out-of-line functions and saved something like 4K of code.  When you’re running on DOS, this was a HUGE savings (full disclosure – the DOS 4 BIOS was written in assembly language, so clearly I knew what the assembly language that I generated.  But I didn’t know the assembly language the macro generated).

Nowadays, memory pressures aren’t as critical, but it’s STILL critical that you know what your code is going to generate.  This is especially true if you’re using C++, since it’s entirely possible to hide huge amounts of object code in a very small amount of source.  For instance, if you have:

CComPtr<IXmlDOMDocument> document;

CComPtr<IXMLDOMNode> node;

CComPtr<IXMLDOMElement> element;

CComPtr<IXMLDOMValue> value;


How many discrete implementations of CComPtr do you have in your application?  Well, the answer is that you’ve got 4 different implementations – and all the code associated with CComPtr gets duplicated FOUR times in your application.  Now it turns out that the linker has some tricks that it can use to collapse identical implementations of methods (and it uses them starting with VC.Net), but if your code is targeting VC6, or if it’s using some other C++ compiler, you can’t guarantee that you won’t be staring at <n> different implementations of CComPtr in your object code.  CComPtr is especially horrible in this respect, since you typically need to use a LOT of interfaces in your application.  As I said, with VC.Net onwards, this isn’t a problem, the compiler/linker collapses all those implementations into a single instance in your binary, but for many templates, this doesn’t work.  Consider, for example std::vector.

std::vector<short> document;

std::vector<int> node;

std::vector<float> element;

std::vector<bool> value;

This requires that there be four separate implementations of std::vector compiled in with your application, since there’s no way of sharing the implementation between them (since the sizes of all the types are different, and thus the assembly language for the different implementations is different).  If you don’t know this is going to happen, you’re going to be really upset when your boss starts complaining about the working set of your application.

The other time that not knowing what’s going on under the covers hits you is when a class author accidentally hides performance problems in their class. 

This kind of problem happens a LOT.  I recently inherited a class that used operator overloading extensively.  I started using the code, and as I usually do, I started stepping though the code (to make sure that my code worked) and realized that the class implementation was calling the copy constructor for the class extensively.  Basically it wasn’t possible to use the class at all without a half a dozen trips through the heap allocator.  But I (as the consumer of the class) didn’t realize that – I didn’t realize that a simple assignment statement involved two trips through the heap manager, several calls to printf, and a string parse.  The author of the class didn’t know this either, it was a total surprise when I pointed it out to him, since the calls were side effects of other calls he made).  But if that class had been used in a performance critical situation, we’d have been sunk.  In this case, the class worked as designed; it was just much less efficient than it had to be.

As it is, because I stepped through the assembly, and looked at ALL the code that was generated, we were able to fix the class ahead of time to make it much more implementation friendly.  But if we’d blindly assumed that since the code functioned correctly (and it did), we’d have never noticed this potential performance problem.

If the developer involved had realized what was happening with his class, he’d have never written it that way, but because he didn’t follow Larry’s rule #1, he got burned.