Volume 26 Number 02
Don’t Get Me Started - Never, Never Land
By David Platt | February 2011
In last month’s column, I mused about how, in software as in life itself, there’s usually a time for an action and also a time for its opposite. This month, I want to discuss the opposite of that thought, by which I mean events for which there is never a time.
I often compare our software industry to the medical industry. A doctor can’t completely control a patient’s outcome. Medical situations vary, risks always exist, stuff happens. But certain patient-harming events should never, ever occur. We know what causes them, we know how to prevent them; therefore, their occurrence always constitutes malpractice. Reading the list of these “never events” (bit.ly/h9RMl8) makes you wince: operating on the wrong patient, or on the wrong part of the correct patient; leaving surgical instruments inside the patient and so on. My all-time favorite good news/bad news joke—“The bad news is that we amputated the wrong leg. The good news is that your other leg is getting better after all”—brutally illustrates the unacceptability of these events. (I know: “Plattski, you are one sick puppy.” It’s been said before.)
We need to adopt this same idea for our software: that certain occurrences are never, ever, acceptable. We need to define these events, publicize them and educate developers about what they are and how to avoid them. And we need to explain to users that they should never have to tolerate this behavior from their software and shouldn’t be asked to.
Here’s my first proposed never event for software. We wish our programs wouldn’t crash, as doctors wish their patients wouldn’t die (and they envy our reset buttons), but neither is going to happen anytime soon. Because we know that our programs will occasionally crash, I say that losing a user’s work in a crash is a never event. Remember how you’d work in Word or Excel for two hours, then up would pop the dreaded Unrecoverable Application Error box and it was all gone? Not acceptable. Ever. No matter what.
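For the geeks who want to see it rather than hear it, here is a minimal sketch of the idea in Python: the program, not the user, snapshots the work on every change and looks for a leftover snapshot on startup. Everything named here (the `Draft` class, `recovery_path`, `open_or_recover`) is hypothetical and invented for illustration; it is not how Word, Excel or any other real product does it, and a production version would also have to make the snapshot write itself crash-safe.

```python
import json
import os


class Draft:
    """Hypothetical in-memory document that snapshots itself as the user works."""

    def __init__(self, recovery_path):
        self.recovery_path = recovery_path
        self.text = ""

    def edit(self, new_text):
        # Snapshot on every change, so a crash costs at most one edit.
        self.text = new_text
        self._snapshot()

    def _snapshot(self):
        # Assumption: a plain overwrite is good enough for this illustration.
        with open(self.recovery_path, "w", encoding="utf-8") as f:
            json.dump({"text": self.text}, f)

    def save_and_close(self, final_path):
        with open(final_path, "w", encoding="utf-8") as f:
            f.write(self.text)
        # A clean exit needs no recovery file.
        os.remove(self.recovery_path)

    @classmethod
    def open_or_recover(cls, recovery_path):
        # If a snapshot survived from a previous run, that run ended badly:
        # hand the user's work back instead of an empty document.
        draft = cls(recovery_path)
        if os.path.exists(recovery_path):
            with open(recovery_path, encoding="utf-8") as f:
                draft.text = json.load(f)["text"]
        return draft
```

The user never presses Save and never thinks about it. If the process dies after `edit`, the next call to `open_or_recover` with the same path returns the text as of the last edit.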
I hear lazy geeks objecting. “That’s not our problem, it’s a matter of education. Users just have to save their work every 10 seconds, then they’ll never lose anything.” Balderdash. That’s not the user’s job, any more than it’s the patient’s job to tell the surgeon: “No, you dimwit, it’s my other arm. Are you sure you remember which end of the scalpel to hold?” It’s the surgeon’s job to get the operation right, as it is ours to get the software right.
Because these events should never occur, it’s a big story when they do. Consider respected surgeon David Ring, who performed the wrong procedure on one of his patients at Massachusetts General Hospital. Rather than cover it up, or discuss it only in a closed morbidity and mortality conference, he published his own case in the prestigious New England Journal of Medicine (bit.ly/gzWN9q). The surgical team performed a full failure analysis to find the root cause. (It’s far more complicated than you might think; read the article.) They reviewed protocols and changed some of them: for example, the alcohol prep that washed away the supposedly indelible surgical site markings was discontinued. The world is a better place for this intolerance of unacceptable events, and for the openness in dealing with those that manage to occur.
Our industry needs the same thing. Why was data lost? It shouldn’t have been. Was the disk full? That’s a capacity problem—we know how to solve that. Because some dimwit yanked the plug out of the wall? That’s a durability problem—we know how to solve that, in several different ways at different price points. Because we forgot to check a null pointer? Easily solvable. And so on.
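To show that "we know how to solve that" is not just bluster, here is one of the cheaper price points, sketched in Python under stated assumptions: write the new version to a temporary file in the same directory, force it to disk, then swap it into place. The function name `save_durably` and the `.saving-` prefix are made up for this example. The sketch assumes the temporary file and the target sit on the same filesystem, and it skips syncing the containing directory, which some systems also need before the save is truly plug-proof.

```python
import os
import tempfile


def save_durably(path, data):
    """Replace `path` with `data` (bytes) without exposing a half-written file.

    Returns True on success. On a write or rename failure (for example a
    full disk) it returns False and leaves the previous contents untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".saving-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes past the OS cache
        os.replace(tmp_path, path)  # atomic swap on the same filesystem
        return True
    except OSError:
        # Disk full, permissions and so on: drop the temp file, keep the old copy.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        return False
```

If the disk fills up or the plug is yanked halfway through, the worst case is a stray temporary file; the last good copy at `path` is still there. A caller that gets `False` back can tell the user the save failed, which is a very different conversation from telling them the work is gone.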
If our profession is ever to take its rightful place as a pillar of society, we need to adopt this idea from another pillar.
What do you think are the never events in software, and how should we prevent them? Use the link at the end of my bio to tell me. As always, readers will be identified only by first names, unless they request otherwise.
David S. Platt teaches Programming .NET at Harvard University Extension School and at companies all over the world. He’s the author of 11 programming books, including “Why Software Sucks” (Addison-Wesley Professional, 2006) and “Introducing Microsoft .NET” (Microsoft Press, 2002). Microsoft named him a Software Legend in 2002. He wonders whether he should tape down two of his daughter’s fingers so she learns how to count in octal. You can contact him at rollthunder.com.