Lessons of Failure
Humans + Software Development = Always Interesting

Why Does Simplicity Escape Programmers?

An experiment posted on LessWrong.com, which led into a longer diatribe on rationality and the traps of rational thinking, got me thinking too.  Here’s the problem:

Once upon a time, there was an instructor who taught physics students.  One day she called them into her class, and showed them a wide, square plate of metal, next to a hot radiator.  The students each put their hand on the plate, and found the side next to the radiator cool, and the distant side warm.  And the instructor said, “Why do you think this happens?”  Some students guessed convection of air currents, and others guessed strange metals in the plate.  They devised many creative explanations, none stooping so low as to say “I don’t know” or “This seems impossible.”

And the answer was that before the students entered the room, the instructor turned the plate around.

This is Occam’s Razor at its finest.  But these unwitting physics students are no different from the average programmer confronted with a bug report.  Take this example of a guy who had a strange logic error in a core Linux package:

A few weeks ago, though, I encountered some bizarre behavior on my desktop, that honestly just didn’t make sense. I spent about half an hour digging to discover what had gone wrong, and eventually determined, conclusively, that my problem was a single undetected flipped bit in RAM.

This guy admirably spent a long, painful session tracking his error to a faulty location in RAM.  But what did he settle on as the most likely cause?

For me, bitflips due to cosmic rays are one of those problems I always assumed happen to “other people”. I also assumed that even if I saw random cosmic-ray bitflips, my computer would probably just crash, and I’d never really be able to tell the difference from some random kernel bug.

Cosmic rays!  Now, the possibility exists, that’s true.  But is it the most likely explanation?  No, not by a long shot.  More likely:  faulty RAM due to a manufacturing defect.  Failures traced to defective RAM are documented far more often than failures traced to cosmic rays.

Why do we have this perverse belief that all our problems are exotic, unusual and outside the norm?

When you discover a bug in your code, is your response:

  1. Hmmm, I wonder what simple thing I did wrong here?
  2. I wonder if there’s a kernel bug in Linux causing that?
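
Here’s a rough sketch of what answering with option 1 looks like in practice: check the likely, cheap-to-test causes before the exotic ones.  The `Hypothesis` class, the `triage` function and every number below are made up for illustration; they are guesses, not measurements, so plug in your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    name: str
    likelihood: float    # rough guess between 0 and 1, not a measurement
    cost_minutes: float  # rough time needed to rule the cause in or out


def triage(hypotheses):
    """Order hypotheses so that likely, cheap-to-check causes come first."""
    return sorted(
        hypotheses,
        key=lambda h: h.likelihood / h.cost_minutes,
        reverse=True,
    )


# Placeholder guesses for illustration only; substitute your own.
candidates = [
    Hypothesis("kernel bug", likelihood=0.01, cost_minutes=240),
    Hypothesis("cosmic-ray bit flip", likelihood=0.001, cost_minutes=480),
    Hypothesis("mistake in my own code", likelihood=0.9, cost_minutes=15),
    Hypothesis("faulty RAM", likelihood=0.05, cost_minutes=60),
]

# Print the causes in the order they should be investigated.
for hypothesis in triage(candidates):
    print(hypothesis.name)
```

The exact numbers don’t matter much.  What matters is that the ordering forces you to write down how unlikely the exotic story really is before you spend an afternoon chasing it.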

Simplicity isn’t just a goal in optimization; it’s a goal in finding the source of bugs too.  Never let the truth interfere with a good story, I always say.