Why less emphasis on private data?

Steven D'Aprano steve at REMOVE.THIS.cybersource.com.au
Tue Jan 9 16:26:50 CET 2007

On Tue, 09 Jan 2007 10:27:56 +0200, Hendrik van Rooyen wrote:

> "Steven D'Aprano" <steve at REMOVE.THIS.cybersource.com.au> wrote:
>> On Mon, 08 Jan 2007 13:11:14 +0200, Hendrik van Rooyen wrote:
>> > When you hear a programmer use the word "probability" -
>> > then it's time to fire him, as in programming even the lowest
>> > probability is a certainty when you are doing millions of
>> > things a second.
>> That is total and utter nonsense and displays the most appalling
>> misunderstanding of probability, not to mention a shocking lack of common
>> sense.
> Really?
> Strong words.
> If you don't understand you need merely ask, so let me elucidate:
> If there is some small chance of something occurring at run time that can
> cause code to fail - a "low probability" in all the accepted senses of the
> word - and a programmer declaims - "There is such a low probability of
> that occurring and it's so difficult to cater for that I won't bother"
> - then am I supposed to congratulate him on his wisdom and outstanding
> common sense?
> Hardly. - If anything can go wrong, it will. - to paraphrase Murphy's law.
> To illustrate:
> If there is one place in any piece of code that is critical and not protected,
> even if it's in a relatively rarely called routine, then because of the high
> speed of operations, and the fact that time is essentially infinite, 

Time is essentially infinite? Do you really expect your code will still be
in use fifty years from now, let alone a billion years?

I know flowcharts have fallen out of favour in IT, and rightly so -- they
don't model modern programming techniques very well, simply because modern
programming techniques would lead to a chart far too big to be practical.
But for the sake of the exercise, imagine a simplified flowchart of some
program, one with a mere five components, such that one could take any of
the following paths through the program:

START -> A -> B -> C -> D -> E
START -> A -> C -> B -> D -> E
START -> A -> C -> D -> B -> E
START -> E -> D -> C -> B -> A

There are 5! (five factorial) = 120 possible paths through the program.
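A minimal Python sketch of that count. It assumes, as the flowchart does, that every path visits each of the five components exactly once, so a path is simply an ordering of A to E; `itertools.permutations` enumerates those orderings:

```python
import itertools
import math

components = ["A", "B", "C", "D", "E"]

# Each path through the toy flowchart is one ordering of the five components.
paths = [
    " -> ".join(["START", *order])
    for order in itertools.permutations(components)
]

print(len(paths))                       # 120
print(len(paths) == math.factorial(5))  # True
print(paths[0])                         # START -> A -> B -> C -> D -> E
```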

Now imagine one where there are just fifty components, still quite a
small program, giving 50! = 3e64 possible paths. Now suppose that there is
a bug that results from following just one of those paths. That would
match your description of "lowest probability" -- any lower and it would
be zero.
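The same arithmetic for fifty components, again assuming every path is an ordering of all the components and exactly one path is buggy. Python integers have arbitrary precision, so `math.factorial(50)` is exact; only the probability is a float:

```python
import math

n_paths = math.factorial(50)   # exact integer with 65 digits
p_bug = 1 / n_paths            # chance that one randomly chosen path is the buggy one

print(f"{n_paths:.2e}")  # 3.04e+64
print(f"{p_bug:.2e}")    # 3.29e-65
```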

If all of the paths are equally likely to be taken, and the program takes
a billion different paths each millisecond, on average it would take about
1.5e55 milliseconds to hit the bug -- or about 5e44 YEARS of continual
usage. If every person on Earth did nothing but run this program 24/7, it
would still take on average almost sixty million billion billion billion
years to discover the bug.
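The timing figures can be reproduced with a few lines of Python. The 1.5e55 ms figure matches the case where paths are tried in random order without repetition, so on average about half of them are tried before the buggy one; if each run instead picks a path independently at random, the mean wait is the full 1/p runs, roughly twice as long. This sketch computes both; `expected_wait_ms` is a hypothetical helper and the 365.25-day year is an assumption:

```python
import math

MS_PER_YEAR = 1000 * 365.25 * 24 * 3600  # Julian year in milliseconds (assumption)

def expected_wait_ms(n_paths, paths_per_ms, with_replacement=False):
    """Expected milliseconds until the single buggy path is taken.

    Without replacement (each path tried once, in random order) the bug's
    position is uniform over 1..n_paths, so the mean is (n_paths + 1) / 2 runs.
    With replacement (independent random draws) the number of runs is
    geometric with p = 1 / n_paths, so the mean is n_paths runs.
    """
    runs = n_paths if with_replacement else (n_paths + 1) / 2
    return runs / paths_per_ms

n_paths = math.factorial(50)
for with_replacement in (False, True):
    ms = expected_wait_ms(n_paths, paths_per_ms=1e9, with_replacement=with_replacement)
    print(f"with_replacement={with_replacement}: {ms:.1e} ms, {ms / MS_PER_YEAR:.1e} years")
```

The first line printed corresponds to the 1.5e55 ms and roughly 5e44 years quoted above; the second is about double.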

But of course in reality some paths are more likely than others. If the
bug happens to exist in a path that is executed often, or if it exists
in many paths, then the bug will be found quickly. On the other hand, if
the bug is in a path that is rarely executed, your buggy program may be
more reliable than the hardware you run it on. (Cynics may say that isn't

You're the project manager for the development team. Your lead developer tells
you that he knows this bug exists (never mind how, he's very clever) and
that the probability of reaching that bug in use is about 3e-65, one
chance in 3e64.

If it were easy to fix, the developer wouldn't even have mentioned it.
This is a really hard bug to fix: it's going to require some major
changes to the program, maybe even a complete re-think of the program.
Removing this bug could even introduce dozens or hundreds of new bugs.

So, okay, Mister Project Manager. What do you do? Do you sack the developer,
like you said? How many dozens or hundreds of man-hours are you prepared
to put into this? If the money is coming out of your pocket, how much are
you willing to spend to fix this bug?
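One way to frame the project manager's question is as an expected-value comparison. This is an illustrative sketch only: the failure probability and run rate come from the scenario above, but the cost of a failure and the cost of the fix are invented numbers, and `expected_annual_loss` is a hypothetical helper:

```python
import math

# Probability and run rate follow the scenario; the dollar figures are invented.
P_BUG = 1 / math.factorial(50)                   # about 3.3e-65 per run
RUNS_PER_YEAR = 1e9 * 1000 * 365.25 * 24 * 3600  # a billion paths per ms, all year
COST_PER_FAILURE = 1e9                           # hypothetical: $1 billion each time the bug bites
FIX_COST = 500 * 100                             # hypothetical: 500 hours at $100/hour

def expected_annual_loss(p_bug, runs_per_year, cost_per_failure):
    """Expected yearly cost of leaving the bug in place.

    By linearity of expectation, n runs produce n * p failures on average,
    however small p is.
    """
    return p_bug * runs_per_year * cost_per_failure

loss = expected_annual_loss(P_BUG, RUNS_PER_YEAR, COST_PER_FAILURE)
print(f"expected loss per year: ${loss:.1e}")
print(f"years for the fix to pay for itself: {FIX_COST / loss:.1e}")
```

Even with a wildly pessimistic cost per failure, the expected yearly loss is many orders of magnitude below the price of a single developer-hour.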


> How is this a misunderstanding of probability?  - probability applies to
> any one trial, so in a series of trials, when the number of trials is
> large enough - in the order of the inverse of the probability - then one's
> expectation must be
> that the rare occurrence should occur...

"Even the lowest probability is a certainty" is mathematically nonsense:
it just isn't true -- no matter how many iterations, the probability is
always a little less than one. And you paper over a hole in your argument
with "when the number of trials is large enough" -- if the probability is
small enough, "large enough" could be unimaginably huge indeed.
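The point about "always a little less than one" can be checked directly. For independent trials, the chance of seeing an event at least once in n trials is 1 - (1 - p)**n, which approaches 1 but never reaches it. A small sketch, with hypothetical helper names; `math.log1p` and `math.expm1` keep tiny probabilities from being rounded away:

```python
import math

def prob_never(p, n):
    """Chance that an event with per-trial probability p never happens in n
    independent trials, (1 - p)**n, computed via log1p so a tiny p is not
    lost (1 - 1e-65 is exactly 1.0 in floating point)."""
    return math.exp(n * math.log1p(-p))

def prob_at_least_once(p, n):
    """1 - (1 - p)**n; expm1 keeps precision when the result is small."""
    return -math.expm1(n * math.log1p(-p))

p = 1e-6
for n in (10**6, 10**7, 10**8):
    print(f"n={n:.0e}: at least once {prob_at_least_once(p, n):.6f}, never {prob_never(p, n):.2e}")
```

With p = 1e-6, a million trials (the "inverse of the probability" from the quote) gives only about 0.63. The chance of never seeing the event shrinks quickly but stays above zero, even once floating point rounds its complement to 1.0.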

Or, to put it another way, while anything with a non-zero probability
_might_ happen (you might drop a can of soft drink on your computer,
shorting it out and _just by chance_ causing it to fire off a perfectly
formatted email containing a poem about penguins) we are justified in
writing off small enough probabilities as negligible. It's not that they
can't happen, but the chances of doing so are so small that we can rightly
expect to never see them happen.

You might like to read up on Borel's "Law" (not really a law at all,
really just a heuristic for judging when probabilities are negligible).
Avoid the nonsense written about Borel and his guideline by Young Earth
Creationists; they have given him an undeserved bad name.
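As a rough illustration of how such a heuristic might be applied, here is a sketch using the scales commonly attributed to Borel (about 1e-6 on the human scale, 1e-15 on the terrestrial scale, 1e-50 on the cosmic scale). Treat the thresholds as assumptions rather than hard rules; `negligible_on` is a hypothetical helper:

```python
# Scales commonly attributed to Borel; treat these thresholds as assumptions.
BOREL_SCALES = [
    ("human", 1e-6),
    ("terrestrial", 1e-15),
    ("cosmic", 1e-50),
]

def negligible_on(p):
    """Return the names of the scales on which probability p counts as negligible."""
    return [name for name, threshold in BOREL_SCALES if p < threshold]

print(negligible_on(1 / 120))  # []
print(negligible_on(3e-65))    # ['human', 'terrestrial', 'cosmic']
```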


> There is a very low probability that any one gas molecule will collide
> with any other one in a container

Not so. There is a very low probability that one gas molecule will collide
with a _specific_ other molecule -- but the probability of colliding with
_any_ other molecule is very high.

> - and  "Surprise!  Surprise! " there
> is nevertheless something like the mean free path...

Yes. And that mean free path increases without limit as the volume of the
gas increases. Take your molecule into the space between stars, and the
mean free path might be dozens of light-years -- even though there is
actually more gas in total than in the entire Earth.
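For a hard-sphere gas the mean free path is 1 / (sqrt(2) * pi * d**2 * n), where d is the molecular diameter and n the number of molecules per unit volume. With a fixed number of molecules N, n = N / V, so the path grows in proportion to the volume with no upper limit. A sketch, where the effective diameter of an air molecule (3.7e-10 m) is an assumed textbook-style figure and the gas is treated as ideal:

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)

def mean_free_path(number_density, diameter):
    """Hard-sphere mean free path, 1 / (sqrt(2) * pi * d**2 * n), in metres."""
    return 1.0 / (math.sqrt(2) * math.pi * diameter**2 * number_density)

d_air = 3.7e-10                        # effective molecular diameter in m (assumption)
n_air = 101325 / (BOLTZMANN * 273.15)  # ideal gas at 0 C and 1 atm: n = P / (k T)

lam = mean_free_path(n_air, d_air)
print(f"air at STP: {lam * 1e9:.0f} nm")

# Same number of molecules in ten times the volume: n drops tenfold,
# so the mean free path grows tenfold.
print(mean_free_path(n_air / 10, d_air) / lam)
```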

> Now how does all this show a shocking lack of common sense?

You pay no attention to the economics of programming. Programming doesn't
come for free. It is always a trade-off: the best result for the least
effort. Any time people start making absolute claims about fixing every
possible bug, no matter how obscure or unlikely or how much work it will
take, I know that they aren't paying for the work to be done.


More information about the Python-list mailing list