[Python-Dev] One more dict trick

Tim Peters tim.one@home.com
Fri, 1 Jun 2001 02:58:08 -0400


[Tim]
> A 128-bit float type is simply necessary for some scientific work:  not
> all problems are well-conditioned, and the "extra" bits can vanish fast.

[ESR]
> Makes me wonder how competent your customers' numerical analysts were.
> Where the heck did they think they were getting data with that many
> digits of accuracy?  (Note that I didn't say "precision"...)

Not all scientific work consists of predicting the weather with inputs known
to half a digit on a calm day <wink>.  Knuth gives examples of
ill-conditioned problems where resorting to unbounded rationals is faster
than any known stable f.p. approach (stuck with limited precision) -- think,
e.g., chaotic systems here, which include parts of many hydrodynamics
problems in real life.
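
Here's the flavor of it in (present-day) Python -- not one of Knuth's
examples, just Jean-Michel Muller's well-known recurrence, with the fractions
module standing in for unbounded rationals:

    from fractions import Fraction

    def muller(x0, x1, n):
        # x_{k+1} = 108 - (815 - 1500/x_{k-1}) / x_k
        # The exact sequence converges to 5; the fixed point at 100 is a trap
        # for any arithmetic that rounds.
        for _ in range(n):
            x0, x1 = x1, 108 - (815 - 1500/x0) / x1
        return x1

    print(muller(4.0, 4.25, 25))                            # doubles get dragged to ~100
    print(float(muller(Fraction(4), Fraction(17, 4), 25)))  # rationals stay near 5

Any rounding error at all gets amplified without bound near the repelling
fixed point, so no fixed precision saves you -- that's the kind of
ill-conditioning at issue.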

Some scientific work involves modeling ab initio across trillions of
computations (and on a Cray box in particular, where addition didn't even
bother to round and multiplication didn't bother to compute the full product
tree, the error bounds per operation were much worse than in a 754 world).
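
If you want a feel for what per-operation chopping does to a long
accumulation, here's a toy sketch with Python's decimal module -- it has
nothing to do with real Cray arithmetic, just truncation vs. round-to-nearest
at 6 digits:

    from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_EVEN
    import random

    chopped = Context(prec=6, rounding=ROUND_DOWN)        # crude stand-in for truncating hardware
    nearest = Context(prec=6, rounding=ROUND_HALF_EVEN)   # 754-style round-to-nearest
    exact   = Context(prec=40)                            # reference value

    random.seed(42)
    terms = [Decimal(random.random()).quantize(Decimal("1.000000000"))
             for _ in range(10000)]

    def total(ctx):
        s = Decimal(0)
        for t in terms:
            s = ctx.add(s, t)
        return s

    ref = total(exact)
    print(ref - total(chopped))   # truncation error is one-sided, so it piles up
    print(ref - total(nearest))   # round-to-nearest errors mostly cancel

Chop a few trillion times instead of ten thousand and the piling up is no
longer a toy.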

You shouldn't overlook either that algorithms often needed massive rewriting
to exploit vector and parallel architectures, and in a world where a
supremely competent numerical analyst can take a month to verify the
numerical robustness of a new algorithm covering two pages of Fortran, a
million lines of massively reworked seat-of-the-pants modeling code couldn't
be trusted at all without running it under many conditions in at least two
precisions (it only takes one surprise catastrophic cancellation to destroy
everything).
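
The two-precision trick looks like this in miniature (a sketch using Python's
decimal module as a stand-in for recompiling at a wider floating-point type --
the example is mine, not from anyone's production code):

    from decimal import Decimal, getcontext

    def squared_minus_one(x, prec):
        # x*x - 1 for x near 1: the leading digits cancel, exposing whatever
        # error the multiply committed at the working precision.
        getcontext().prec = prec
        x = Decimal(x)
        return x * x - 1

    print(squared_minus_one("1.000001", 8))    # 0.0000020       -- the cancelled digits are gone
    print(squared_minus_one("1.000001", 28))   # 0.000002000001  -- exact

Run the whole model both ways and diff the outputs: where they agree you're
probably fine, and where they diverge is where to aim the tracer.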

A major oil company once threatened to sue Cray when their reservoir model
produced wildly different results under a new release of the compiler.  Some
exceedingly sharp analysts worked on that one for a solid week.  Turned out
the new compiler evaluated a subexpression

    A*B*C

by doing (B*C) first instead of (A*B), because it was faster in context (and
fine to do so by Fortran's rules).  It so happened A was very large, and B
and C both small, and doing B*C first caused the whole product to underflow
to zero where doing A*B first left a product of roughly C's magnitude.  I
can't imagine how they ever would have found this if they weren't able to
recompile the code using twice the precision (which worked fine thanks to
the larger dynamic range), then trace to see where the runs diverged.
Even then it took a week because this was 100s of thousands of lines of
crufty Fortran that ran for hours on the world's then-fastest machine before
delivering bogus results.
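
For the curious, the same trap is easy to reproduce with ordinary 754 doubles
(made-up magnitudes below -- I don't remember the reservoir model's actual
numbers):

    A = 1e300    # very large
    B = 1e-300   # very small
    C = 1e-50    # also small

    print((A * B) * C)   # ~1e-50: A*B is about 1.0, so the result keeps C's magnitude
    print(A * (B * C))   # 0.0:    B*C is ~1e-350, which underflows to zero first

Double the precision (and, with it, the exponent range) and both orders
agree -- which is exactly why recompiling wider made the divergence findable.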

BTW, if you think the bulk of the world's numeric production code has even
been *seen* by a qualified numerical analyst, you should ride on planes more
often <wink>.