[Python-Dev] Numerical robustness, IEEE etc.

Nick Maclaren nmm1 at cus.cam.ac.uk
Mon Jun 19 10:55:44 CEST 2006

Brett Cannon's and Neal Norwitz's replies appreciated and noted, but
responses sent by mail.

Nick Coghlan <ncoghlan at gmail.com> wrote:
> Python 2.4's decimal module is, in essence, a floating point emulator based on 
> the General Decimal Arithmetic specification.

Grrk.  Format and all?  Because, in software, encoding, decoding and
dealing with the special cases accounts for the vast majority of the
time.  Using a format and specification designed for implementation
in software is a LOT faster (often 5-20 times).

> If you want floating point mathematics that doesn't have insane platform 
> dependent behaviour, the decimal module is the recommended approach. By the 
> time Python 2.6 rolls around, we will hopefully have an optimized version 
> implemented in C (that's being worked on already).

Yes.  There is no point in building a wheel if someone else is doing it.
Please pass my name on to the people doing the optimisation, as I have
a lot of experience in this area and may be able to help.  But it is a
fairly straightforward (if tricky) task.

> That said, I'm not clear on exactly what changes you'd like to make to the 
> binary floating point type, so I don't know if I think they're a good idea or 
> not :)

Now, here it is worth posting a reponse :-)

The current behaviour follows C99 (sic) with some extra checking (e.g.
division by zero raises an exception).  However, this means that a LOT
of errors will give nonsense answers without comment, and there are a
lot of ways to 'lose' NaN values quietly - e.g. int(NaN).  That is NOT
good software engineering.  So:

Mode A:  follow IEEE 754R slavishly, if and when it ever gets into print.
There is no point in following C99, as it is too ill-defined, even if it
were felt desirable.  This should not be the default, because of the
flaws I mention above (see Kahan on Java).

Mode B:  all numerically ambiguous or invalid operations should raise
an exception - including pow(0,0), int(NaN) etc. etc.  There is a moot
point over whether overflow is such a case in an arithmetic that has
infinities, but let's skip over that one for now.

Mode C:  all numerically ambiguous or invalid operations should return
a NaN (or infinity, if appropriate).  Anything that would lose the error
indication would raise an exception.  The selection between modes B and
C could be done by a method on the class - with mode B being selected
if any argument had it set, and mode C otherwise.

Now, both modes B and C are traditional approaches to numerical safety,
and have the property that error indications can't be lost "by accident",
though they make no guarantees that the answers make sense.  I am
agnostic about which is better, though mode B is a LOT better from the
debugging point of view, as you discover an error closer to where it
was made.

Heaven help us, there could be a mode D, which would be mode C but
with trace buffers.  They are another sadly neglected software
engineering technique, but let's not add every bell and whistle on
the shelf :-)

"tjreedy" <tjreedy at udel.edu> wrote:
> > experience from times of yore is that emulated floating-point would
> > be fast enough that few, if any, Python users would notice.
> Perhaps you should enquire on the Python numerical and scientific computing 
> lists to see how many feel differently.  I don't see how someone crunching 
> numbers hours per day could not notice a slowdown.

Oh, certainly, almost EVERYONE will "feel" differently!  But that is
not the point.  Those few of us remaining (and there are damn few) who
know how a fast emulated floating-point performs know that the common
belief that it is very slow is wrong.  I have both used and implemented
it :-)

The point is, as I mention above, you MUST use a software-friendly
format AND specification if you want performance.  IEEE 754 and IBM's
decimal pantechnichon are both extremely software-hostile.

Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679

More information about the Python-Dev mailing list