[Python-Dev] Numerical robustness, IEEE etc.

Nick Coghlan ncoghlan at gmail.com
Tue Jun 20 16:50:48 CEST 2006

Nick Maclaren wrote:
> Brett Cannon's and Neal Norwitz's replies appreciated and noted, but
> responses sent by mail.
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Python 2.4's decimal module is, in essence, a floating point emulator based on 
>> the General Decimal Arithmetic specification.
> Grrk.  Format and all?  Because, in software, encoding, decoding and
> dealing with the special cases accounts for the vast majority of the
> time.  Using a format and specification designed for implementation
> in software is a LOT faster (often 5-20 times).

If by format you mean storing the number as sign, coefficient, exponent, then 
yes the decimal module took that part from Cowlishaw. Facundo (sensibly) 
ignored some of the hardware-specific details, though. The initial Python 
implementation was designed to be reasonably efficient from a speed point of 
view without being too hard to follow. Memory efficiency wasn't really a 
priority (the only concession to it that I can recall is the use of __slots__ 
on the decimal objects). So most of the time the decimal coefficients are 
stored as tuples of digits, with an internal conversion to long integers in 
order to do arbitrary coefficient arithmetic, but staying with the tuples of 
digits for multiplication and division by powers of ten.

And yes, we ended up adding an '_is_special' flag to the decimal objects 
simply because profiling showed that a whole heap of time was being spent 
deciding if any of the special cases applied to an operation. Having a single 
flag to check cut that time down appreciably.

The intent was always to replace the internal use of tuples and longs with a 
more efficient C implementation - that particular step simply wasn't needed 
for the original use case that lead Facundo to write and implement PEP 327.

>> If you want floating point mathematics that doesn't have insane platform 
>> dependent behaviour, the decimal module is the recommended approach. By the 
>> time Python 2.6 rolls around, we will hopefully have an optimized version 
>> implemented in C (that's being worked on already).
> Yes.  There is no point in building a wheel if someone else is doing it.
> Please pass my name on to the people doing the optimisation, as I have
> a lot of experience in this area and may be able to help.  But it is a
> fairly straightforward (if tricky) task.

Mateusz Rucowicz has taken up the challenge for Google's Summer of Code 
(mentored by Facundo Batista, the original author of PEP 327 and the decimal 

I've cc'ed Facundo, so hopefully he will see this thread and chime in :)

> Mode A:  follow IEEE 754R slavishly, if and when it ever gets into print.
> There is no point in following C99, as it is too ill-defined, even if it
> were felt desirable.  This should not be the default, because of the
> flaws I mention above (see Kahan on Java).

If the C-coded decimal module proves fast enough (for a given definition of 
'fast enough'), I'll certainly be pushing for it to be made the standard float 
type in Py3k. Although I'm sure Tim will miss having to explain the result of 
repr(1.1) every few months ;)

That said, the fact that 754R *isn't* finished yet, is one of the big reasons 
why the decimal module is based on the General Decimal Arithmetic 
Specification instead.

> Mode B:  all numerically ambiguous or invalid operations should raise
> an exception - including pow(0,0), int(NaN) etc. etc.  There is a moot
> point over whether overflow is such a case in an arithmetic that has
> infinities, but let's skip over that one for now.

Let's not skip it, because the decimal module already seems to do pretty much 
what you describe here :)

(The below example code was tested with 2.5a2, but the same code should work 
in Python 2.4. Uninteresting traceback details have been omitted)

 >>> from decimal import Decimal as d
 >>> nan = d('NaN')
 >>> zero = d(0)
 >>> pow(zero, zero)
Traceback (most recent call last):
decimal.InvalidOperation: 0 ** 0
 >>> int(nan)
Traceback (most recent call last):
 >>> d('1e999999999') * 10
Traceback (most recent call last):
decimal.Overflow: above Emax
 >>> from decimal import getcontext, Overflow
 >>> ctx = getcontext()
 >>> ctx.traps[Overflow] = False
 >>> d('1e999999999') * 10

Emax can be changed if the default limit is too low for a given application :)

> Mode C:  all numerically ambiguous or invalid operations should return
> a NaN (or infinity, if appropriate).  Anything that would lose the error
> indication would raise an exception.  The selection between modes B and
> C could be done by a method on the class - with mode B being selected
> if any argument had it set, and mode C otherwise.

 >>> from decimal import InvalidOperation
 >>> ctx.traps[InvalidOperation] = False
 >>> pow(zero, zero)
 >>> int(nan)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: __int__ returned non-int (type Decimal)

The selection of the current mode is on a per-thread basis. Starting with 
python 2.5, you can use the with statement to easily manage the current 
decimal context and ensure it gets rolled back appropriately.

> Heaven help us, there could be a mode D, which would be mode C but
> with trace buffers.  They are another sadly neglected software
> engineering technique, but let's not add every bell and whistle on
> the shelf :-)

 >>> ctx.flags[InvalidOperation]

It's not a trace buffer, but the decimal context does at least track how many 
times the different signals have been triggered in the current thread (or 
since they were last reset), regardless of whether or not the traps were 
actually enabled.

Hopefully Facundo will respond, and you can work with him regarding reviewing 
what Mateusz is working on, and possibly offering some pointers and 
suggestions. The work-in-progress can be seen in Python's SVN sandbox:



Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

More information about the Python-Dev mailing list