[Python-Dev] Numerical robustness, IEEE etc.
Nick Coghlan
ncoghlan at gmail.com
Tue Jun 20 16:50:48 CEST 2006
Nick Maclaren wrote:
> Brett Cannon's and Neal Norwitz's replies appreciated and noted, but
> responses sent by mail.
>
>
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Python 2.4's decimal module is, in essence, a floating point emulator based on
>> the General Decimal Arithmetic specification.
>
> Grrk. Format and all? Because, in software, encoding, decoding and
> dealing with the special cases accounts for the vast majority of the
> time. Using a format and specification designed for implementation
> in software is a LOT faster (often 5-20 times).
If by format you mean storing the number as sign, coefficient, exponent, then
yes the decimal module took that part from Cowlishaw. Facundo (sensibly)
ignored some of the hardware-specific details, though. The initial Python
implementation was designed to be reasonably efficient from a speed point of
view without being too hard to follow. Memory efficiency wasn't really a
priority (the only concession to it that I can recall is the use of __slots__
on the decimal objects). So most of the time the decimal coefficients are
stored as tuples of digits, with an internal conversion to long integers in
order to do arbitrary coefficient arithmetic, but staying with the tuples of
digits for multiplication and division by powers of ten.
And yes, we ended up adding an '_is_special' flag to the decimal objects
simply because profiling showed that a whole heap of time was being spent
deciding if any of the special cases applied to an operation. Having a single
flag to check cut that time down appreciably.
The intent was always to replace the internal use of tuples and longs with a
more efficient C implementation - that particular step simply wasn't needed
for the original use case that lead Facundo to write and implement PEP 327.
>> If you want floating point mathematics that doesn't have insane platform
>> dependent behaviour, the decimal module is the recommended approach. By the
>> time Python 2.6 rolls around, we will hopefully have an optimized version
>> implemented in C (that's being worked on already).
>
> Yes. There is no point in building a wheel if someone else is doing it.
> Please pass my name on to the people doing the optimisation, as I have
> a lot of experience in this area and may be able to help. But it is a
> fairly straightforward (if tricky) task.
Mateusz Rucowicz has taken up the challenge for Google's Summer of Code
(mentored by Facundo Batista, the original author of PEP 327 and the decimal
module).
I've cc'ed Facundo, so hopefully he will see this thread and chime in :)
> Mode A: follow IEEE 754R slavishly, if and when it ever gets into print.
> There is no point in following C99, as it is too ill-defined, even if it
> were felt desirable. This should not be the default, because of the
> flaws I mention above (see Kahan on Java).
If the C-coded decimal module proves fast enough (for a given definition of
'fast enough'), I'll certainly be pushing for it to be made the standard float
type in Py3k. Although I'm sure Tim will miss having to explain the result of
repr(1.1) every few months ;)
That said, the fact that 754R *isn't* finished yet, is one of the big reasons
why the decimal module is based on the General Decimal Arithmetic
Specification instead.
> Mode B: all numerically ambiguous or invalid operations should raise
> an exception - including pow(0,0), int(NaN) etc. etc. There is a moot
> point over whether overflow is such a case in an arithmetic that has
> infinities, but let's skip over that one for now.
Let's not skip it, because the decimal module already seems to do pretty much
what you describe here :)
(The below example code was tested with 2.5a2, but the same code should work
in Python 2.4. Uninteresting traceback details have been omitted)
>>> from decimal import Decimal as d
>>> nan = d('NaN')
>>> zero = d(0)
>>>
>>> pow(zero, zero)
Traceback (most recent call last):
...
decimal.InvalidOperation: 0 ** 0
>>>
>>> int(nan)
Traceback (most recent call last):
...
decimal.InvalidOperation
>>>
>>> d('1e999999999') * 10
Traceback (most recent call last):
...
decimal.Overflow: above Emax
>>>
>>> from decimal import getcontext, Overflow
>>> ctx = getcontext()
>>> ctx.traps[Overflow] = False
>>> d('1e999999999') * 10
Decimal("Infinity")
Emax can be changed if the default limit is too low for a given application :)
> Mode C: all numerically ambiguous or invalid operations should return
> a NaN (or infinity, if appropriate). Anything that would lose the error
> indication would raise an exception. The selection between modes B and
> C could be done by a method on the class - with mode B being selected
> if any argument had it set, and mode C otherwise.
>>> from decimal import InvalidOperation
>>> ctx.traps[InvalidOperation] = False
>>> pow(zero, zero)
Decimal("NaN")
>>> int(nan)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __int__ returned non-int (type Decimal)
The selection of the current mode is on a per-thread basis. Starting with
python 2.5, you can use the with statement to easily manage the current
decimal context and ensure it gets rolled back appropriately.
> Heaven help us, there could be a mode D, which would be mode C but
> with trace buffers. They are another sadly neglected software
> engineering technique, but let's not add every bell and whistle on
> the shelf :-)
>>> ctx.flags[InvalidOperation]
8
It's not a trace buffer, but the decimal context does at least track how many
times the different signals have been triggered in the current thread (or
since they were last reset), regardless of whether or not the traps were
actually enabled.
Hopefully Facundo will respond, and you can work with him regarding reviewing
what Mateusz is working on, and possibly offering some pointers and
suggestions. The work-in-progress can be seen in Python's SVN sandbox:
http://svn.python.org/view/sandbox/trunk/decimal-c/
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
More information about the Python-Dev
mailing list