[Python-ideas] Python3.3 Decimal Library Released

Tue Mar 4 02:55:16 CET 2014

On Tue, Mar 04, 2014 at 10:42:57AM +1100, Chris Angelico wrote:

> You could probably make the same performance argument against making
> Unicode the default string datatype. 

I don't think so -- for ASCII strings the performance cost of Unicode is 
significantly less than the performance hit for Decimal:

[steve@ando ~]$ python3.3 -m timeit -s "s = 'abcdef'*1000" "s.upper()"
100000 loops, best of 3: 8.76 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = b'abcdef'*1000" "s.upper()"
100000 loops, best of 3: 7.05 usec per loop

[steve@ando ~]$ python3.3 -m timeit -s "x = 123.4567" "x**6"
1000000 loops, best of 3: 0.344 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "from decimal import Decimal" \
> -s "x = Decimal('123.4567')" "x**6"
1000000 loops, best of 3: 1.41 usec per loop

That's roughly 1.2 times slower for Unicode versus 4.1 times slower for 
Decimal. I think Decimal is *fast enough* for all but the heaviest 
numeric needs, but a 4x slowdown is not something we can ignore.
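
As a rough check of those ratios, here is a minimal sketch that simply 
divides the timings quoted above. The variable names are made up for 
illustration, and the numbers are from this one machine only:

```python
# Timings copied from the timeit runs above, in microseconds per loop.
str_upper, bytes_upper = 8.76, 7.05   # 'abcdef'*1000 vs b'abcdef'*1000
float_pow, decimal_pow = 0.344, 1.41  # x**6 for float vs Decimal

# Slowdown of the "richer" type relative to the "cheaper" one.
unicode_ratio = str_upper / bytes_upper
decimal_ratio = decimal_pow / float_pow

print(f"str vs bytes:     {unicode_ratio:.1f}x")
print(f"Decimal vs float: {decimal_ratio:.1f}x")
```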

> But a stronger argument is that
> the default string should be the one that does the right thing with
> text. As of Python 3, that's the case. And the default integer type
> handles arbitrary sized integers (although Py2 went most of the way
> there by having automatic promotion). It's reasonable to suggest that
> the default non-integer numeric type should also simply do the right
> thing.

Define "the right thing" for numbers.

> It's a trade-off, though, and for most people, float is sufficient.

That's a tricky one. People doing quote-unquote "serious" numeric 
work will mostly want to stick to binary floats, even if that means 
missing out on all the extra IEEE-754 goodies that the decimal module 
has but floats don't. The momentum of 40+ years of almost entirely 
binary floating point maths does not shift to decimal overnight.

But for everyone else, binary floats are sufficient except when they 
aren't. Decimal, of course, won't solve all your floating point 
difficulties -- it's easy to demonstrate that nearly all the common 
pitfalls of FP maths also occur with Decimal, with the exception of 
inexact conversion from decimal strings to numbers. But that one issue 
alone is a major cause of confusion.
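
A minimal sketch of that point, assuming the default 28-digit context 
(set explicitly here). The particular values are only illustrative 
choices: one pitfall that Decimal avoids, and two that it shares with 
binary floats.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 28  # assumption: the default precision, set explicitly

    # Avoided by Decimal: inexact conversion from decimal strings.
    float_sum_exact = (0.1 + 0.2 == 0.3)
    decimal_sum_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

    # Shared with floats: 1/3 has no finite decimal expansion, so the
    # quotient is rounded and multiplying back does not recover 1.
    third = Decimal(1) / Decimal(3)
    thirds_round_trip = (third * 3 == Decimal(1))

    # Shared with floats: addition is not associative once a value
    # needs more digits than the context precision allows.
    a, b, c = Decimal("1e30"), Decimal("-1e30"), Decimal(1)
    left = (a + b) + c   # the large terms cancel first, then 1 is added
    right = a + (b + c)  # the 1 is rounded away before the cancellation

print(float_sum_exact, decimal_sum_exact)  # False True
print(thirds_round_trip)                   # False
print(left == right)                       # False
```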

My personal feeling is that for Python 4000 I'd vote for the default 
floating point format to be decimal, with binary floats available with a 
b suffix.

But since that could be a decade away, it's quite premature to spend too 
much time on this.

-- 
Steven