[Python-ideas] Python3.3 Decimal Library Released

Chris Angelico rosuav at gmail.com
Tue Mar 4 04:40:00 CET 2014


On Tue, Mar 4, 2014 at 12:55 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Tue, Mar 04, 2014 at 10:42:57AM +1100, Chris Angelico wrote:
>
>> You could probably make the same performance argument against making
>> Unicode the default string datatype.
>
> I don't think so -- for ASCII strings the performance cost of Unicode is
> significantly less than the performance hit for Decimal:
>
> [steve at ando ~]$ python3.3 -m timeit -s "s = 'abcdef'*1000" "s.upper()"
> 100000 loops, best of 3: 8.76 usec per loop
> [steve at ando ~]$ python3.3 -m timeit -s "s = b'abcdef'*1000" "s.upper()"
> 100000 loops, best of 3: 7.05 usec per loop
>
> [steve at ando ~]$ python3.3 -m timeit -s "x = 123.4567" "x**6"
> 1000000 loops, best of 3: 0.344 usec per loop
> [steve at ando ~]$ python3.3 -m timeit -s "from decimal import Decimal" \
>> -s "x = Decimal('123.4567')" "x**6"
> 1000000 loops, best of 3: 1.41 usec per loop
>
>
> That's a factor of 1.2 times slower for Unicode versus 4.1 for Decimal.
> I think that's *fast enough* for all but the most heavy numeric needs,
> but it's not something we can ignore.

There is a difference of degree, yes. Unicode-strings-as-default has
had a few releases to settle in, so the figures mightn't be perfectly
fair, but there's still a difference. My point is that Python should
choose what's right over what's fast, so there's a parallel there.

>> It's reasonable to suggest that
>> the default non-integer numeric type should also simply do the right
>> thing.
>
> Define "the right thing" for numbers.

Yeah, and that's the issue. Since computers don't have infinite
computational power, "the right thing" is necessarily somewhat vague,
but I'd define it heuristically as "what the average programmer is
most likely to expect". IEEE 754 defines operations on infinity in a
way that makes them do exactly what you'd expect; where no sensible
result exists, you get nan.

>>> inf=float("inf")
>>> inf+5
inf
>>> 5-inf
-inf
>>> 5/inf
0.0
>>> inf-inf
nan

A default decimal type would add to the "doing exactly what you
expect" operations the obvious one of constructing an object from a
series of decimal digits. If you say "0.1", you get the real number
1/10, not 3602879701896397/36028797018963968.
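To make that concrete, here's a small sketch using the stdlib's
Fraction to expose the exact value each type actually stores:

```python
from decimal import Decimal
from fractions import Fraction

# The binary float literal 0.1 stores the nearest representable
# double, which is not exactly one tenth:
print(Fraction(0.1))             # 3602879701896397/36028797018963968

# Decimal preserves the decimal digits exactly as written:
print(Fraction(Decimal("0.1")))  # 1/10
```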

> My personal feeling is that for Python 4000 I'd vote for the default
> floating point format to be decimal, with binary floats available with a
> b suffix.

Quite possibly. But changing defaults is a hugely
backward-incompatible change, while adding a decimal literal syntax
isn't. I'd be in favour of adding decimal literals and using
performance and usefulness data from that to guide any discussions
about Py4K changing the default.
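For illustration, here's the pitfall a decimal literal would remove:
today, Decimal values have to be built from strings to avoid picking
up a float's binary rounding error first. (A `0.1d` literal spelling
is hypothetical, not valid Python today.)

```python
from decimal import Decimal

# Constructing from a float rounds to binary first, then converts
# that already-inexact value to Decimal exactly:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# Constructing from a string preserves the intended decimal value:
print(Decimal("0.1"))   # 0.1
```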

And learn from Py3K and keep both the b and d suffixes supported in
the new version :)

ChrisA