[Python-ideas] Fast sum() for non-numbers

Ron Adam ron3200 at gmail.com
Thu Jul 11 12:47:19 CEST 2013



On 07/10/2013 08:03 PM, Steven D'Aprano wrote:
> On 11/07/13 07:00, Ron Adam wrote:
>>
>>
>> On 07/10/2013 12:49 PM, Steven D'Aprano wrote:
>>> On 11/07/13 02:10, Sergey wrote:
>>>> On Jul 9, 2013 Steven D'Aprano wrote:
>>>>
>>>>> The fact that sum(lists) has had quadratic performance since sum
>>>>> was first introduced in Python 2.3, and I've *never* seen anyone
>>>>> complain about it being slow, suggests very strongly that this is not
>>>>> a use-case that matters.
>>>>
>>>> Never seen? Are you sure? ;)
>>>>> http://article.gmane.org/gmane.comp.python.general/658630
>>>>> From: Steven D'Aprano @ 2010-03-29
>>>>> In practical terms, does anyone actually ever use sum on more than a
>>>>> handful of lists? I don't believe this is more than a hypothetical
>>>>> problem.
>>>
>>> Yes, and I stand by what I wrote back then.
>>
>>
>> Just curious, how does your sum compare with fsum() in the math module?
>
> math.fsum is a high-precision floating point sum, keeping extra precision
> that the built-in loses. Compare these:
>
> data = [1e-100, 1e100, 1e-100, -1e100]*1000
> sum(data)
> math.fsum(data)
>
> The exact value for the sum is 2e-97.

I was thinking more along the lines of how it works internally compared to
sum, and how it handles different inputs.  Of course, it is also quite a
bit slower:

 >>> timeit("fsum(r)", "from __main__ import fsum\nr=list(range(100))")
15.151492834091187

 >>> timeit("sum(r)", "r=list(range(100))")
2.282749891281128

So fsum accepts integers and converts them to floats.

It also accepts bytes, since their items are integers.

 >>> fsum(b'12345')
255.0
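Note that 255.0 is the sum of the byte values (the ASCII codes of the
characters '1' through '5'), not of the digits 1 through 5.  A quick check:

```python
import math

data = b'12345'
# Iterating over a bytes object yields ints (the byte values), not characters.
codes = list(data)
total = math.fsum(data)
print(codes, total)  # [49, 50, 51, 52, 53] 255.0
```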


But it does not accept strings, even ones that float() can convert.

 >>> float("12.0")
12.0

 >>> fsum(['12.0', '13.0'])
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: a float is required
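On the internals: sum keeps a single running total, while CPython's fsum
is built on the exact-partials technique (Shewchuk's algorithm).  Below is
a rough pure-Python sketch of that idea, not CPython's C code; the name
partials_sum is hypothetical.  It keeps a list of non-overlapping partial
sums instead of one accumulator, which is part of why it does more work
per item than sum.  It also mimics the input handling shown above: numbers
are converted to floats and strings are refused.

```python
import math


def partials_sum(iterable):
    """Hypothetical sketch of an exact-partials float sum (Shewchuk-style).

    Illustrates the idea only; it is not CPython's math.fsum implementation.
    """
    partials = []  # non-overlapping partial sums, smallest magnitude first
    for x in iterable:
        if isinstance(x, str):
            # Mirror fsum, which refuses strings even if float() could parse them.
            raise TypeError("a float is required")
        x = float(x)  # ints (including the items of a bytes object) become floats
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y          # rounded sum
            lo = y - (hi - x)   # exact rounding error of hi (valid since |x| >= |y|)
            if lo:
                partials[i] = lo  # keep the nonzero error term
                i += 1
            x = hi
        partials[i:] = [x]      # drop consumed partials, append the running total
    # Plain sum of the partials; fsum adds a correctly-rounded final step,
    # so this sketch may differ from it in the last bit.
    return sum(partials, 0.0)


data = [1e-100, 1e100, 1e-100, -1e100] * 1000
print(partials_sum(data), math.fsum(data))
```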


I would like sum to (eventually) be moved to the math module and have its
API and behaviour be the same as fsum's.  That would cause the fewest
surprises, and it reduces the mental load when two similar functions
behave the same way and can be found near each other in the library.
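For reference, this is where the two functions currently differ most on
non-numeric input, the subject of this thread: the built-in sum
concatenates lists when given a start value, while math.fsum (which takes
no start argument) rejects them.  A small illustration of today's
behaviour, not of the proposal:

```python
import math

lists = [[1, 2], [3], [4, 5]]

# Built-in sum works on lists when given an explicit start value.
combined = sum(lists, [])

# math.fsum requires numbers, so the same input raises TypeError.
try:
    math.fsum(lists)
    rejected = False
except TypeError:
    rejected = True

print(combined, rejected)  # [1, 2, 3, 4, 5] True
```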


Cheers,
    Ron

More information about the Python-ideas mailing list