[Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?]
Sergey
sergemp at mail.ru
Fri Jul 12 23:57:18 CEST 2013
On Jul 12, 2013 Paul Moore wrote:
> On 12 July 2013 02:12, MRAB <python at mrabarnett.plus.com> wrote:
>
>> While you have your cap on, if you're going to special-case lists, then
>> why not strings too (just passing them on to "".join())?
>
> And of course, that specific question was debated, and the decision taken
> to go with what we have now, when sum was first introduced.
>
> Someone who is arguing for this proposal needs to go back and research that
> decision, and confirm that the reasons discussed then no longer apply. I
> suspect many of them still do.
That's what Alex Martelli, author of sum(), initially did [1]:
> for the simple reason that I special-case this -- when the first
> item is a PyBaseString_Type, I delegate to ''.join
So you can, kind of, say that sum was DESIGNED to have special cases
from the very beginning.
The problem appeared for mixed lists like:
sum(["str1", "str2", SomeClass, "str3"])
Or:
def myit():
yield "str1"
yield "str2"
yield SomeClass
yield "str3"
sum(myit())
''.join() can't handle such cases despite SomeClass could be addable
to string.
So the problem is that you can't just silently delegate the argument
to ''.join() because join() first exhausts entire sequence into its
own temporary list, and then fails. So the sum code had to be somewhat
like that:
read entire sequence into a temporary list
if argument is string: try ''.join()
if join succeeded: prepend first element to result and return it
if argument is unicode: try u''.join()
if join succeeded: prepend first element to result and return it
general code here
That looked unnecessarily complex. To avoid the complications it was
suggested to block strings a little, and point newbies in a better
direction.
Yes, it was known that sum is slow for some types from the beginning,
but there were no attempts to make it faster for other types, I can't
find any suggestions. So the sum() came in that way: slow for lists,
blocked for strings.
I'm not the first one to bother about sum() being unexpectedly slow
[2], and I'm not the first one trying to find a solution [3]. But it
looks like I'm the first to go as far as writing patches.
Maybe, just maybe, if someone came up with my ideas 10 years ago, it
was accepted from the beginning, and we won't be discussing it now.
--
[1] http://mail.python.org/pipermail/python-dev/2003-April/034767.html
[2] http://article.gmane.org/gmane.comp.python.general/441831
Paul Rubin @ 2006-01-12
> A fast implementation would probably allocate the output list just
> once and then stream the values into place with a simple index.
That's what I hoped "sum" would do, but instead it barfs with a type
error.
[3] http://article.gmane.org/gmane.comp.python.general/658610
Alf P. Steinbach @ 2010-03-28
> From a more practical point of view, the sum efficiency could be improved by
> doing the first addition using '+' and the rest using '+=', without changing
> the behavior.
More information about the Python-ideas
mailing list