[Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?]
MRAB
python at mrabarnett.plus.com
Sat Jul 13 01:11:24 CEST 2013
On 12/07/2013 22:57, Sergey wrote:
> On Jul 12, 2013 Paul Moore wrote:
>
>> On 12 July 2013 02:12, MRAB <python at mrabarnett.plus.com> wrote:
>>
>>> While you have your cap on, if you're going to special-case lists, then
>>> why not strings too (just passing them on to "".join())?
>>
>> And of course, that specific question was debated, and the decision taken
>> to go with what we have now, when sum was first introduced.
>>
>> Someone who is arguing for this proposal needs to go back and research that
>> decision, and confirm that the reasons discussed then no longer apply. I
>> suspect many of them still do.
>
> That's what Alex Martelli, author of sum(), initially did [1]:
>> for the simple reason that I special-case this -- when the first
>> item is a PyBaseString_Type, I delegate to ''.join
>
> So you can, kind of, say that sum was DESIGNED to have special cases
> from the very beginning.
>
> The problem appeared for mixed lists like:
> sum(["str1", "str2", SomeClass, "str3"])
>
> Or:
> def myit():
> yield "str1"
> yield "str2"
> yield SomeClass
> yield "str3"
>
> sum(myit())
>
> ''.join() can't handle such cases despite SomeClass could be addable
> to string.
>
> So the problem is that you can't just silently delegate the argument
> to ''.join() because join() first exhausts entire sequence into its
> own temporary list, and then fails. So the sum code had to be somewhat
> like that:
> read entire sequence into a temporary list
> if argument is string: try ''.join()
> if join succeeded: prepend first element to result and return it
> if argument is unicode: try u''.join()
> if join succeeded: prepend first element to result and return it
> general code here
>
> That looked unnecessarily complex. To avoid the complications it was
> suggested to block strings a little, and point newbies in a better
> direction.
>
> Yes, it was known that sum is slow for some types from the beginning,
> but there were no attempts to make it faster for other types, I can't
> find any suggestions. So the sum() came in that way: slow for lists,
> blocked for strings.
>
> I'm not the first one to bother about sum() being unexpectedly slow
> [2], and I'm not the first one trying to find a solution [3]. But it
> looks like I'm the first to go as far as writing patches.
>
> Maybe, just maybe, if someone came up with my ideas 10 years ago, it
> was accepted from the beginning, and we won't be discussing it now.
>
What if it did this:
1. Get the first item, or the start value if provided (i.e. not None).
2. Ask the item for an 'accumulator' (item.__accum__()).
3. Add the first item to the accumulator.
4. Add the remaining items to the accumulator.
5. Ask the accumulator for the result.
If there's no accumulator available (the "__accum__" method isn't
implemented), then either fall back to the current behaviour or raise a
TypeError like it currently does for strings.
More information about the Python-ideas
mailing list