[Python-ideas] Fast sum() for non-numbers
Joshua Landau
joshua.landau.ws at gmail.com
Wed Jul 3 18:58:04 CEST 2013
On 3 July 2013 13:43, Steven D'Aprano <steve at pearwood.info> wrote:
> On 03/07/13 04:12, Sergey wrote:
>>
>> Hello, python-ideas. Trying to cover everything in one shot, so
>> the message is long, sorry.
>>
>> sum() is a great function. It is the "obvious way" to add things.
>> Unfortunately sometimes it's slower than it could be.
>>
>> The problem is that code:
>> sum([[1,2,3]]*1000000, [])
>> takes forever to complete. Let's fix that!
>
>
> I'm not sure that sum() is the Obvious Way to concatenate lists, and I don't
> think that concatenating many lists is a common thing to do. Traditionally,
> sum() works only on numbers, and I think we wouldn't be having this
> discussion if Python used & for concatenation instead of +. So I don't care
> that sum() has quadratic performance on lists (and tuples), and I must admit
> that having a simple quadratic algorithm in the built-ins is sometimes
> useful for teaching purposes, so I'm -0 on optimizing this case.
This optimises every case where `__iadd__` is faster than `__add__`. It
makes sense to me.
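To illustrate the point, here's a minimal sketch of the difference (assuming CPython list semantics, where `__iadd__` extends in place while `__add__` builds a fresh copy each time; the function names are mine, for illustration only):

```python
def concat_add(chunks):
    # Repeated __add__: every step copies the whole accumulator,
    # so total work is O(n**2) in the final length.
    total = []
    for chunk in chunks:
        total = total + chunk
    return total

def concat_iadd(chunks):
    # __iadd__ extends the accumulator in place,
    # so total work is amortized O(n).
    total = []
    for chunk in chunks:
        total += chunk
    return total

chunks = [[1, 2, 3]] * 1000
assert concat_add(chunks) == concat_iadd(chunks)
```

Both produce the same list; only the amount of copying differs.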
>> What I suggest is instead of making a copy for every item make just one
>> copy and reuse it as many times as needed. For example:
>> def sum(seq, start=0):
>>     start = start + seq[0]
>>     for item in seq[1:]:
>>         start += item
>>     return start
>
> [...]
>
>> Can this implementation break anything?
>
>
> That will still have quadratic performance for tuples, and it will break if
> seq is an iterator.
>
> It will also be expensive to make a slice of seq if it is a huge list, and
> could even run out of memory to hold both the original and the slice. I
> would be annoyed if sum started failing with a MemoryError here:
>
> big_seq = list(range(1000))*1000000 # assume this succeeds
> total = sum(big_seq) # could fail
Then you can just make it support iterators:
def sum(seq, start=0):
    seq = iter(seq)
    start = start + next(seq)
    for item in seq:
        start += item
    return start
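A quick check of that version (re-stated here as `patched_sum` so it doesn't shadow the builtin; note this sketch omits the empty-sequence case, where `next()` raises StopIteration but the builtin returns `start`):

```python
def patched_sum(seq, start=0):
    seq = iter(seq)            # accepts any iterable, including iterators
    start = start + next(seq)  # one copy up front, so the caller's start
                               # object is never mutated
    for item in seq:
        start += item          # in-place where the type supports it
    return start

# Works on a plain iterator of numbers...
assert patched_sum(iter(range(5))) == 10
# ...and concatenates lists without quadratic copying.
assert patched_sum([[1, 2]] * 3, []) == [1, 2, 1, 2, 1, 2]
```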
>> Alternatives to sum() for this use-case:
>> * list comprehension is 270% slower than patched sum [2]
>> * itertools.chain is 50%-100% slower than patched sum [3]
>
>
> How about the "Obvious Way" to concatenate lists?
>
> new = []
> for x in seq:
>     new.extend(x)
What's wrong with sum, if sum is fast?
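For reference, the alternatives under discussion (list comprehension, itertools.chain, and the extend loop) all produce the same flattened list; a sketch, with the relative timings left to the benchmarks cited above:

```python
from itertools import chain

chunks = [[1, 2, 3]] * 1000

# Nested list comprehension flatten.
flat_comp = [x for chunk in chunks for x in chunk]

# itertools.chain flatten.
flat_chain = list(chain.from_iterable(chunks))

# Explicit extend loop (the "Obvious Way" above).
flat_extend = []
for chunk in chunks:
    flat_extend.extend(chunk)

assert flat_comp == flat_chain == flat_extend
```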
> -0 on the general idea, -1 on the specific implementation. I'd rather have
> sum of lists be slow than risk sum of numbers raise MemoryError.
How about with my version? I see no obvious downsides, personally.