Pythonic way to sum n-th list element?

Alex Martelli aleax at aleax.it
Sat Apr 19 16:55:06 EDT 2003


Alex Martelli wrote:

> Steven Taschuk wrote:
> 
>> Quoth Alex Martelli:
>>   [...]
>>> "sum([y[1] for y in x])" would be very simple and direct, IMHO; the
>>> "same" thing expressed with "reduce(operator.add,[y[1] for y in x])"
>>> may only feel "simple and direct" to PhD's in maths -- it feels
>>> _abstruse_ to most people, in my opinion.
>> 
>> What if operator.add were changed to take any number of arguments?
>> Then you could write
>>     operator.add(*[y[1] for y in x])
>> Clear?  Abstruse?
> 
> That would be quite acceptable.  After all, it would then take
> nothing more than a simple:
> 
>   from operator import add as sum
> 
> to get my favourite syntax sugar for the operation:-).

Followup: actually, it would take more than that -- the mandatory
* is somewhat of a bother.  I'd rather have an operator.sum taking
a sequence
(any *non-empty* sequence, see below -- or we could support
empty sequences with a second optional parameter to use as the
starting value, like reduce's third optional parameter).
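
For illustration only, here is a pure-Python sketch of the interface described above. It is not the C patch: the name sum_seq, the start parameter, and the _MISSING sentinel are assumptions made for this example. The import from functools is only needed on modern Pythons, where reduce is no longer a builtin.

```python
import operator
from functools import reduce  # a builtin in 2003; imported for modern Pythons

_MISSING = object()  # hypothetical sentinel meaning "no start value supplied"


def sum_seq(seq, start=_MISSING):
    """Hypothetical stand-in for the proposed operator.sum(seq[, start])."""
    if start is _MISSING:
        # No start value: the sequence must be non-empty, and reduce
        # raises TypeError on an empty one.
        return reduce(operator.add, seq)
    # With a start value, an empty sequence simply returns start,
    # mirroring reduce's third optional parameter.
    return reduce(operator.add, seq, start)
```

With these assumptions, sum_seq([1, 2, 3]) gives 6, sum_seq([], 0) gives 0, and sum_seq([]) raises TypeError, which sidesteps the question of whether an empty sum should be 0, 0.0 or ''.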

> What would you have operator.add return when called with no
> arguments, though?  Either 0 or 0.0 or '' or ... have problems.
> I think it should be callable with ONE OR MORE arguments,
> not with ANY number.

I tried implementing this (as a simple patch to operator.c
in the current CVS sources) but performance is quite a
disappointment due to the mandatory * (which must build a
tuple from the supplied sequence, alas).  Using timeit.py
as usual, operator.add(*range(999)) turns out to be by far
the slowest way to sum numbers -- 967 microseconds on my 
box vs the 625 of reduce with lambda, 331 of reduce with
operator.add, and 255 of the simple for loop.  So, I think
that extending the number of arguments for operator.add
just wouldn't help here.
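
A sketch of how such a comparison might be reproduced with the timeit module follows. The helper names (loop_sum, candidates, time_usec) are made up for this example, and the patched operator.add(*range(999)) is left out because the stock operator.add accepts exactly two arguments. Absolute numbers will differ from the ones quoted above on any other machine or Python version.

```python
import operator
import timeit
from functools import reduce  # a builtin in 2003; imported for modern Pythons

data = list(range(999))


def loop_sum(seq):
    """The 'simple for loop' way of summing."""
    total = 0
    for item in seq:
        total += item
    return total


# Hypothetical table of the approaches to compare.
candidates = {
    "reduce with lambda": lambda: reduce(lambda a, b: a + b, data),
    "reduce with operator.add": lambda: reduce(operator.add, data),
    "simple for loop": lambda: loop_sum(data),
}


def time_usec(func, number=1000):
    """Average time of one call to func, in microseconds."""
    return timeit.timeit(func, number=number) / number * 1e6


if __name__ == "__main__":
    for label, func in candidates.items():
        print("%-26s %8.1f usec" % (label, time_usec(func)))
```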

So, I tried implementing sum in the simplest way, as a
separate patch to operator.c (leaving add alone) -- THAT
gives me 124 microseconds, about half the 255 of
the simple for loop and clearly sufficient to justify
sum's existence (together with its simplicity, clarity,
and likely frequency of use).  As I was at it, I also
ensured that operator.sum(x) would immediately delegate
to ''.join(x) if x's first item is a string -- this way
it gets the same top performance as ''.join(x), avoiding
the trap into which the "for xx in x: t+=xx" loop falls here
(for x=map(str,range(999)), sum(x) clocks in at 79
microseconds, vs 78 for ''.join(x) and 1260 for the
simple loop with += ...).
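
The delegation to ''.join can be sketched in pure Python as below. This is an assumption-laden illustration, not the actual patch to operator.c: the name sum_seq is hypothetical, and copying the input into a list is a simplification that the C code need not make.

```python
import operator
from functools import reduce  # a builtin in 2003; imported for modern Pythons


def sum_seq(seq):
    """Hypothetical sketch: sum a non-empty sequence, joining strings."""
    items = list(seq)  # simplification: copy so the first item can be inspected
    if not items:
        raise TypeError("sum_seq() of empty sequence")
    if isinstance(items[0], str):
        # Strings: build the result in one pass with ''.join instead of
        # creating a new, longer string on every +=.
        return "".join(items)
    return reduce(operator.add, items)
```

Under these assumptions, sum_seq(map(str, range(999))) returns the same string as ''.join(map(str, range(999))), while sum_seq(range(999)) still adds numbers.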

Interesting... I have my doubts about whether Guido would
accept such a patch just a week before a beta, but maybe
I should try submitting it anyway...


Alex






