sum for sequences?
Duncan Booth
duncan.booth at invalid.invalid
Sun Mar 28 11:17:46 EDT 2010
Steve Howell <showell30 at yahoo.com> wrote:
> The mildly surprising part of sum() is that is does add vs. add-in-
> place, which leads to O(N) vs. O(1) for the inner loop calls, for
> certain data structures, notably lists, even though none of the
> intermediate results get used by the caller. For lists, you could
> make a more efficient variant of sum() that clones the start value and
> does add-in-place.
Doing add-in-place isn't the only way to make sum more efficient: if you
assume that addition is associative (which of course the builtin sum can't)
then you can form partial sums. e.g. instead of calculating:
(((((((a + b) + c) + d) + e) + f) + g) + h)
you calculate:
(((a + b) + (c + d)) + ((e + f) + (g + h)))
Obviously this requires more space than the naive sum, but not as much as
you might at first expect: you only need to hold log(n) intermediates
values at any time.
> I could guess pretty confidently that the reason this optimization was
> never tried is that sum() has always been intended to be used on
> numerics, since other alternatives exist for strings (join), lists
> (chain), and hand-coded data classes that support add-in-place (roll-
> your-own loop).
Doing it this way helps summing lists or strings (though not as much as
str.join), but it also helps if you need to sum a long list of similarly
sized floats as you'll get a more accurate answer.
See
http://groups.google.com/group/comp.lang.python/browse_thread/thread/9eda29
faf92f532e/027cef7d4429aa3a
for an earlier discussion of this, or just Google comp.lang.python for
'pairwise sum'.
Here's the code I posted in that thread:
def sumpairs(seq):
tmp = []
for i,v in enumerate(seq):
if i&1:
tmp[-1] = tmp[-1] + v
i = i + 1
n = i & -i
while n > 2:
t = tmp.pop(-1)
tmp[-1] = tmp[-1] + t
n >>= 1
else:
tmp.append(v)
while len(tmp) > 1:
t = tmp.pop(-1)
tmp[-1] = tmp[-1] + t
return tmp[0]
and I claimed that my Python coded sumpairs function was faster than the
builtin sum on a list of lists once you had more than about 210 items.
I never did get round to rewriting it in C for a more realistic speed
comparison: summing integers my Python version is about 60 times slower
than the builtin.
More information about the Python-list
mailing list