sum for sequences?
Steve Howell
showell30 at yahoo.com
Fri Mar 26 10:56:48 EDT 2010
On Mar 26, 7:31 am, Steve Howell <showel... at yahoo.com> wrote:
> On Mar 24, 4:19 pm, Paul Rubin <no.em... at nospam.invalid> wrote:
>
> > kj <no.em... at please.post> writes:
> > > Is there a sequence-oriented equivalent to the sum built-in? E.g.:
> > > seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
>
> > use itertools.chain for this. A few people have mentioned that sum will
> > also work, but I think for that purpose it could have O(n**2)
> > complexity.
>
> I agree on the practical matter that itertools.chain and other
> solutions are usually the way to go for most tasks that involve
> iterating through several lists.
>
> From a purely academic standpoint, I'm not convinced that sum() is
> inefficient in terms of big-O complexity, though.
>
> showell at showell-laptop:~$ python
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
> [GCC 4.3.3] on linux2
> >>> class StupidList:
> ... def __init__(self, lst):
> ... print 'creating', lst
> ... self.lst = lst
> ... def __add__(self, other):
> ... self.lst += '|'
> ... self.lst.extend(other.lst)
> ... return self
> ...
> >>> result = sum([StupidList([1, 2]), StupidList([3,4]),
> StupidList([5,6])], StupidList([0]))
> creating [1, 2]
> creating [3, 4]
> creating [5, 6]
> creating [0]
> >>> result.lst
> [0, '|', 1, 2, '|', 3, 4, '|', 5, 6]
>
> If I'm interpreting the above program correctly, then sum() is doing
> the most efficient thing under the hood--it appears to do the
> equivalent of += without creating unnecessary objects for intermediate
> sums.
>
> I think the special-case error message might be a case where
> practicality simply beats out purity. It would be nice if sum() were
> completely duck-typed-let-you-shoot-yourself-in-foot-if-you-know-what-
> you-are-doing, but maybe this was such a pitfall at one time, that
> extra safeguards were put into sum(). I wonder how severely sum(),
> without the restriction, would underperform join() on modern versions
> of Python, though.
>
> >>> sum('1', '2')
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>
> Note that you can easily fake out sum() to get duck typing.
>
> >>> class EmptyStringStarter:
> ... def __add__(self, other): return other
> ...
> >>> empty = EmptyStringStarter()
> >>> sum(['hello ', 'world'], empty)
> 'hello world'
Looking at the code answers my own questions:
http://svn.python.org/view/python/trunk/Python/bltinmodule.c?view=markup
Look for builtin_sum().
First you see the guard against strings:
/* reject string values for 'start' parameter */
if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
PyErr_SetString(PyExc_TypeError,
"sum() can't sum strings [use ''.join(seq) instead]");
Py_DECREF(iter);
return NULL;
}
Py_INCREF(result);
Also, Paul's suspicion that sum() works in O(N squared) for lists is
confirmed by this comment:
/* It's tempting to use PyNumber_InPlaceAdd instead of
PyNumber_Add here, to avoid quadratic running time
when doing 'sum(list_of_lists, [])'. However, this
would produce a change in behaviour: a snippet like
empty = []
sum([[x] for x in range(10)], empty)
would change the value of empty. */
It's interesting, though, that you can construct classes pretty
easily, as I did above, that avoid the quadratic behavior, as long as
you do not mind mutating the start value, which I think is usually
fine, since the original start value usually is not useful afterward
anyway.
More information about the Python-list
mailing list