sum for sequences?

Fri Mar 26 10:56:48 EDT 2010

On Mar 26, 7:31 am, Steve Howell <showel... at yahoo.com> wrote:
> On Mar 24, 4:19 pm, Paul Rubin <no.em... at nospam.invalid> wrote:
>
> > kj <no.em... at please.post> writes:
> > > Is there a sequence-oriented equivalent to the sum built-in?  E.g.:
> > >   seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
>
> > use itertools.chain for this.  A few people have mentioned that sum will
> > also work, but I think for that purpose it could have O(n**2)
> > complexity.
>
> I agree on the practical matter that itertools.chain and other
> solutions are usually the way to go for most tasks that involve
> iterating through several lists.
>
> From a purely academic standpoint, I'm not convinced that sum() is
> inefficient in terms of big-O complexity, though.
>
>  showell at showell-laptop:~$ python
>  Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
>  [GCC 4.3.3] on linux2
>  >>> class StupidList:
>  ...   def __init__(self, lst):
>  ...     print 'creating', lst
>  ...     self.lst = lst
>  ...   def __add__(self, other):
>  ...     self.lst += '|'
>  ...     self.lst.extend(other.lst)
>  ...     return self
>  ...
>  >>> result = sum([StupidList([1, 2]), StupidList([3,4]),
> StupidList([5,6])], StupidList([0]))
>  creating [1, 2]
>  creating [3, 4]
>  creating [5, 6]
>  creating [0]
>  >>> result.lst
>  [0, '|', 1, 2, '|', 3, 4, '|', 5, 6]
>
> If I'm interpreting the above program correctly, then sum() is doing
> the most efficient thing under the hood--it appears to do the
> equivalent of += without creating unnecessary objects for intermediate
> sums.
>
> I think the special-case error message might be a case where
> practicality simply beats out purity.  It would be nice if sum() were
> completely duck-typed-let-you-shoot-yourself-in-foot-if-you-know-what-
> you-are-doing, but maybe this was such a pitfall at one time, that
> extra safeguards were put into sum().  I wonder how severely sum(),
> without the restriction, would underperform join() on modern versions
> of Python, though.
>
>  >>> sum('1', '2')
>  Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>  TypeError: sum() can't sum strings [use ''.join(seq) instead]
>
> Note that you can easily fake out sum() to get duck typing.
>
>  >>> class EmptyStringStarter:
>  ...   def __add__(self, other): return other
>  ...
>  >>> empty = EmptyStringStarter()
>  >>> sum(['hello ', 'world'], empty)
>  'hello world'

Looking at the code answers my own questions:

http://svn.python.org/view/python/trunk/Python/bltinmodule.c?view=markup

Look for builtin_sum().

First you see the guard against strings:

 /* reject string values for 'start' parameter */
 if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
	PyErr_SetString(PyExc_TypeError,
		"sum() can't sum strings [use ''.join(seq) instead]");
	Py_DECREF(iter);
	return NULL;
 }
 Py_INCREF(result);

Also, Paul's suspicion that sum() works in O(N squared) for lists is
confirmed by this comment:

  /* It's tempting to use PyNumber_InPlaceAdd instead of
   PyNumber_Add here, to avoid quadratic running time
   when doing 'sum(list_of_lists, [])'.  However, this
   would produce a change in behaviour: a snippet like
     empty = []
     sum([[x] for x in range(10)], empty)
   would change the value of empty. */

It's interesting, though, that you can construct classes pretty
easily, as I did above, that avoid the quadratic behavior, as long as
you do not mind mutating the start value, which I think is usually
fine, since the original start value usually is not useful afterward
anyway.