sum for sequences?

Mon Mar 29 11:12:00 EDT 2010

On Mar 29, 7:40 am, Patrick Maupin <pmau... at gmail.com> wrote:
> On Mar 28, 9:45 pm, Steven D'Aprano
>
> <ste... at REMOVE.THIS.cybersource.com.au> wrote:
> > And what about tuples? And subclasses of list/tuples? How many different
> > types need to be optimized?
>
> One of the beautiful things about Python is that, for most things,
> there are few surprises for even new users.  "There should be one
> obvious way to do it" for the user means that, sometimes, under the
> hood, there are a lot of special cases for the implementers.
>

If nothing else, I think it's reasonable for users to expect symmetry.

If you can use "+" to concatenate lists, then it seems reasonable
that something spelled "sum" would concatenate lists as well, and in
reasonable time.

> > In practical terms, does anyone actually ever use sum on more than a
> > handful of lists? I don't believe this is more than a hypothetical
> > problem.
>
> Right now, it's probably not, because when somebody sums a large list
> and gets thwacked on the head by the lack of efficiency, they then
> come here and get thwacked because "everybody knows" they should use
> itertools or something else; not sum().
>

Indeed.  It would be nice if the docs for sum() at least pointed to
list(itertools.chain.from_iterable(list_of_lists)), or whatever the
most kosher alternative is supposed to be.
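
For concreteness, here is a minimal sketch of the two spellings side by side. The sample data and variable names are made up for illustration; note that chain() itself wants the sublists as separate arguments, so you need either the star or chain.from_iterable().

```python
import itertools

# Made-up sample data, purely for illustration.
list_of_lists = [[1, 2], [3], [4, 5, 6]]

# Works, but builds a brand-new intermediate list at every step.
via_sum = sum(list_of_lists, [])

# Flattens in one pass; from_iterable takes a single iterable of iterables.
via_chain = list(itertools.chain.from_iterable(list_of_lists))

# Equivalent spelling that unpacks the sublists as separate arguments.
via_chain_star = list(itertools.chain(*list_of_lists))

print(via_sum == via_chain == via_chain_star)
```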

It only takes a handful of sublists, about ten on my box, to expose
the limitation of the Schlemiel-the-Painter O(M*N*N) algorithm that's
under the hood (N sublists of M elements each).  It only takes 200
sublists to start getting a 10x degradation in performance.
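
A rough way to see where the N*N comes from is to count how many elements get copied, assuming each "result + sublist" step allocates a new list and copies everything into it. The helper names below are my own, and time_both() is only a sketch for trying this on your own machine; the timings will differ from box to box.

```python
import itertools
import timeit


def elements_copied_by_sum(n_sublists, sublist_len):
    """Count elements copied when N sublists of length M are joined by repeated '+'."""
    copied = 0
    result_len = 0
    for _ in range(n_sublists):
        result_len += sublist_len  # result + sublist allocates a new, longer list...
        copied += result_len       # ...and copies every element into it
    return copied


def elements_copied_by_chain(n_sublists, sublist_len):
    """A single-pass flatten touches each element once."""
    return n_sublists * sublist_len


def time_both(n_sublists, sublist_len=10, repeat=20):
    """Return (sum_seconds, chain_seconds); the numbers vary by machine."""
    data = [list(range(sublist_len)) for _ in range(n_sublists)]
    t_sum = timeit.timeit(lambda: sum(data, []), number=repeat)
    t_chain = timeit.timeit(
        lambda: list(itertools.chain.from_iterable(data)), number=repeat
    )
    return t_sum, t_chain


print(elements_copied_by_sum(200, 10), elements_copied_by_chain(200, 10))
```

The copy count works out to M*N*(N+1)/2 for repeated concatenation versus M*N for a single pass. That is a count of element copies only, so it will not match measured slowdowns exactly.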

> > The primary use case for sum is adding numbers when floating point
> > accuracy is not critical. If you need float accuracy, use math.fsum.
>
> See, I think the very existence of math.fsum() already violates "there
> should be one obvious way to do it."
>

The nice thing about math.fsum() is that it is at least mentioned in
the docs for sum(), although I suspect some users try sum() without
even consulting the docs.
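
For anyone who hasn't run into the difference, a small sketch using ten copies of 0.1 (my own choice of input). What plain sum() prints depends on the interpreter version, so I only vouch for the fsum() result here.

```python
import math

values = [0.1] * 10

naive = sum(values)          # may carry accumulated rounding error, depending on version
precise = math.fsum(values)  # tracks partial sums to avoid that error

print(naive, precise)
```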

You could appease all users with an API where the most obvious choice,
sum(), never behaves badly, and where users can still call more
specialized versions (math.fsum() and friends) directly if they know
what they are doing.  This goes back to the statement that Patrick
makes--under the hood, this means more special cases for implementers,
but fewer pitfalls for users.
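
To make the idea less abstract, here is a very rough sketch of what such a wrapper might look like in pure Python. The name friendly_sum is hypothetical and this is not how the builtin is implemented; it only special-cases list and tuple starts and hands everything else to the existing sum().

```python
import itertools


def friendly_sum(iterable, start=0):
    """Hypothetical sum() that avoids repeated concatenation for lists and tuples."""
    if isinstance(start, list):
        # One pass over all elements instead of a new list per sublist.
        return start + list(itertools.chain.from_iterable(iterable))
    if isinstance(start, tuple):
        return start + tuple(itertools.chain.from_iterable(iterable))
    # Everything else keeps the built-in behavior.
    return sum(iterable, start)


print(friendly_sum([[1, 2], [3]], []))
print(friendly_sum([1, 2, 3]))
```

This glosses over exactly the questions Steven raises: subclasses of list or tuple come back as plain lists or tuples unless they override "+", and mixed sequences that the builtin would reject get flattened here.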