sum for sequences?

Mon Mar 29 19:38:53 EDT 2010

On Mon, 29 Mar 2010 08:12:00 -0700, Steve Howell wrote:

> On Mar 29, 7:40 am, Patrick Maupin <pmau... at gmail.com> wrote:
>> On Mar 28, 9:45 pm, Steven D'Aprano
>>
>> <ste... at REMOVE.THIS.cybersource.com.au> wrote:
>> > And what about tuples? And subclasses of list/tuples? How many
>> > different types need to be optimized?
>>
>> One of the beautiful things about Python is that, for most things,
>> there are few surprises for even new users.  "There should be one
>> obvious way to do it" for the user means that, sometimes, under the
>> hood, there are a lot of special cases for the implementers.
>>
>>
> If nothing else, I think it's reasonably for users to expect symmetry.

Why? What is symmetry in programming?

Since the + operator takes both numbers and lists, and the - operator 
doesn't, does "symmetry" require that we make up some list operation so 
that integers and lists are symmetrical?

> If you can use "+" to concatentate lists, then it seems reasonable that
> something spelled "sum" would concatenate lists as well, and in
> reasonable time.

Where do you get the "reasonable time" from? A *single* + on lists can be 
slooooow. Try this:

L = [None]*10**9
result = L+L

(assuming you've even got enough memory to do so)

With a very few exceptions (e.g. dict lookup being "usually" O(1), list 
append being amortized O(1)), Python makes no promises about performance. 
It's not part of the language. If you, the programmer, are making any 
assumptions about performance that aren't clearly and explicitly 
documented in the official docs, then YOU are at fault, not Python.

>> > In practical terms, does anyone actually ever use sum on more than a
>> > handful of lists? I don't believe this is more than a hypothetical
>> > problem.
>>
>> Right now, it's probably not, because when somebody sums a large list
>> and gets thwacked on the head by the lack of efficiency, they then come
>> here and get thwacked because "everybody knows" they should user
>> itertools or something else; not sum().
>>
>>
> Indeed.  It would be nice if the docs for sum() at least pointed to
> list(itertools.chain(list_of_lists)), or whatever the most kosher
> alternative is supposed to be.
> 
> It only takes a handful of sublists, about ten on my box, to expose the
> limitation of the Shlemeil-the-Painter O(M*N*N) algorithm that's under
> the hood.  It only takes 200 sublists to start getting a 10x degradation
> in performance.

And how many times have you needed to add 200 sublists?

How many times have you needed to add 10 sublists?

Nobody has demonstrated that this is anything more than a hypothetical 
problem.

>> > The primary use case for sum is adding numbers when floating point
>> > accuracy is not critical. If you need float accuracy, use math.fsum.
>>
>> See, I think the very existence of math.fsum() already violates "there
>> should be one obvious way to do it."
>>
>>
> The nice thing about math.fsum() is that it is at least documented from
> sum(), although I suspect some users try sum() without even consulting
> the docs.
> 
> You could appease all users with an API where the most obvious choice,
> sum(), never behaves badly, and where users can still call more
> specialized versions (math.fsum() and friends) directly if they know
> what they are doing.  This goes back to the statement that Patrick
> makes--under the hood, this means more special cases for implementers,
> but fewer pitfalls for users.

More special cases for implementers means more bugs in the language, 
which means I end up writing my own code and ignoring the built-in 
version anyway.

More special cases means I have to pay the cost of high accuracy float 
summation even when I don't need, or want, it.

More special cases means I'm fooled into paying the cost of summing lists 
when I don't need to, because it's easier than importing itertools:

for item in sum(lots_of_lists):
    pass

needlessly creates a large list out of the smaller ones. Even if I don't 
fall for the temptation, and write bad code, I still pay the cost in the 
libraries and applications written by others.

More special cases isn't free. It's MORE costly than teaching users to 
use list.extend or itertools.chain.

-- 
Steven