[Python-Dev] sum(...) limitation

Tue Aug 12 05:50:21 CEST 2014

Chris Barker - NOAA Federal writes:

 > Is there anything in the language spec that says string concatenation is
 > O(n^2)? Or for that matter any of the performs characteristics of build in
 > types? Those striker as implementation details that SHOULD be particular to
 > the implementation.

Container concatenation isn't quadratic in Python at all.  The naive
implementation of sum() as a loop repeatedly calling __add__ is
quadratic for them.  Strings (and immutable containers in general) are
particularly horrible, as they don't have __iadd__.

You could argue that sum() being a function of an iterable isn't just
a calling convention for a loop encapsulated in a function, but rather
a completely different kind of function that doesn't imply anything
about the implementation, and therefore that it should dispatch on
type(it).  But explicitly dispatching on type(x) is yucky (what if
somebody wants to sum a different type not currently recognized by the
sum() builtin?) so, obviously, we should define a standard __sum__
dunder!  IMO we'd also want a homogeneous_iterable ABC, and a concrete
homogeneous_iterable_of_TYPE for each sum()-able TYPE to help users
catch bugs injecting the wrong type into an iterable_of_TYPE.

But this still sucks.  Why?  Because obviously we'd want the
attractive nuisance of "if you have __add__, there's a default
definition of __sum__" (AIUI, this is what bothers Alexander most
about the current situation, at least of the things he's mentioned, I
can really sympathize with his dislike).  And new Pythonistas and lazy
programmers who only intend to use sum() on "small enough" iterables
will use the default, and their programs will appear to hang on
somewhat larger iterable, or a realtime requirement will go
unsatisfied when least expected, or ....  If we *don't* have that
property for sum(), ugh!  Yuck!  Same old same old!  (IMHO, YMMV of
course)

It's possible that Python could provide some kind of feature that
would allow an optimized sum function for every type that has __add__,
but I think this will take a lot of thinking.  *Somebody* will do it
(I don't think anybody is +1 on restricting sum() to a subset of types
with __add__).  I just think we should wait until that somebody appears.

 > Should we cripple the performance of some operation in Cpython so that it
 > won't work better that Jython?

Nobody is crippling operations.  We're prohibiting use of a *name* for
an operation that is associated (strongly so, in my mind) with an
inefficient algorithm in favor of the *same operation* by a different
name (which has no existing implementation, and therefore Python
implementers are responsible for implementing it efficiently).  Note:
the "inefficient" algorithm isn't inefficient for integers, and it
isn't inefficient for numbers in general (although it's inaccurate for
some classes of numbers).

 > Seems the same argument [that Python language doesn't prohibit
 > optimizations in particular implementations just because they
 > aren't made in others] could be made for sum(list_of_strings).

It could.  But then we have to consider special-casing every builtin
type that provides __add__, and we impose an unobvious burden on user
types that provide __add__.

 > > It seems pretty pedantic to say: we could make this work well,
 > > but we'd rather chide you for not knowing the "proper" way to do
 > > it.

Nobody disagrees.  But backward compatibility gets in the way.

 > But sum() is not inherently quadratic -- that's a limitation of the
 > implementation.

But the faulty implementation is the canonical implementation, the
only one that can be defined directly in terms of __add__, and it is
efficient for non-container types.[1]

 > "".join _could_ be naively written with the same poor performance
 > -- why should users need to understand why one was optimized and
 > one was not?

Good question.  They shouldn't -- thus the prohibition on sum()ing
strings.

 > That is a very import a lesson to learn, sure, but python is not
 > only a teaching language. People will need to learn those lessons
 > at some point, this one feature makes little difference.

No, it makes a big difference.  If you can do something, then it's OK
to do it, is something Python tries to implement.  If sum() works for
everything with an __add__, given current Python language features
some people are going to end up with very inefficient code and it will
bite some of them (and not necessarily the authors!) at some time.

If it doesn't work for every type with __add__, why not?  You'll end
up playing whack-a-mole with type prohibitions.  Ugh.

 > Sure, but I think all that does is teach people about a cpython specific
 > implementation -- and I doubt naive users get any closer to understanding
 > algorithmic complexity -- all they learn is you should use string.join().
 > 
 > Oh well, not really that big a deal.

Not to Python.  Maybe not to you.  But I've learned a lot about
Pythonic ways of doing things trying to channel the folks who
implemented this restriction.  (I don't claim to have gotten it right!
Just that it's been fun and educational. :-)

Steve

Footnotes: 
[1]  This isn't quite true.  One can imagine a "token" or "symbol"
type that is str without __len__, but does have __add__.  But that
seems silly enough to not be a problem in practice.