[Python-Dev] sum(...) limitation

Wed Aug 13 08:21:42 CEST 2014

Redirecting to python-ideas, so trimming less than I might.

Chris Barker writes:
 > On Mon, Aug 11, 2014 at 11:07 PM, Stephen J. Turnbull <stephen at xemacs.org>
 > wrote:
 > 
 > > I'm referring to removing the unnecessary information that there's a
 > >  better way to do it, and simply raising an error (as in Python 3.2,
 > > say) which is all a RealProgrammer[tm] should ever need!
 > >
 > 
 > I can't imagine anyone is suggesting that -- disallow it, but don't tell
 > anyone why?

As I said, it's a regression.  That's exactly the behavior in Python 3.2.

 > The only thing that is remotely on the table here is:
 > 
 > 1) remove the special case for strings -- buyer beware -- but consistent
 > and less "ugly"

It's only consistent if you believe that Python has strict rules for
use of various operators.  It doesn't, except as far as they are
constrained by precedence.  For example, I have an application where I
add bytestrings bytewise modulo N <= 256, and concatenate them.  In
fact I use function call syntax, but the obvious operator syntax is
'+' for the bytewise addition, and '*' for the concatenation.
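
A minimal sketch of that operator assignment, to make the point concrete. The class name ModBytes and its constructor are hypothetical and are not the application's actual code (which uses function calls); the sketch only shows that nothing in Python forces '+' to mean concatenation.

```python
class ModBytes:
    """Hypothetical wrapper: '+' adds bytewise modulo n, '*' concatenates."""

    def __init__(self, data, n=256):
        if not 1 <= n <= 256:
            raise ValueError("n must be in 1..256")
        self.n = n
        # Reduce every byte modulo n so stored values are always in range.
        self.data = bytes(b % n for b in data)

    def __add__(self, other):
        # Bytewise addition is only defined for equal moduli and lengths.
        if (not isinstance(other, ModBytes) or other.n != self.n
                or len(other.data) != len(self.data)):
            return NotImplemented
        return ModBytes((a + b for a, b in zip(self.data, other.data)), self.n)

    def __mul__(self, other):
        # Concatenation, spelled '*' because '+' is already taken.
        if not isinstance(other, ModBytes) or other.n != self.n:
            return NotImplemented
        return ModBytes(self.data + other.data, self.n)


a = ModBytes(b"\x01\xff")
b = ModBytes(b"\x02\x02")
added = (a + b).data         # bytewise sum modulo 256
concatenated = (a * b).data  # the two byte strings joined end to end
```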

It's not in the Zen, but I believe in the maxim "If it's worth doing,
it's worth doing well."  So for me, 1) is out anyway.

 > 2) add a special case for strings that is fast and efficient -- may be as
 > simple as calling "".join() under the hood --no more code than the
 > exception check.

Sure, but what about all the other immutable containers with __add__
methods?  What about mappings with key-wise __add__ methods whose
values might be immutable but have __add__ methods?  Where do you stop
with the special-casing?  I consider this far more complex and ugly
than the simple "sum() is for numbers" rule (and even that is way too
complex considering accuracy of summing floats).

 > And I doubt anyone really is pushing for anything but (2)

I know that, but I think it's the wrong solution to the problem (which
is genuine IMO).  The right solution is something generic, possibly a
__sum__ method.  The question is whether that leads to too much work
to be worth it (e.g., "homogeneous_iterable").
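
One way such a hook might look, as a rough sketch only. No __sum__ protocol exists in Python; the names generic_sum and Text are hypothetical, and dispatching on type(start) is an assumption that sidesteps rather than solves the homogeneous-iterable question.

```python
def generic_sum(iterable, start=0):
    """Hypothetical sum() that defers to an optional __sum__ on type(start)."""
    hook = getattr(type(start), "__sum__", None)
    if hook is not None:
        # The type knows how to sum itself efficiently.
        return hook(start, iterable)
    # Fallback: the repeated '+' that sum() performs today.
    total = start
    for item in iterable:
        total = total + item
    return total


class Text(str):
    """Hypothetical str subclass that opts in with a linear-time __sum__."""

    def __sum__(self, iterable):
        return Text(str(self) + "".join(iterable))


joined = generic_sum(["a", "b", "c"], Text(""))  # goes through Text.__sum__
total = generic_sum([1, 2, 3])                   # int has no hook, uses '+'
```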

 > > Because obviously we'd want the attractive nuisance of "if you
 > > have __add__, there's a default definition of __sum__"
 > 
 > now I'm confused -- isn't that exactly what we have now?

Yes, and my feeling (backed up by arguments that I admit may persuade
nobody but myself) is that what we have now kinda sucks[tm].  It
seemed like a good idea when I first saw it, but then, my apps don't
scale to where the pain starts in my own usage.

 > > It's possible that Python could provide some kind of feature that
 > > would allow an optimized sum function for every type that has
 > > __add__, but I think this will take a lot of thinking.
 > 
 > does it need to be every type? As it is the common ones work fine already
 > except for strings -- so if we add an optimized string sum() then we're
 > done.

I didn't say provide an optimized sum(), I said provide a feature
enabling people who want to optimize sum() to do so.  So yes, it needs
to be every type (the optional __sum__ method is a proof of concept,
modulo it actually being implementable ;-).

 > > *Somebody* will do it (I don't think anybody is +1 on restricting
 > > sum() to a subset of types with __add__).
 > 
 > uhm, that's exactly what we have now

Exactly.  Who's arguing that the sum() we have now is a ticket to
Paradise?  I'm just saying that there's probably somebody out there
negative enough on the current situation to come up with an answer
that I think is general enough (and I suspect that python-dev
consensus is that demanding, too).

 > sum() can be used for any type that has an __add__ defined.

I'd like to see that be mutable types with __iadd__.
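
A sketch of what that restriction could mean in practice. The function inplace_sum is hypothetical; the choice to copy start so the caller's object is not mutated is an assumption of this sketch, not something proposed in the thread.

```python
import copy


def inplace_sum(iterable, start):
    """Hypothetical sum() limited to types that define __iadd__."""
    if not hasattr(type(start), "__iadd__"):
        raise TypeError("inplace_sum() needs a start value with __iadd__")
    total = copy.copy(start)  # shallow copy, so the caller's start is untouched
    for item in iterable:
        total += item         # in-place extend, no new container per step
    return total


start = []
result = inplace_sum([[1], [2, 3], [4]], start)
```

With a list accumulator each step appends to the same object, so the total work is proportional to the number of elements rather than to the square of it.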

 > What I fail to see is why it's better to raise an exception and
 > point users to a better way, than to simply provide an optimization
 > so that it's a moot issue.

Because inefficient sum() is an attractive nuisance, easy to overlook,
and likely to bite users other than the author.

 > The only justification offered here is that it will teach people that summing
 > strings (and some other objects?)

Summing tuples works (with appropriate start=tuple()).  Haven't
benchmarked, but I bet that's O(N^2).
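
For concreteness, the tuple case and a linear alternative look like this. It is an illustration, not a benchmark; itertools.chain.from_iterable is a standard-library call, and the sample data is made up.

```python
from itertools import chain

tuples = [(1, 2), (3,), (4, 5, 6)]

# Each '+' builds a brand-new tuple and copies everything accumulated so
# far, which is where the suspected quadratic behavior would come from.
slow = sum(tuples, ())

# chain.from_iterable walks every element exactly once.
fast = tuple(chain.from_iterable(tuples))
```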

 > is order(N^2) and a bad idea. But:
 > 
 > a) Python's primary purpose is practical, not pedagogical (not that it
 > isn't great for that)

My argument is that in practical use sum() is a bad idea, period,
until you book up on the types and applications where it *does* work.
N.B. It doesn't even work properly for numbers (inaccurate for floats).
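
The float problem is easy to reproduce. The sketch below spells out the plain left-to-right addition in a loop (the helper name naive_sum is made up) so that the result does not depend on how a particular interpreter version implements sum() for floats, and compares it with math.fsum.

```python
import math


def naive_sum(values):
    """Plain left-to-right addition, rounding after every step."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1] * 10
naive = naive_sum(values)     # accumulates rounding error
accurate = math.fsum(values)  # tracks the lost low-order bits
```

Here naive comes out just below 1.0, while math.fsum returns exactly 1.0.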

 > b) I doubt any naive users learn anything other than "I can't use sum() for
 > strings, I should use "".join()".

For people who think that special-casing strings is a good idea, I
think this is about as much benefit as you can expect.  Why go
farther?<0.5 wink/>

 > I submit that no naive user is going to get any closer to a proper
 > understanding of algorithmic Order behavior from this small hint. Which
 > leaves no reason to prefer an Exception to an optimization.

TOOWTDI.  str.join is in pretty much every code base by now, and
tutorials and FAQs recommending its use and severely deprecating sum
for strings are legion.

 > One other point: perhaps this will lead a naive user into thinking --
 > "sum() raises an exception if I try to use it inefficiently, so it must be
 > OK to use for anything that doesn't raise an exception" -- that would be a
 > bad lesson to mis-learn....

That assumes they know about the start argument.  I think most naive
users will just try to sum a bunch of tuples, and get the "can't add
0, tuple" Exception and write a loop.  I suspect that many of the
users who get the "use str.join" warning along with the Exception are
unaware of the start argument, too.  They expect sum(iter_of_str) to
magically add the strings.  I.e., when in 3.2 they got the
uninformative "can't add 0, str" message, they did not immediately go
"d'oh" and insert ", start=''" in the call to sum, they wrote a loop.
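
The situations described here can be tried directly. The helper describe_sum is a made-up convenience that returns the error text instead of raising; the exact wording of the messages varies between Python versions, so none is quoted.

```python
def describe_sum(items, *start):
    """Return sum(items, *start), or the TypeError text if sum() refuses."""
    try:
        return sum(items, *start)
    except TypeError as exc:
        return "TypeError: " + str(exc)


no_start = describe_sum([(1, 2), (3,)])        # default start 0: 0 + tuple fails
with_start = describe_sum([(1, 2), (3,)], ())  # works once start is a tuple
strings = describe_sum(["a", "b"], "")         # strings are refused by sum()
joined = "".join(["a", "b"])                   # the recommended spelling
```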

 > while we are at it, having the default sum() for floats be fsum()
 > would be nice

How do you propose to implement that, given math.fsum is perfectly
happy to sum integers?  You can't just check one or a few leading
elements for floatiness.  I think you have to dispatch on type(start),
but then sum(iter_of_floats) DTWT.  So I would suggest changing the
signature to sum(it, start=0.0).  This would probably be acceptable to
most users with iterables of ints, but does imply some performance hit.
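
A sketch of that signature change, dispatching on type(start). The name sum_dispatch is hypothetical; folding start into the math.fsum call via itertools.chain is an implementation choice made for this sketch.

```python
import math
from itertools import chain


def sum_dispatch(iterable, start=0.0):
    """Hypothetical sum(): a float start selects math.fsum, else plain '+'."""
    if isinstance(start, float):
        # Include start in the fsum so it is not added with a separate rounding.
        return math.fsum(chain([start], iterable))
    total = start
    for item in iterable:
        total = total + item
    return total


floats = sum_dispatch([0.1] * 10)       # default start 0.0 picks fsum
ints_as_float = sum_dispatch([1, 2, 3]) # ints still work, result is a float
ints = sum_dispatch([1, 2, 3], 0)       # explicit int start keeps int math
```

The second call shows the cost mentioned above: an iterable of ints now yields a float unless the caller passes an int start explicitly.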

 > This does turn sum() into a function that does type-based dispatch,
 > but isn't python full of those already? do something special for
 > the types you know about, call the generic dunder method for the
 > rest.

AFAIK Python is moving in the opposite direction: if there's a common
need for dispatching to type-specific implementations of a method,
define a standard (not "generic") dunder for the purpose, and have the
builtin (or operator, or whatever) look up (not "call") the
appropriate instance in the usual way, then call it.  If there's a
useful generic implementation, define an ABC to inherit from that
provides that generic implementation.
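
Roughly, that pattern might look like the following. Everything here is hypothetical: Python defines no __sum__ dunder, and Summable, dunder_sum, Vec and Words are illustrative names only.

```python
from abc import ABC


class Summable(ABC):
    """Hypothetical ABC supplying a generic __sum__ built on '+'."""

    def __sum__(self, iterable):
        total = self
        for item in iterable:
            total = total + item
        return total


def dunder_sum(iterable, start):
    """Hypothetical builtin: look __sum__ up on the type, then call it."""
    method = getattr(type(start), "__sum__", None)
    if method is None:
        raise TypeError("{} does not support __sum__".format(type(start).__name__))
    return method(start, iterable)


class Vec(Summable):
    """Inherits the generic implementation from the ABC."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)


class Words(Summable):
    """Overrides __sum__ with a type-specific, linear-time version."""

    def __init__(self, text=""):
        self.text = text

    def __add__(self, other):
        return Words(self.text + other.text)

    def __sum__(self, iterable):
        return Words(self.text + "".join(w.text for w in iterable))


v = dunder_sum([Vec(1, 2), Vec(3, 4)], Vec(0, 0))
w = dunder_sum([Words("a"), Words("b")], Words())
```

Note that dunder_sum itself contains no type checks: types without the dunder are simply rejected.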