Re: [Python-Dev] sum(...) limitation

Aug. 13, 2014

On Tue, Aug 12, 2014 at 11:21 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
...
Redirecting to python-ideas, so trimming less than I might.
reasonable enough -- you are introducing some more significant ideas for
changes.

I've said all I have to say about this -- I don't seem to see anything
encouraging from core devs, so I guess that's it.

Thanks for the fun bike-shedding...

-Chris
...
Chris Barker writes:
...
On Mon, Aug 11, 2014 at 11:07 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
...
I'm referring to removing the unnecessary information that there's a
 better way to do it, and simply raising an error (as in Python 3.2,
say) which is all a RealProgrammer[tm] should ever need!
I can't imagine anyone is suggesting that -- disallow it, but don't tell
anyone why?
As I said, it's a regression.  That's exactly the behavior in Python 3.2.
...
The only thing that is remotely on the table here is:
1) remove the special case for strings -- buyer beware -- but consistent
and less "ugly"
It's only consistent if you believe that Python has strict rules for
use of various operators.  It doesn't, except as far as they are
constrained by precedence.  For example, I have an application where I
add bytestrings bytewise modulo N <= 256, and concatenate them.  In
fact I use function call syntax, but the obvious operator syntax is
'+' for the bytewise addition, and '*' for the concatenation.
It's not in the Zen, but I believe in the maxim "If it's worth doing,
it's worth doing well."  So for me, 1) is out anyway.
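To make the operator point concrete, here is a minimal sketch of a hypothetical bytestring type (`ModBytes` is an illustrative name, not the application described above) whose `+` is bytewise addition modulo n and whose `*` is concatenation. For such a type, `sum()` means elementwise modular addition, so "+ means concatenation" is not a rule sum() could rely on.

```python
class ModBytes:
    """Hypothetical bytestring: + is bytewise addition mod n, * is concatenation."""

    def __init__(self, data, n=256):
        if not 0 < n <= 256:
            raise ValueError("n must satisfy 0 < n <= 256")
        self.n = n
        self.data = bytes(b % n for b in data)

    def __add__(self, other):
        # Bytewise addition modulo n; operands must have equal length.
        if len(self.data) != len(other.data):
            raise ValueError("length mismatch")
        return ModBytes(((a + b) % self.n for a, b in zip(self.data, other.data)), self.n)

    def __mul__(self, other):
        # Concatenation, spelled with '*' for this type.
        return ModBytes(self.data + other.data, self.n)

    def __eq__(self, other):
        return isinstance(other, ModBytes) and (self.data, self.n) == (other.data, other.n)

    def __repr__(self):
        return f"ModBytes({self.data!r}, n={self.n})"


a = ModBytes(b"\x01\x02", n=7)
b = ModBytes(b"\x06\x06", n=7)
# sum() here adds bytewise mod 7; it does not concatenate.
total = sum([a, b], ModBytes(b"\x00\x00", n=7))
joined = a * b
```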
...
2) add a special case for strings that is fast and efficient -- may be as
simple as calling "".join() under the hood -- no more code than the
exception check.
Sure, but what about all the other immutable containers with __add__
methods?  What about mappings with key-wise __add__ methods whose
values might be immutable but have __add__ methods?  Where do you stop
with the special-casing?  I consider this far more complex and ugly
than the simple "sum() is for numbers" rule (and even that is way too
complex considering accuracy of summing floats).
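For reference, the special case proposed in 2) could look roughly like the sketch below. `sum_with_str_fastpath` is a hypothetical wrapper, not a proposed CPython patch, and it only triggers when `start` is a str. That limit is exactly the "where do you stop" problem: tuples, bytes, and other immutable containers would each need their own branch.

```python
def sum_with_str_fastpath(iterable, start=0):
    """Hypothetical sum() variant sketching proposal 2): str goes to join()."""
    if isinstance(start, str):
        # One pass through str.join instead of building N intermediate strings.
        return start + "".join(iterable)
    # Everything else keeps today's behaviour. The builtin rejects a str
    # start, which is why the str case must be handled before this call.
    return sum(iterable, start)
```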
...
And I doubt anyone really is pushing for anything but (2)
I know that, but I think it's the wrong solution to the problem (which
is genuine IMO).  The right solution is something generic, possibly a
__sum__ method.  The question is whether that leads to too much work
to be worth it (eg, "homogeneous_iterable").
...
...
Because obviously we'd want the attractive nuisance of "if you
have __add__, there's a default definition of __sum__"
now I'm confused -- isn't that exactly what we have now?
Yes and my feeling (backed up by arguments that I admit may persuade
nobody but myself) is that what we have now kinda sucks[tm].  It
seemed like a good idea when I first saw it, but then, my apps don't
scale to where the pain starts in my own usage.
...
...
It's possible that Python could provide some kind of feature that
would allow an optimized sum function for every type that has
__add__, but I think this will take a lot of thinking.
does it need to be every type? As it is the common ones work fine already
except for strings -- so if we add an optimized string sum() then we're
done.
I didn't say provide an optimized sum(), I said provide a feature
enabling people who want to optimize sum() to do so.  So yes, it needs
to be every type (the optional __sum__ method is a proof of concept,
modulo it actually being implementable ;-).
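A rough sketch of what an optional `__sum__` hook could look like follows. `__sum__` is not an existing Python special method; it is the hypothetical hook floated in this thread. `proposed_sum` and the `Text` class are illustrative names. The hook is looked up on `type(start)`, and types that don't define it fall back to repeated binary `+`.

```python
import operator
from functools import reduce


def proposed_sum(iterable, start=0):
    """Sketch of sum() honouring a hypothetical __sum__ hook on type(start)."""
    hook = getattr(type(start), "__sum__", None)
    if hook is not None:
        # The type opted in: let it choose an efficient strategy.
        return hook(start, iterable)
    # Fallback: today's semantics, start + x1 + x2 + ...
    return reduce(operator.add, iterable, start)


class Text(str):
    """Hypothetical str subclass that opts into the protocol via join()."""

    def __sum__(self, iterable):
        return Text(self + "".join(iterable))
```

With this in place, `proposed_sum(["a", "b"], Text("x"))` goes through `str.join` in one pass, while `proposed_sum([1, 2, 3])` behaves like the builtin.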
...
...
*Somebody* will do it (I don't think anybody is +1 on restricting
sum() to a subset of types with __add__).
uhm, that's exactly what we have now
Exactly.  Who's arguing that the sum() we have now is a ticket to
Paradise?  I'm just saying that there's probably somebody out there
negative enough on the current situation to come up with an answer
that I think is general enough (and I suspect that python-dev
consensus is that demanding, too).
...
sum() can be used for any type that has an __add__ defined.
I'd like to see that be mutable types with __iadd__.
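Restricting sum() to mutable types with `__iadd__` might look like this minimal sketch. `iadd_sum` is a hypothetical name. It copies `start` so the caller's object is not mutated, then accumulates in place. For lists, `+=` extends the existing list rather than building a new one at every step.

```python
import copy


def iadd_sum(iterable, start):
    """Hypothetical sum() that only accepts start values supporting in-place +=."""
    if not hasattr(type(start), "__iadd__"):
        raise TypeError(f"{type(start).__name__} has no __iadd__")
    acc = copy.copy(start)  # shallow copy: leave the caller's start untouched
    for item in iterable:
        acc += item  # list.__iadd__ extends in place instead of copying acc
    return acc
```

Immutable types such as tuple, str, and int define no `__iadd__`, so they are rejected instead of silently going quadratic.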
...
What I fail to see is why it's better to raise an exception and
point users to a better way, than to simply provide an optimization
so that it's a moot issue.
Because inefficient sum() is an attractive nuisance, easy to overlook,
and likely to bite users other than the author.
...
The only justification offered here is that it will teach people that
summing strings (and some other objects?)
Summing tuples works (with appropriate start=tuple()).  Haven't
benchmarked, but I bet that's O(N^2).
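Tuple concatenation builds a new tuple that copies both operands, so repeated `+` in `sum()` is expected to be quadratic. `itertools.chain.from_iterable` visits each element once. The snippet below checks that both give the same result and includes a small `timeit` harness for benchmarking on your own machine. No timings are claimed here.

```python
import itertools
import timeit

chunks = [(i,) for i in range(2000)]

# Each step copies everything accumulated so far: roughly N**2 work overall.
quadratic = sum(chunks, ())

# Walks each element once: linear in the total number of elements.
linear = tuple(itertools.chain.from_iterable(chunks))

assert quadratic == linear

if __name__ == "__main__":
    for label, stmt in [("sum", "sum(chunks, ())"),
                        ("chain", "tuple(itertools.chain.from_iterable(chunks))")]:
        seconds = timeit.timeit(stmt, globals=globals(), number=10)
        print(f"{label}: {seconds:.4f}s")
```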
...
is order(N^2) and a bad idea. But:
a) Python's primary purpose is practical, not pedagogical (not that it
isn't great for that)
My argument is that in practical use sum() is a bad idea, period,
until you book up on the types and applications where it *does* work.
N.B. It doesn't even work properly for numbers (inaccurate for floats).
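The float inaccuracy is easy to see with plain left-to-right addition, which is what `sum()` did for floats when this thread was written. Note that CPython 3.12 later switched `sum()` of floats to compensated (Neumaier) summation, so the builtin itself may no longer show this particular error. The sketch below therefore uses `reduce` to reproduce naive accumulation, and compares it with `math.fsum`, which tracks exact partial sums and rounds once at the end.

```python
import math
import operator
from functools import reduce

values = [0.1] * 10

# Plain binary float additions, one rounding error per step.
naive = reduce(operator.add, values, 0.0)

# Exactly rounded sum of the inputs.
accurate = math.fsum(values)
```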
...
b) I doubt any naive users learn anything other than "I can't use sum() for
strings, I should use "".join()".
For people who think that special-casing strings is a good idea, I
think this is about as much benefit as you can expect.  Why go
farther? <0.5 wink/>
...
I submit that no naive user is going to get any closer to a proper
understanding of algorithmic Order behavior from this small hint. Which
leaves no reason to prefer an Exception to an optimization.
TOOWTDI.  str.join is in pretty much every code base by now, and
tutorials and FAQs recommending its use and severely deprecating sum
for strings are legion.
...
One other point: perhaps this will lead a naive user into thinking --
"sum() raises an exception if I try to use it inefficiently, so it must be
OK to use for anything that doesn't raise an exception" -- that would be a
bad lesson to mis-learn....
That assumes they know about the start argument.  I think most naive
users will just try to sum a bunch of tuples, and get the "can't add
0, tuple" Exception and write a loop.  I suspect that many of the
users who get the "use str.join" warning along with the Exception are
unaware of the start argument, too.  They expect sum(iter_of_str) to
magically add the strings.  Ie, when in 3.2 they got the
uninformative "can't add 0, str" message, they did not immediately go
"d'oh" and insert ", start=''" in the call to sum, they wrote a loop.
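The two failure modes described above can be reproduced as follows. Without a start value, `sum()` begins at `0` and fails on `0 + "spam"` with a generic operand error. With an empty-string start, CPython refuses explicitly and its message points at `str.join`. Exact message wording may vary by version, so the checks below only look at substrings. The start value is passed positionally here because `sum()` only accepts `start=` as a keyword from Python 3.8 on.

```python
strings = ["spam", "eggs"]

no_start_msg = None
try:
    sum(strings)  # starts from 0, so the first step is 0 + "spam"
except TypeError as exc:
    no_start_msg = str(exc)

str_start_msg = None
try:
    sum(strings, "")  # explicit str start: rejected with a pointer to join
except TypeError as exc:
    str_start_msg = str(exc)

# The recommended spelling.
joined = "".join(strings)
```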
...
while we are at it, having the default sum() for floats be fsum()
would be nice
How do you propose to implement that, given math.fsum is perfectly
happy to sum integers?  You can't just check one or a few leading
elements for floatiness.  I think you have to dispatch on type(start),
but then sum(iter_of_floats) DTWT.  So I would suggest changing the
signature to sum(it, start=0.0).  This would probably be acceptable to
most users with iterables of ints, but does imply some performance hit.
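A minimal sketch of the suggested dispatch on `type(start)` with a float default follows. `sum_dispatch_on_start` is a hypothetical name. Folding `start` into the `fsum` input keeps a single final rounding. The sketch also makes one consequence visible: with `start=0.0`, an iterable of ints now returns a float, and ints too large for a float lose precision. Callers would need to pass `start=0` explicitly to keep integer results.

```python
import itertools
import math
import operator
from functools import reduce


def sum_dispatch_on_start(iterable, start=0.0):
    """Hypothetical sum(): a float start selects math.fsum."""
    if type(start) is float:
        # fsum accepts ints and mixed int/float input as well as floats.
        return math.fsum(itertools.chain((start,), iterable))
    # Any other start: plain repeated +, preserving e.g. int results.
    return reduce(operator.add, iterable, start)
```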
...
This does turn sum() into a function that does type-based dispatch,
but isn't Python full of those already? Do something special for
the types you know about, call the generic dunder method for the
rest.
AFAIK Python is moving in the opposite direction: if there's a common
need for dispatching to type-specific implementations of a method,
define a standard (not "generic") dunder for the purpose, and have the
builtin (or operator, or whatever) look up (not "call") the
appropriate instance in the usual way, then call it.  If there's a
useful generic implementation, define an ABC to inherit from that
provides that generic implementation.
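A sketch of that pattern applied to summation, under stated assumptions: `Summable`, `Vec`, `Seq`, and `sum_via_protocol` are hypothetical names, and `__sum__` is not a real special method. The ABC provides a generic `__sum__` built on `__add__`. Subclasses either inherit it (`Vec`) or override it with something faster (`Seq`). The entry point looks the method up on the type, as Python does for other special methods, and then calls it.

```python
import abc


class Summable(abc.ABC):
    """Hypothetical ABC: subclasses get a generic __sum__ built on __add__."""

    @abc.abstractmethod
    def __add__(self, other): ...

    def __sum__(self, iterable):
        # Generic implementation: repeated binary +, starting from self.
        total = self
        for item in iterable:
            total = total + item
        return total


class Vec(Summable):
    """2-D vector that relies on the inherited generic __sum__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return isinstance(other, Vec) and (self.x, self.y) == (other.x, other.y)


class Seq(Summable):
    """List wrapper whose + copies, so it overrides __sum__ to stay linear."""

    def __init__(self, items=()):
        self.items = list(items)

    def __add__(self, other):
        return Seq(self.items + other.items)

    def __sum__(self, iterable):
        out = list(self.items)
        for item in iterable:
            out.extend(item.items)  # one growing list, no per-step copy
        return Seq(out)


def sum_via_protocol(iterable, start):
    # Look up (not call) the method on the type, then call it.
    hook = getattr(type(start), "__sum__", None)
    if hook is None:
        raise TypeError(f"{type(start).__name__} does not support __sum__")
    return hook(start, iterable)
```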
-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov