[Python-Dev] sum(...) limitation

Tue Aug 12 21:11:35 CEST 2014

On Mon, Aug 11, 2014 at 11:07 PM, Stephen J. Turnbull <stephen at xemacs.org>
wrote:

> I'm referring to removing the unnecessary information that there's a
>  better way to do it, and simply raising an error (as in Python 3.2,
> say) which is all a RealProgrammer[tm] should ever need!
>

I can't imagine anyone is suggesting that -- disallow it, but don't tell
anyone why?

The only thing that is remotely on the table here is:

1) remove the special case for strings -- buyer beware -- but consistent
and less "ugly"

2) add a special case for strings that is fast and efficient -- may be as
simple as calling "".join() under the hood --no more code than the
exception check.

And I doubt anyone really is pushing for anything but (2)

Steven Turnbull wrote:

>   IMO we'd also want a homogeneous_iterable ABC

Actually, I've thought for years that that would open the door to a lot of
optimizations -- but that's a much broader question that sum(). I even
brought it up probably over ten years ago -- but no one was the least bit
iinterested -- nor are they now -- I now this was a rhetorical suggestion
to make the point about what not to do....

  Because obviously we'd want the
> attractive nuisance of "if you have __add__, there's a default
> definition of __sum__"

now I'm confused -- isn't that exactly what we have now?

It's possible that Python could provide some kind of feature that
> would allow an optimized sum function for every type that has __add__,
> but I think this will take a lot of thinking.

does it need to be every type? As it is the common ones work fine already
except for strings -- so if we add an optimized string sum() then we're
done.

 *Somebody* will do it
> (I don't think anybody is +1 on restricting sum() to a subset of types
> with __add__).

uhm, that's exactly what we have now -- you can use sum() with anything
that has an __add__, except strings. Ns by that logic, if we thought there
were other inefficient use cases, we'd restrict those too.

But users can always define their own classes that have a __sum__ and are
really inefficient -- so unless sum() becomes just for a certain subset of
built-in types -- does anyone want that? Then we are back to the current
situation:

sum() can be used for any type that has an __add__ defined.

But naive users are likely to try it with strings, and that's bad, so we
want to prevent that, and have a special case check for strings.

What I fail to see is why it's better to raise an exception and point users
to a better way, than to simply provide an optimization so that it's a mute
issue.

The only justification offered here is that will teach people that summing
strings (and some other objects?) is order(N^2) and a bad idea. But:

a) Python's primary purpose is practical, not pedagogical (not that it
isn't great for that)

b) I doubt any naive users learn anything other than "I can't use sum() for
strings, I should use "".join()". Will they make the leap to "I shouldn't
use string concatenation in a loop, either"? Oh, wait, you can use string
concatenation in a loop -- that's been optimized. So will they learn: "some
types of object shave poor performance with repeated concatenation and
shouldn't be used with sum(). So If I write such a class, and want to sum
them up, I'll need to write an optimized version of that code"?

I submit that no naive user is going to get any closer to a proper
understanding of algorithmic Order behavior from this small hint. Which
leaves no reason to prefer an Exception to an optimization.

One other point: perhaps this will lead a naive user into thinking --
"sum() raises an exception if I try to use it inefficiently, so it must be
OK to use for anything that doesn't raise an exception" -- that would be a
bad lesson to mis-learn....

-Chris

PS:
Armin Rigo wrote:

> It also improves a
> lot the precision of sum(list_of_floats) (though not reaching the same
> precision levels of math.fsum()).

while we are at it, having the default sum() for floats be fsum() would be
nice -- I'd rather the default was better accuracy loser performance. Folks
that really care about performance could call math.fastsum(), or really,
use numpy...

This does turn sum() into a function that does type-based dispatch, but
isn't python full of those already? do something special for the types you
know about, call the generic dunder method for the rest.

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140812/8779b69b/attachment.html>