On Mon, Aug 11, 2014 at 11:07 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I'm referring to removing the unnecessary information that there's a better way to do it, and simply raising an error (as in Python 3.2, say) which is all a RealProgrammer[tm] should ever need!
I can't imagine anyone is suggesting that -- disallow it, but don't tell anyone why? The only things remotely on the table here are:

1) remove the special case for strings -- buyer beware -- but consistent and less "ugly"
2) add a special case for strings that is fast and efficient -- may be as simple as calling "".join() under the hood -- no more code than the exception check.

And I doubt anyone is really pushing for anything but (2).

Stephen Turnbull wrote:
IMO we'd also want a homogeneous_iterable ABC
Actually, I've thought for years that that would open the door to a lot of optimizations -- but that's a much broader question than sum(). I even brought it up probably over ten years ago -- but no one was the least bit interested -- nor are they now -- I know this was a rhetorical suggestion to make the point about what not to do....

Because obviously we'd want the
attractive nuisance of "if you have __add__, there's a default definition of __sum__"
Now I'm confused -- isn't that exactly what we have now?

It's possible that Python could provide some kind of feature that
would allow an optimized sum function for every type that has __add__, but I think this will take a lot of thinking.
Does it need to be every type? As it is, the common ones work fine already except for strings -- so if we add an optimized string sum() then we're done.

*Somebody* will do it
(I don't think anybody is +1 on restricting sum() to a subset of types with __add__).
Uhm, that's exactly what we have now -- you can use sum() with anything that has an __add__, except strings. So by that logic, if we thought there were other inefficient use cases, we'd restrict those too.

But users can always define their own classes that have an __add__ and are really inefficient. So unless sum() becomes just for a certain subset of built-in types (does anyone want that?), we are back to the current situation: sum() can be used for any type that has an __add__ defined.

But naive users are likely to try it with strings, and that's bad, so we want to prevent that, and have a special case check for strings.

What I fail to see is why it's better to raise an exception and point users to a better way than to simply provide an optimization so that it's a moot issue.

The only justification offered here is that it will teach people that summing strings (and some other objects?) is O(N^2) and a bad idea. But:

a) Python's primary purpose is practical, not pedagogical (not that it isn't great for that)

b) I doubt any naive users learn anything other than "I can't use sum() for strings, I should use "".join()". Will they make the leap to "I shouldn't use string concatenation in a loop, either"? Oh, wait, you can use string concatenation in a loop -- that's been optimized. So will they learn: "some types of objects have poor performance with repeated concatenation and shouldn't be used with sum(). So if I write such a class, and want to sum them up, I'll need to write an optimized version of that code"?

I submit that no naive user is going to get any closer to a proper understanding of algorithmic order behavior from this small hint. Which leaves no reason to prefer an exception to an optimization.

One other point: perhaps this will lead a naive user into thinking -- "sum() raises an exception if I try to use it inefficiently, so it must be OK to use for anything that doesn't raise an exception" -- that would be a bad lesson to mis-learn....
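To make option (2) concrete, here is a minimal sketch in pure Python of what "call "".join() under the hood" could look like. The name sum_with_str_fast_path is made up for illustration; this is not how the builtin is implemented, just the idea that the special-case check could dispatch to join rather than raise:

```python
def sum_with_str_fast_path(iterable, start=0):
    """Hypothetical sum() variant that optimizes strings instead of rejecting them."""
    if isinstance(start, str):
        # Fast path: a single join is linear in the total length,
        # unlike repeated concatenation via __add__.
        return start + "".join(iterable)
    # Everything else takes the generic __add__ route of the builtin sum().
    return sum(iterable, start)


print(sum_with_str_fast_path(["spam", "eggs", "ham"], ""))
print(sum_with_str_fast_path([1, 2, 3]))
```

The check is the same isinstance test the exception path already needs; only the action taken afterwards differs.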
-Chris

PS: Armin Rigo wrote:
It also improves a lot the precision of sum(list_of_floats) (though not reaching the same precision levels of math.fsum()).
While we are at it, having the default sum() for floats be fsum() would be nice -- I'd rather the default was better accuracy, lower performance. Folks that really care about performance could call something like a math.fastsum(), or really, use numpy...

This does turn sum() into a function that does type-based dispatch, but isn't Python full of those already? Do something special for the types you know about, call the generic dunder method for the rest.

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
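A rough sketch of the type-based dispatch idea from the PS, assuming a hypothetical wrapper named accurate_sum (not an existing function): use math.fsum() when every item is a float, and fall back to the generic __add__ path otherwise.

```python
import math


def accurate_sum(iterable, start=0):
    """Hypothetical dispatching sum(): math.fsum() for floats, builtin sum() otherwise."""
    items = list(iterable)
    if (items
            and isinstance(start, (int, float))
            and all(isinstance(x, float) for x in items)):
        # Known type: fsum tracks intermediate partial sums to avoid
        # the rounding error that accumulates in naive addition.
        return math.fsum([start, *items])
    # Unknown types: defer to the generic dunder-based builtin.
    return sum(items, start)


print(accurate_sum([0.1] * 10))
print(accurate_sum([1, 2, 3]))
```

Materializing the iterable into a list is a simplification for the sketch; a real implementation would presumably dispatch without a second pass.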