
On Tue, Aug 12, 2014 at 11:21 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Redirecting to python-ideas, so trimming less than I might.
reasonable enough -- you are introducing some more significant ideas for changes. I've said all I have to say about this -- I don't seem to see anything encouraging from core devs, so I guess that's it. Thanks for the fun bike-shedding... -Chris
Chris Barker writes:
On Mon, Aug 11, 2014 at 11:07 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I'm referring to removing the unnecessary information that there's a better way to do it, and simply raising an error (as in Python 3.2, say) which is all a RealProgrammer[tm] should ever need!
I can't imagine anyone is suggesting that -- disallow it, but don't tell anyone why?
As I said, it's a regression. That's exactly the behavior in Python 3.2.
The only thing that is remotely on the table here is:
1) remove the special case for strings -- buyer beware -- but consistent and less "ugly"
It's only consistent if you believe that Python has strict rules for use of various operators. It doesn't, except as far as they are constrained by precedence. For example, I have an application where I add bytestrings bytewise modulo N <= 256, and concatenate them. In fact I use function call syntax, but the obvious operator syntax is '+' for the bytewise addition, and '*' for the concatenation.
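As a rough illustration of the operator point, the sketch below defines a bytes subclass where '+' means bytewise addition modulo N and '*' means concatenation. The class name ModBytes and its N attribute are hypothetical and are not taken from the application described above, which uses function call syntax.

```python
class ModBytes(bytes):
    """Hypothetical bytestring type: '+' adds bytewise mod N, '*' concatenates."""

    N = 256  # modulus, assumed to satisfy N <= 256

    def __add__(self, other):
        # Bytewise addition modulo N; both operands must have the same length.
        if len(self) != len(other):
            raise ValueError("operands must have equal length")
        return ModBytes((a + b) % self.N for a, b in zip(self, other))

    def __mul__(self, other):
        # Concatenation, done on plain bytes so the overridden '+' is not used.
        return ModBytes(bytes(self) + bytes(other))


a = ModBytes(b"\x01\xff")
b = ModBytes(b"\x01\x02")
added = a + b      # bytewise: (1 + 1) % 256, (255 + 2) % 256
joined = a * b     # concatenation of the two bytestrings
```

Nothing in the language objects to this assignment of meanings; only operator precedence is fixed.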
It's not in the Zen, but I believe in the maxim "If it's worth doing, it's worth doing well." So for me, 1) is out anyway.
2) add a special case for strings that is fast and efficient -- may be as simple as calling "".join() under the hood -- no more code than the exception check.
Sure, but what about all the other immutable containers with __add__ methods? What about mappings with key-wise __add__ methods whose values might be immutable but have __add__ methods? Where do you stop with the special-casing? I consider this far more complex and ugly than the simple "sum() is for numbers" rule (and even that is way too complex considering accuracy of summing floats).
And I doubt anyone really is pushing for anything but (2)
I know that, but I think it's the wrong solution to the problem (which is genuine IMO). The right solution is something generic, possibly a __sum__ method. The question is whether that leads to too much work to be worth it (eg, "homogeneous_iterable").
Because obviously we'd want the attractive nuisance of "if you have __add__, there's a default definition of __sum__"
now I'm confused -- isn't that exactly what we have now?
Yes and my feeling (backed up by arguments that I admit may persuade nobody but myself) is that what we have now kinda sucks[tm]. It seemed like a good idea when I first saw it, but then, my apps don't scale to where the pain starts in my own usage.
It's possible that Python could provide some kind of feature that would allow an optimized sum function for every type that has __add__, but I think this will take a lot of thinking.
does it need to be every type? As it is the common ones work fine already except for strings -- so if we add an optimized string sum() then we're done.
I didn't say provide an optimized sum(), I said provide a feature enabling people who want to optimize sum() to do so. So yes, it needs to be every type (the optional __sum__ method is a proof of concept, modulo it actually being implementable ;-).
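A minimal sketch of what an optional __sum__ hook might look like, under stated assumptions: no such protocol exists in Python, and the names generic_sum and Text are invented here for illustration. The hook is looked up on the type of the start value, and types without it fall back to repeated '+'.

```python
def generic_sum(iterable, start=0):
    """Hypothetical sum() that defers to an optional __sum__ method."""
    # Look the hook up on the type, as is done for other special methods.
    hook = getattr(type(start), "__sum__", None)
    if hook is not None:
        return hook(start, iterable)
    # Generic fallback: repeated addition, as the built-in does.
    total = start
    for item in iterable:
        total = total + item
    return total


class Text(str):
    """Hypothetical str subclass that opts in to an efficient summation."""

    def __sum__(self, iterable):
        # One join instead of building N intermediate strings.
        return Text(str(self) + "".join(iterable))


total_ints = generic_sum([1, 2, 3])
total_text = generic_sum(["a", "b", "c"], Text(""))
```

Whether a real version of this could be made to work for heterogeneous iterables is exactly the open question raised above.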
*Somebody* will do it (I don't think anybody is +1 on restricting sum() to a subset of types with __add__).
uhm, that's exactly what we have now
Exactly. Who's arguing that the sum() we have now is a ticket to Paradise? I'm just saying that there's probably somebody out there negative enough on the current situation to come up with an answer that I think is general enough (and I suspect that python-dev consensus is that demanding, too).
sum() can be used for any type that has an __add__ defined.
I'd like to see that be mutable types with __iadd__.
What I fail to see is why it's better to raise an exception and point users to a better way, than to simply provide an optimization so that it's a moot issue.
Because inefficient sum() is an attractive nuisance, easy to overlook, and likely to bite users other than the author.
The only justification offered here is that will teach people that summing strings (and some other objects?)
Summing tuples works (with appropriate start=tuple()). Haven't benchmarked, but I bet that's O(N^2).
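For reference, a short sketch of summing tuples with an explicit start value. Each '+' builds a new tuple that copies everything accumulated so far, which is why quadratic behavior is expected; itertools.chain is shown as one linear-time alternative. The sample data is made up.

```python
from itertools import chain

rows = [(1, 2), (3,), (4, 5, 6)]

# The start value must be a tuple; the default start of 0 raises TypeError.
flat_sum = sum(rows, ())

# Linear-time alternative: iterate over all items once, then build one tuple.
flat_chain = tuple(chain.from_iterable(rows))
```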
is order(N^2) and a bad idea. But:
a) Python's primary purpose is practical, not pedagogical (not that it isn't great for that)
My argument is that in practical use sum() is a bad idea, period, until you book up on the types and applications where it *does* work. N.B. It doesn't even work properly for numbers (inaccurate for floats).
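A small sketch of the float accuracy issue. The helper naive_sum is a hypothetical name; it spells out plain left-to-right accumulation, which is how the built-in behaved for floats at the time of this thread (newer CPython releases have since improved the built-in's accuracy for floats). math.fsum tracks the rounding error and returns the correctly rounded result.

```python
import math


def naive_sum(values):
    """Plain left-to-right float accumulation; rounding errors pile up."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [0.1] * 10
naive = naive_sum(values)       # 0.9999999999999999
accurate = math.fsum(values)    # 1.0
```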
b) I doubt any naive users learn anything other than "I can't use sum() for strings, I should use "".join()".
For people who think that special-casing strings is a good idea, I think this is about as much benefit as you can expect. Why go farther?<0.5 wink/>
I submit that no naive user is going to get any closer to a proper understanding of algorithmic Order behavior from this small hint. Which leaves no reason to prefer an Exception to an optimization.
TOOWTDI. str.join is in pretty much every code base by now, and tutorials and FAQs recommending its use and severely deprecating sum for strings are legion.
One other point: perhaps this will lead a naive user into thinking -- "sum() raises an exception if I try to use it inefficiently, so it must be OK to use for anything that doesn't raise an exception" -- that would be a bad lesson to mis-learn....
That assumes they know about the start argument. I think most naive users will just try to sum a bunch of tuples, and get the "can't add 0, tuple" Exception and write a loop. I suspect that many of the users who get the "use str.join" warning along with the Exception are unaware of the start argument, too. They expect sum(iter_of_str) to magically add the strings. Ie, when in 3.2 they got the uninformative "can't add 0, str" message, they did not immediately go "d'oh" and insert ", start=''" in the call to sum, they wrote a loop.
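The two failure modes described above can be reproduced as follows. The messages in the comments are what current CPython 3 produces, and other implementations may word them differently.

```python
words = ["a", "b", "c"]

# Without a start value, sum() begins with 0 + "a" and fails generically.
try:
    sum(words)
except TypeError as exc:
    msg_default = str(exc)  # unsupported operand type(s) for +: 'int' and 'str'

# With a str start value, CPython refuses and points at str.join.
try:
    sum(words, "")
except TypeError as exc:
    msg_start = str(exc)    # sum() can't sum strings [use ''.join(seq) instead]

# The recommended spelling.
joined = "".join(words)
```

Only a user who already knows about the start argument ever sees the second, more helpful message.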
while we are at it, having the default sum() for floats be fsum() would be nice
How do you propose to implement that, given math.fsum is perfectly happy to sum integers? You can't just check one or a few leading elements for floatiness. I think you have to dispatch on type(start), but then sum(iter_of_floats) DTWT. So I would suggest changing the signature to sum(it, start=0.0). This would probably be acceptable to most users with iterables of ints, but does imply some performance hit.
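A rough sketch of the dispatch-on-type(start) idea with the suggested start=0.0 default. The function name sum_dispatch is hypothetical, and this is not a proposal for how the built-in should actually be implemented.

```python
import math
from itertools import chain


def sum_dispatch(iterable, start=0.0):
    """Hypothetical sum(): use math.fsum when start is a float."""
    if type(start) is float:
        # fsum accepts ints as well, but always returns a float.
        return math.fsum(chain([start], iterable))
    # Any other start type: repeated addition, as the built-in does.
    total = start
    for item in iterable:
        total = total + item
    return total


floats_total = sum_dispatch([0.1] * 10)       # accurate float summation
ints_default = sum_dispatch([1, 2, 3])        # 6.0: ints now come back as float
ints_explicit = sum_dispatch([1, 2, 3], 0)    # 6: int start keeps int arithmetic
```

The second call shows the cost mentioned above: callers summing ints get a float back, and pay for the float path, unless they pass start=0 themselves.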
This does turn sum() into a function that does type-based dispatch, but isn't python full of those already? do something special for the types you know about, call the generic dunder method for the rest.
AFAIK Python is moving in the opposite direction: if there's a common need for dispatching to type-specific implementations of a method, define a standard (not "generic") dunder for the purpose, and have the builtin (or operator, or whatever) look up (not "call") the appropriate instance in the usual way, then call it. If there's a useful generic implementation, define an ABC to inherit from that provides that generic implementation.
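One existing instance of that pattern, sketched with collections.abc.Sequence: the class supplies only __len__ and __getitem__, the builtins look those up on the type, and the ABC provides generic implementations of iteration, containment, index and reversal. The Countdown class is a made-up example.

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Made-up sequence n, n-1, ..., 1 defined by just two dunder methods."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        # Only non-negative integer indexes are handled in this sketch.
        if not 0 <= index < self._n:
            raise IndexError(index)
        return self._n - index


c = Countdown(3)
items = list(c)                 # __iter__ inherited from Sequence
has_two = 2 in c                # __contains__ inherited from Sequence
position = c.index(1)           # index() inherited from Sequence
backwards = list(reversed(c))   # __reversed__ inherited from Sequence
```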
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov