Break the dominance of boolean values in boolean context

Hi, the proposal is to reduce the number of occurrences where Python automatically and inevitably uses pure boolean objects (bool()) instead of relying on rich object behaviour. My goal is to give objects finer control over how they want to be interpreted in a boolean context. This reaches down as far as implementing the boolean operations as an object protocol.

In Python, no kind of object is in any way special. Instead of relying on what an object actually IS, we rely on what an object can DO. This approach of anti-discrimination is the source of great confidence about the language and its aversion to black magic. The bool type, however, is a primus inter pares: in numerous locations, Python strongly favours this type of object, either explicitly by casting to it or implicitly through the compiled code. Here are three examples (protocol-, object- and code-based) where Python grinds a boolean context down to pure boolean values:

In protocol behaviour: The documentation about how to implement __contains__ vaguely states that the function "should return true, false otherwise". In fact we may return any object that can be interpreted as a boolean; this is intended behaviour. However, the implementation of COMPARE_OP in Python/ceval.c inevitably casts the result of "x in y" to Py_True or Py_False, throwing away our object and leaving us with this bool only. This kills any chance of producing useful objects in __contains__, even objects that may have a meaning outside the pure boolean context. For example:

    x = mycontainer(['foo', 'bar', 'bar', 'foo', 'foo'])
    y = 'foo' in x
    print y == True    # True
    print y            # mycontainer('foo', 'foo', 'foo')
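As a sanity check on the status quo, here is a minimal sketch (Python 3 syntax; MyContainer is a hypothetical class, not anything from the standard library) showing that today the interpreter coerces whatever __contains__ returns down to a plain bool:

```python
class MyContainer:
    """Hypothetical container whose __contains__ returns a rich object."""
    def __init__(self, items):
        self.items = list(items)
    def __bool__(self):
        return bool(self.items)
    def __contains__(self, item):
        # Return the matching elements instead of a plain True/False.
        return MyContainer([x for x in self.items if x == item])

c = MyContainer(['foo', 'bar', 'bar', 'foo', 'foo'])
result = 'foo' in c
# The interpreter coerces the returned container to a pure bool:
print(type(result), result)  # <class 'bool'> True
```

The returned container is truthy, so the coerced result is True; the three matching elements themselves are lost, which is exactly the complaint above.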
This has been discussed on python-ideas in mid-2010 (http://mail.python.org/pipermail/python-ideas/2010-July/007733.html) without a distinct outcome.

In object behaviour: Rich comparison allows numeric- or set-like comparison in any way we like, returning any kind of object. The documentation clearly states that the returned object is only(!) interpreted as a boolean when used in a boolean context (__contains__ falls short here). For example:

    x = set((1, 2, 3, 4, 5))
    y = set((2, 3, 4))
    z = x > y
    print z == True    # False
    print z            # set([1, 5]), equivalent to x - y

    z = x < y
    print z == True    # False
    print z            # set([]), equivalent to y - x
Compare the current behaviour for plain ints:

    print 3 > 2             # True
    print (3 > 2) == True   # True
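Unlike 'in', rich comparisons really can return arbitrary objects today. A minimal sketch with a hypothetical RichSet wrapper (not a proposed change to the builtin set type) that gives '>' the set-difference semantics described above:

```python
class RichSet:
    """Hypothetical wrapper whose '>' returns the set difference itself."""
    def __init__(self, items):
        self.items = set(items)
    def __gt__(self, other):
        # Returning a non-bool from a rich comparison is legal today.
        return self.items - other.items

x = RichSet({1, 2, 3, 4, 5})
y = RichSet({2, 3, 4})
z = x > y
print(z)  # {1, 5} -- a plain set, not True/False
```

The result is still interpreted as a boolean only when used in a boolean context, exactly as the documentation promises.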
In code behaviour: Why is it that we can override the behaviour of arithmetic operations like "+, -, *, /", but not the boolean operations like "and, or, xor, not"? Why do we force any value generated in a boolean context like "if x == 1 or y == 1" to actually be of boolean type? The result of "(x/y) == 1" may be any kind of object coming from __eq__. However, the "or"-operator here grinds them both down to being just True or False. This is due to the fact that the generated bytecode evaluates one expression at a time and does not keep track of the objects it got while doing so. The pure boolean behaviour arises from the use of JUMP_IF_FALSE/TRUE after each comparison.

Instead of having boolean operations be a part of the language, we could implement them as an extension to the rich comparison protocol, giving rise to functions like object.__bor__, object.__bnot__ or object.__band__. The expression "if x == 1 or y == 1" would then become equivalent to

    tmp1 = x.__eq__(1)
    if tmp1:
        return tmp1
    tmp2 = y.__eq__(1)
    if tmp2:
        return tmp2
    return tmp1.__bor__(tmp2)

Likewise, the expression "if x == 1 and y == 1" would become

    tmp1 = x.__eq__(1)
    if not tmp1:
        return tmp1
    tmp2 = y.__eq__(1)
    if not tmp2:
        return tmp2
    return tmp1.__band__(tmp2)

The object example from above now tells us how boolean behaviour and arithmetic behaviour go hand in hand: "(setA > setB) or (setA < setB)" is True because "set([1,5]).__bor__(set([]))" is the same as "set([1,5]) + set([])" and equivalent to True in a boolean context. Likewise, "(setA > setB) and (setA < setB)" is False because "set([1,5]).__band__(set([]))" is just "set([])". It follows that "((setA > setB) or (setA < setB)) == (setA - setB) == (setA ^ setB)". We can't do this with boolean operations being part of the language only.
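The proposed desugaring can be sketched as an ordinary function. Note that __bor__ is a hypothetical method name from the proposal that no real type implements, so the sketch falls back to returning the second operand when the hook is missing:

```python
def proposed_or(a, b):
    # Sketch of "a or b" under the proposal: short-circuit on a truthy
    # operand, otherwise combine the two falsy results via __bor__.
    if a:
        return a
    if b:
        return b
    hook = getattr(type(a), '__bor__', None)  # hypothetical slot
    return hook(a, b) if hook is not None else b

print(proposed_or(set(), {1, 2}))  # {1, 2} -- short-circuits on the truthy operand
print(proposed_or(set(), set()))   # set() -- both falsy, fallback path
```

The real proposal would of course do this in the compiler/eval loop rather than through getattr, but the control flow is the same.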
Summing it all up, I really think we should break the dominance of bool() and take a look at how we can implement boolean contexts without relying on boolean values all the time. None of this can be implemented without breaking at least the CPython API. For example, the behaviour of __contains__ can't be changed in the proposed way without changing the signature of "int PySequence_Contains()" to "PyObject* PySequence_Contains()".

On Mon, Sep 12, 2011 at 4:20 PM, Lukas Lueg <lukas.lueg@googlemail.com> wrote: ..
You may want to take a look at PEP 335: http://www.python.org/dev/peps/pep-0335/

On Mon, Sep 12, 2011 at 4:20 PM, Lukas Lueg <lukas.lueg@googlemail.com> wrote:
Your basic idea seems like a good one to me at an abstract level -- methods like __contains__ have no good design reason to typecheck. However, I don't think any of the concrete changes here serve to make Python a nicer or saner language. There is no reason why set should be changed either of the ways you discuss, and there is no excuse I can think of to implement something that works like mycontainer. Mike

On 9/12/2011 4:20 PM, Lukas Lueg wrote:
However the "or"-operator here grinds them both down to being just True or False.
Both 'and' and 'or' return one of their inputs unchanged and unground.
If both inputs are True or False, then of course you always get one of them back. But again, no 'grind'ing.
In Python, 'and' and 'or' are *not* (functional) boolean operations. They are flow-control keywords that abbreviate in expression form an if-else statement pattern. They are compiled to similar bytecode.

    def f1(a, b):
        return a and b

    def f2(a, b):
        if not a:
            return a
        else:
            return b

    from dis import dis
    dis(f1)
    dis(f2)
    ###
     0 LOAD_FAST                0 (a)
     3 JUMP_IF_FALSE_OR_POP     9
     6 LOAD_FAST                1 (b)
     9 RETURN_VALUE

     0 LOAD_FAST                0 (a)
     3 POP_JUMP_IF_TRUE        10
     6 LOAD_FAST                0 (a)
     9 RETURN_VALUE
    10 LOAD_FAST                1 (b)
    13 RETURN_VALUE

The difference is that the compiler does not notice that the 'a' to be returned is the same 'a' that it just tested and which is still sitting on the stack, so it reloads it or, if it were an expression, would recalculate it. There is no internal function that *could* be overloaded. In this sense, they are similar to lambda expressions, which abbreviate a simple def statement pattern and compile to essentially the same bytecode as the def statement.

The xor bitwise int operator '^' can be overloaded: __xor__ and __rxor__. Logical xor can be implemented as bool(a) ^ bool(b).

Before there was a bool class, 'not not a' was the direct way to coerce a to a boolean value. Anything that could be done with a hidden .__not__ method could also be done, and I would say better done, with an overt .invert method. For instance, a mutable binary array class should likely have .invert to flip each member in place. As it is now, anyone reading 'not a' can depend on the result being a bool. And I think that is good. -- Terry Jan Reedy
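The point that the operands come back unchanged ("unground") is easy to check directly:

```python
# 'and'/'or' return one of their operands as-is, never a coerced bool:
print([] or "fallback")  # fallback
print(5 and "second")    # second
print(0 and "never")     # 0  (short-circuits; the right operand is not evaluated)
```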

Eric Snow wrote:
So you want "give me the item 'foo' in mycontainer" rather than "is item 'foo' in mycontainer"? Isn't that what __getitem__ does already?
The canonical use case for this kind of thing is numpy arrays. To be consistent with the numpy philosophy, 'a in b' where b is an array should give you an array of values indicating whether a is in each element of b. -- Greg

Lukas Lueg wrote:
This scheme would be of limited usefulness. It still assumes that the operands *can* be treated as boolean values, which is not always the case -- numpy arrays can't, for example. Also, the short-circuiting behaviour is still defined by their boolean values. I suggested a more complete way of addressing this in PEP 335. -- Greg

Philosophically, the idea that boolean operations like x in y should be able to return any value for True makes sense. The details leave much to be desired. It wouldn't make sense for it to return x since 0 in [0,1] and '' in 'string' should both return a true value and 0 and '' are false. On Mon, Sep 12, 2011 at 1:20 PM, Lukas Lueg <lukas.lueg@googlemail.com>wrote:
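The counterexamples mentioned here can be verified directly: both membership tests succeed, yet the matched items are themselves falsy, so 'in' cannot simply return the left operand:

```python
# Membership is True today even when the found item is falsy:
print(0 in [0, 1])        # True
print('' in 'string')     # True
print(bool(0), bool(''))  # False False
```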
You've lost me where you suggest changing the behavior of == True. And where you propose that x > y should return x - y. You also don't mention what x < y should return. The real problem is the lack of a compelling scenario why you'd want to make this change. Aside from that, checking to see if one set is contained in another does not require computation of the full difference. Why should very_big_set > (1,2) go to the overhead of constructing a still very big set that's going to be immediately discarded?
I don't know the scenario where 3 > 2 returning 1 is useful. Would 2 < 3 also return 1? The Icon programming language (and others) has 2 < 3 return 3. This makes expressions like 2 < 3 < 4 work. And in fact Python's and/or operators work in exactly the same way, returning the value that determines the answer: (2 and 3) equals 3. Icon can do this because it has a special result 'fail' which blocks further evaluation, roughly equivalent to 1 < 0 throwing an exception. It wouldn't work in Python for lots of reasons.
So what? This isn't a general principle or anything. All you've done is say that if you define set<set to return non-bool values and __band__ and __bor__ preserve the invariants bool(x.__band__(y)) is bool(x and y) bool(x.__bor__(y)) is bool(x or y) This doesn't prove that "(setA > setB) or (setA < setB)" is always true or anything (because it isn't of course).
We can't do this with boolean operations being part of the language only.
Nor do we need to. Figuring out the value of "(setA > setB) or (setA < setB)" is much less convoluted than that. --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com

Can we not allow things like `a < b` to return non-boolean values, without altering the behaviour of existing Python types? There _are_ use cases (numpy, ORMs, DSLs). These don't really fit the bill. Devin On Mon, Sep 12, 2011 at 4:20 PM, Lukas Lueg <lukas.lueg@googlemail.com> wrote:

On Mon, 2011-09-12 at 21:40 -0400, Devin Jeanpierre wrote:
Can we not allow things like `a < b` to return non-boolean values, without altering the behaviour of existing Python types?
Would that return 'a' or 'b', or something else?

Would that return 'a' or 'b', or something else?
Usually it'd return True or False. Same with things like `foo in bar`. I just mean that we can extend these to return other objects without insisting that builtin types (floats, sets, etc.) do weird things with them. Devin On Mon, Sep 12, 2011 at 10:13 PM, Ron Adam <ron3200@gmail.com> wrote:

On 13 September 2011 16:05, Lukas Lueg <lukas.lueg@googlemail.com> wrote:
Maybe I'm being dense, but why would I read it like that? The operation a < b means "is a less than b?" That's not a matter of opinion, as far as I can see, that's basically the definition of "<". You could certainly overload < to mean something else for particular types, but that's not something to do lightly (look at C++'s use of << for IO, for a relatively benign example of both the benefits and problems with such an approach...) Paul.

On Tue, Sep 13, 2011 at 8:05 AM, Lukas Lueg <lukas.lueg@googlemail.com>wrote:
Then what would a <= b or a == b or a != b return? This idea of returning 'how far apart are these objects' doesn't generalize for integers much less more complex types. You can't design a new feature by giving one example ... and no use cases. --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com

On Tue, Sep 13, 2011 at 11:40 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Can we not allow things like `a < b` to return non-boolean values, without altering the behaviour of existing Python types?
We already do. As far as I am aware, the only holdouts are:

- x in a (coerces to bool)
- not a (coerces to bool)
- a and b (doesn't coerce as such, but roughly equivalent to "a if not a else b")
- a or b (doesn't coerce as such, but roughly equivalent to "a if a else b")

The first case can already be overridden (via __contains__) and I don't believe there's any specific reason for retaining the coercion to bool (aside from nobody writing and championing a patch that eliminates the restriction). PEP 207 (which added rich comparisons in the first place) is silent on the matter - it only talks about the comparison and ordering operations. The latter 3 cases are discussed in depth in PEP 335 (Overloadable Boolean Operators). Again, I don't believe there's anything fundamentally wrong with the idea, but it requires both an up to date reference implementation and a champion willing to gather use cases, assess the performance impact and generally argue in its favour. We don't have the numpy folks knocking down our door asking for distributable versions of these operations, after all. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Sep 12, 2011 at 8:39 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
But does the C code wrapping it preserve the non-bool value? (IIRC originally many of these were constrained by the C implementation more than by any philosophical desire to keep bool pure.)
Oh I see, you are saying that there is no need *in principle*. Agreed for __contains__. (BTW there is also a need for the reverse operator. Maybe it could be called __in__?)
It wasn't a priority at the time.
I'm skeptical. It would mean that the compiler no longer has the option to translate "if not" into a reversal of the control flow; it must generate code to execute the "not" operator. That seems like a pessimization. Here's some disassembler output to prove it:
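The promised disassembly can be sketched as follows (a reconstruction, not the original output; opcode names vary across CPython versions, but the key point holds: 'if not' compiles to an inverted jump, with no NOT instruction):

```python
import dis

def f(x):
    if not x:
        return 1
    return 2

# The compiler reverses the jump sense instead of emitting UNARY_NOT:
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)
assert 'UNARY_NOT' not in ops          # no explicit 'not' operation
assert any('JUMP' in op for op in ops) # just a conditional jump
```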
It looks like the same argument cannot be made for 'and' / 'or', so perhaps those are fine -- though I also expect that there aren't many use cases. (For all these, I suspect the most common use case is actually symbolic algebra packages which want to overload operators so they can return a parse tree instead of an outcome. This works for regular operators and for comparisons, but not for in, not, and, or.) But I'm not sure I want to pay the price for this flexibility everywhere where those operators are used in their traditional meanings. Admittedly I haven't read PEP 335 thoroughly. -- --Guido van Rossum (python.org/~guido)

On Tue, Sep 13, 2011 at 2:09 PM, Guido van Rossum <guido@python.org> wrote:
The two issues seem somewhat orthogonal to me, but yes, the general idea would be to make 'in' behave more like a rich comparison operator rather than an explicitly boolean operation as it does now. It occurs to me that adding __in__ could also address a slight performance oddity with 3.2 range objects: the __contains__ check in 3.2 has an O(1) fast path for containment checks on actual Python integers, but falls back to the O(n) sequential search for objects that only implement __index__(). If __in__() was available, such objects could conceivably map containment tests to checks against the corresponding real Python integer (although doing that carelessly would do weird things to other containers, such as identity-keyed dictionaries. That would be a quality of implementation issue on __in__ methods, though).
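The fallback behaviour described here can be observed with a hypothetical integer-like class (MyInt is illustrative, not from any library): containment in a range still works, but via the O(n) equality scan rather than the O(1) integer fast path:

```python
class MyInt:
    """Hypothetical object implementing __index__ without being an int."""
    def __init__(self, value):
        self.value = value
    def __index__(self):
        return self.value
    def __eq__(self, other):
        return self.value == other
    def __hash__(self):
        return hash(self.value)

r = range(1000)
# True, but found by sequentially comparing against each element,
# not by range's O(1) arithmetic check for exact ints:
print(MyInt(10) in r)  # True
```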
I think that's a pretty good explanation for why PEP 335 hasn't been pushed too hard - there's a lot of value in having not/and/or behave more like control flow structures rather than ordinary operations. PEP 335 does show that it's theoretically possible to make them behave more like operators, but that doesn't make it a good idea. To be honest, I don't think anyone would cry too much if you decided to explicitly reject it on the basis of continuing to allow control flow optimisations for code involving not/and/or. While CPython doesn't do it, I believe there *are* control flow transformations that the current semantics permit that PEP 335 would disallow, such as automatically applying De Morgan's Law (I don't actually have a use case for doing that, I'm just mentioning it as a consequence of the semantics change proposed by the PEP). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

The question is whether we want to treat bools as more special than any other type. Right now bools are the only place where this is done. Do we really want to specialize the language in order to favour control flow in the implementation? Forever? That's a bold statement.
Performance is always an argument. Python as a language, and most especially CPython itself, has a history of favouring design over application; the same should be true for bools. Concerning De Morgan's Law, I think in Python this should be treated as a convention in the implementation, not a rule of the language. After all, we also have [1,2,3] + [4,5,6] == [1,2,3,4,5,6] not by rule but by simple reasoning. Also: If an object does not override a certain boolean operation, the compiler can still take advantage of optimisation. Progress on the AST may pave the road for that.

On Tue, Sep 13, 2011 at 7:01 AM, Lukas Lueg <lukas.lueg@googlemail.com> wrote:
I have no problem with this statement and don't see it as particularly bold or controversial. Control flow just *is* special. FWIW It is not bool values that are being treated special; it is certain bool operators (not, and, or). I see nothing wrong with that; you can think of them as an extension of the repertoire of 'if', 'while' etc.
De Morgan's law (and similar transformations) seem to me out of scope. They would violate gut feelings about the "natural" execution order of Python expressions.
No, because the compiler does not have the runtime types of the object available. If you don't know how Python really works you shouldn't be making language change proposals. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
Seems to me they're something of a mixture of computation and control flow. The computation consists of calling the __nonzero__ methods of the operands to find out whether they should be considered true or false. PEP 335 doesn't change that, it just shifts the emphasis slightly. It provides hooks allowing you to do more computation before deciding on the control flow.
I don't see how. If e.g. you transform if not (a and b): into if not a or not b: then the evaluation order doesn't change -- you still evaluate a first, and then evaluate b if needed. It's just an extension of the existing rewriting of 'not' deeper into the expression. -- Greg
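Greg's claim that the rewrite preserves evaluation order can be checked by logging the operand evaluations (probe is an illustrative helper):

```python
calls = []

def probe(name, value):
    # Record evaluation order so the two forms can be compared.
    calls.append(name)
    return value

# Original form: not (a and b)
r1 = not (probe('a', True) and probe('b', False))
order1, calls[:] = list(calls), []

# De Morgan rewrite: (not a) or (not b)
r2 = (not probe('a', True)) or (not probe('b', False))
order2 = list(calls)

print(r1, r2, order1, order2)  # True True ['a', 'b'] ['a', 'b']
```

Both forms evaluate a first and b only if needed, and produce the same result.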

On Tue, Sep 13, 2011 at 5:46 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yeah, and the extra computation (however slight) bothers me.
Fair enough, especially since the bytecode compiler already gladly interchanges JUMP_IF_FALSE and JUMP_IF_TRUE (swapping the targets). -- --Guido van Rossum (python.org/~guido)

On Fri, Sep 16, 2011 at 10:39 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
abstract.c says "Hi!". The memory layout of type objects means the dance to handle binary operators correctly is generally a little more complicated than that. The short circuiting support in PEP 335 will only make it worse, since there would be two slots to check on each operand. In this particular case though, the additional problem is that the interpreter currently treats 'and' and 'or' as flow control structures, not operators: # python 3.2
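The "# python 3.2" sample can be reproduced in spirit on any modern CPython (opcode names vary by version; this is a sketch, not the original listing): disassembling 'x and y' shows conditional jumps and no binary-operator instruction, confirming the flow-control treatment:

```python
import dis

code = compile("x and y", "<demo>", "eval")
ops = [ins.opname for ins in dis.get_instructions(code)]
print(ops)
# 'and' compiles to a conditional jump -- flow control, not an operator slot:
assert any('JUMP' in op for op in ops)
assert not any(op.startswith('BINARY') for op in ops)
```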
That means making it possible to overload them is actually a fairly deep change to the expression semantics as far as the compiler is concerned. The control flow would need to be something like:

             LOAD_NAME x
             JUMP_IF_LOGICAL_AND <target>  (<-- handle shortcircuiting cases)
             LOAD_NAME y
             LOGICAL_AND                   (<-- handle binary version, potentially doing
                                                stack twiddling to ditch LH operand if
                                                there are no slots to call)
    <target> RETURN_VALUE

That's why my suggestion to consider '&&' and '||' as alternate, lower precedence, spellings of '&' and '|' wasn't entirely in jest. Their overloading semantics could be identical to the existing bitwise operators, but their precedence would be just above that of the control flow operations. Having operators that are equivalent aside from subtle details of precedence would be odd, but it strikes me as being less problematic overall than messing with the control flow semantics of 'and' and 'or'. Reusing the '&' and '|' typeslots would also mean that existing libraries like SQLAlchemy that already use '&' and '|' for boolean logic would just work with the new operators. C/C++ programmers would get rather confused when they found out that '&&' and '||' on numeric types still resulted in bitwise operations, though. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Sep 16, 2011 at 11:53 AM, Guido van Rossum <guido@python.org> wrote:
But didn't Greg have a working patch? (For Python 2.3, but still...)
It isn't that I think the code changes are going to be huge, just that even in the basic case they're not as trivial as "reduces to noticing that a particular type slot is NULL at the C level" makes them sound. The shift from 'control flow operation' to 'binary operator' is fairly substantial, even if the changes are isolated to a couple of new opcodes and the code generation for a couple of AST nodes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Sep 15, 2011 at 8:14 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Yeah, I'd like to see the new generated code. If it's even a single opcode longer than the old code, that's a big cost that everybody pays -- decoding opcodes is a lot more expensive than checking a slot at the C level. I also worry that those 5 extra slots cost a lot of memory (and cache lines), given that a typical program has 1000s of type objects lying around. (If you don't believe me, instrument the type initialization function.) -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
Yeah, I'd like to see the new generated code. If it's even a single opcode longer than the old code, that's a big cost that everybody pays
In the prototype implementation it is slightly longer in some cases, but if we're willing to introduce a few more bytecode instructions it could be made the same length. I'll see about adding some discussion on this to the PEP. Concerning examples, there's already one included with the patch file, but it's not very visible anywhere. Would you like the examples included in-line in the PEP (they will be about 2-3 screenfuls each or so) or would it be better to host them elsewhere and just put links in the PEP? -- Greg

On Mon, Sep 12, 2011 at 9:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I guess I might understand this paragraph if you pointed me to the code. :-(
I think I just mentioned one (turning 'if not' into a jump). Anyway, I'm glad to reject the PEP for the reason that I like the status quo just fine. (But relaxing __contains__ and adding __in__ as its reverse have my blessing.) Also, after reading the PEP from beginning to end, and downloading and skimming the patch (but failing to actually compile a patched version of Python 2.3), I think the offered API is too complicated to be of much use. Certainly the NumPy folks have repeatedly claimed that they are fine with the status quo. -- --Guido van Rossum (python.org/~guido)

On Wed, Sep 14, 2011 at 3:56 AM, Guido van Rossum <guido@python.org> wrote:
http://hg.python.org/cpython/file/default/Objects/rangeobject.c#l603 (scroll up a bit from where that link lands for the definition of the O(1) check)

For the bit about breaking identity-keyed dictionaries, consider a hypothetical naive containment implementation like this:

    def __in__(self, container):
        return int(self) in container

That code involves an implicit assumption that *all* containers are equivalence based, and that simply isn't true - it is sometimes useful to create a container that relies on object identity rather than value (e.g. to store additional metadata about arbitrary objects, even mutable ones). So a more correct implementation would have to look something like:

    def __in__(self, container):
        if isinstance(container, range):
            # We know range containment is equivalence based
            return int(self) in container
        return NotImplemented

One additional complication that Alex Gaynor pointed out is that there are potentially *two* distinct operations that would need to be covered if "in" were to become a rich comparison operation - 'in' and 'not in'. Currently "not x in y" and "x not in y" generate identical bytecode, and the control flow optimisations that apply to the former can also be applied to the latter. Making 'in' a rich comparison operation would thus require choosing one of the following behaviours:

- making 'not' itself avoid coercing to a boolean value (which you've already said you don't want to do)
- retaining the equivalence between "x not in y" and "not x in y" and simply accepting that "x in y" is a rich comparison while "x not in y" is not
- adding two additional special methods to cover the 'not in' case, defining a method precedence that covers the various combinations of methods on the operands for both 'in' and 'not in', and disentangling all the parts of the code generator that assume the equivalence of "x not in y" and "not x in y"

None of those options sound particularly appealing.
A rich 'not' would probably be the cleanest solution, but you've already given valid reasons for not wanting to do that.
OK, I'll add a rejection notice to PEP 335 with a link to this thread. Given the point above regarding the problems with "x not in y" vs "not x in y", do you want me to include something saying that rich containment checks are also rejected? Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 13, 2011 at 3:26 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Gotcha. I missed that this was just about range objects. :-)
Hm, this reminds me of the thorny issue about 'is' implying '==' for certain container checks... But really, there are many other ways of making similar mistakes with other operators, and IMO a good __in__ method should always check if the RHS type is a known and supported type, and otherwise return NotImplemented like a good binary operator should.
- making 'not' itself avoid coercing to a boolean value (which you've already said you don't want to do)
Right.
I think this one is fine, actually.
Basically this would elevate 'not in' to a separate operator, just like '==' and '!=' are separate operators. Now, *if* we were to adopt PEP 335, this would be a reasonable approach. But since we're not, I think it's fine. If NumPy ever implements 'in' as returning an array, NumPy users might have to be warned that 'not in' doesn't work that way, and 'not (x in y)' doesn't work either. So they'll have to write something like '1 - (x in y)', assuming 'x in y' returns an array of bools. Big deal.
And I stick to them.
Let's mull that one over for a bit longer. It's not mentioned in PEP 335, is it? -- --Guido van Rossum (python.org/~guido)

On 14/09/11 05:56, Guido van Rossum wrote:
Did you see the comment I made concerning jump optimizations? (Briefly, I don't think it would do any harm to keep performing this optimization, since it wouldn't affect any of my intended use cases.)
Also, after reading the PEP from beginning to end ... I think the offered API is too complicated to be of much use.
There's a bit more machinery involved than there is with the other operators, but I believe that any given use cases will usually only *use* part of it. In particular, a system for building parse trees only needs to implement the 2-operand methods, which is no more complicated than overriding any of the existing operators. If it would help, I could provide a couple of fully-worked examples, so you can see how complicated (or not) it would be to use in practice. There's also the possibility of simplifying the API if it's considered that some of it will *never* be used, e.g. by eliminating the machinery for custom short-circuiting.
Certainly the NumPy folks have repeatedly claimed that they are fine with the status quo.
NumPy is not the only use case by a long shot. The main reason I want it is for expressing database queries. You only have to look at the contortions that things like Django have to go through to wonder whether there's a better way. Apologies if I appear to be flogging a resting equine here. If you really don't want to think about it any more, I'll shut up. But if it's going to be rejected, I'd prefer it to be rejected for genuine reasons rather than FUD. -- Greg

On Thu, Sep 15, 2011 at 2:22 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Some concrete use cases might help a lot. While such use cases can also be addressed via AST transformations, that's a rather large sledgehammer to break out just to customise some boolean logic. I'll hold off on marking the PEP as rejected until you've had a chance to add (and Guido has had a chance to read) those examples. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Sep 15, 2011 at 3:30 PM, Chris Rebert <pyideas@rebertia.com> wrote:
It's actually:

- 'x and y' vs 'x & y'
- 'x or y' vs 'x | y'
- 'not x' vs '~x'

(assuming you aren't already using the bitwise versions for bitwise operations) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
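The overloading style alluded to here (the approach SQLAlchemy takes with '&', '|' and '~') might look like this hypothetical sketch of a query-condition object:

```python
class Cond:
    """Hypothetical SQL-ish condition object overloading the bitwise operators."""
    def __init__(self, expr):
        self.expr = expr
    def __and__(self, other):
        return Cond(f"({self.expr} AND {other.expr})")
    def __or__(self, other):
        return Cond(f"({self.expr} OR {other.expr})")
    def __invert__(self):
        return Cond(f"(NOT {self.expr})")

# Conditions compose into an expression tree instead of collapsing to a bool:
q = (Cond("a = 17") & Cond("b = 42")) | ~Cond("c = 0")
print(q.expr)  # ((a = 17 AND b = 42) OR (NOT c = 0))
```

This works today precisely because '&', '|' and '~' are ordinary overloadable operators, unlike 'and', 'or' and 'not'.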

On 9/15/2011 2:00 AM, Nick Coghlan wrote:
The problem with this is the precedence difference between '&' and 'and', etc. sqlalchemy uses this approach, and my code contains many extra sets of unnatural looking parens to deal with this issue. It's not a huge deal, but I see newbies (and me, an oldie) leaving off the parens. It's not an easy bug to track down. Eric.

On Fri, Sep 16, 2011 at 6:18 AM, Eric V. Smith <eric@trueblade.com> wrote:
Yeah, Greg does mention that in the PEP. Some concrete examples showing:

- buggy code with bitwise operator overloading
- correct code with parens used to correct bitwise precedence disparity
- correct code with boolean logic overloading (as proposed by the PEP)

would help make it crystal clear. Those would also help decide how important it is to preserve the short-circuiting semantics when overloading the operations - element-wise and DSL type use cases are always going to need the RHS, so there may be an opportunity there to simplify the proposed semantics to match those of other binary operations. And, of course, the final option is to go the C/C++ route: leave and/or/not as short-circuiting flow control statements and introduce && and || as low precedence binary logical operators. (I can't see Guido ever going for that idea, but it *does* sidestep the concerns about currently legal control flow transformations) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Chris Rebert wrote:
So, you want this in order to avoid (e.g.) `X & Y` and `not_(Z)`, in favor of `X and Y` and `not Z`?
Yes. The problem with & and | is that their precedence is all wrong in relation to the comparison operators. So instead of

    a == 17 and b == 42

you have to write

    (a == 17) & (b == 42)

And not_(x) is just plain ugly. -- Greg
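The precedence trap is easy to reproduce; without parens, the expression silently computes a bitwise AND inside a chained comparison:

```python
a, b = 17, 42

# '&' binds tighter than '==', so this parses as a == (17 & b) == 42,
# a chained comparison -- not the intended conjunction of two tests:
buggy = a == 17 & b == 42
print(buggy)    # False (17 & 42 is 0, and a != 0)

# With parens, each comparison is evaluated first, then combined:
correct = (a == 17) & (b == 42)
print(correct)  # True
```

This is exactly the class of bug Eric describes as hard to track down: the buggy form raises no error, it just quietly returns the wrong answer.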

On Wed, Sep 14, 2011 at 9:22 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Actually, accepting or rejecting a PEP is always a matter of gut feelings. Usually there's no hard argument that can be made about the benefits for the language (a few use cases are easier to write, some bugs are avoided) vs. the downsides (some users will be confused, some bugs will occur, docs have to be updated, the compiler and runtime will be a little more complex, the generated code may be a little bit bulkier, etc.). It's hard to weigh these, so please don't call it FUD, and be glad it's not your responsibility to decide. That said, having just implemented a query engine (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/query.py) I appreciate the pain for that particular use case and will hold off until I've seen your examples. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
It would mean that the compiler no longer has the option to translate "if not" into a reversal of the control flow;
This only seems to happen when the expression is used directly as the subject of an if-test. For the use cases I had in mind when I wrote PEP 335, this wouldn't happen -- you're building a parse tree in order to perform some other processing on it, not to immediately use it in an if-branch. So I think it would be fine for the compiler to reserve the right to apply the laws of boolean algebra to simplify an expression that appears in a boolean context. -- Greg

On 9/12/2011 4:20 PM, Lukas Lueg wrote:
the values it encounters.

Conceptually, these two are the same (where S is a sequence S0, S1, S2, ..., Sn):

    any(S)
    S0 or S1 or S2 or ... or Sn

They are equivalent except that if Sx is the first true-ish value, the first will return True while the second returns Sx. Why shouldn't any() also return Sx? --Ned.
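[Ned's equivalence is easy to check in current Python; the behaviour shown for any() is today's, not the proposal:]

```python
S = [0, '', 'spam', 42]

# The chained 'or' hands back the first true-ish operand itself...
chained = S[0] or S[1] or S[2] or S[3]
assert chained == 'spam'

# ...while the builtin any() grinds the same answer down to a plain bool.
assert any(S) is True
assert any([0, '', None]) is False
```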

On Wed, Sep 14, 2011 at 12:39 PM, MRAB <python@mrabarnett.plus.com> wrote:
If none are true-ish, should it return the final (false-ish) value?
By analogy with 'or', yes. (and ditto for 'all' vs 'and')
What should any([]) return?
That's the more valid rationale - any()/all() need to work with an empty iterable, so the most obvious solution is to limit the range to True/False rather than having the result be data dependent. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Ned Batchelder wrote:
Perhaps 'need' is putting it too strongly, but if any()/all() were changed to return the first true-ish/last false-ish value, then the extra parameter would be convenient in specifying the value desired if it wasn't True/False. Sort of like {}.get('nothere', '5'). ~Ethan~

Mike Graham wrote:
I don't see that. next() returns the next item from an iterator regardless of its boolean state. The actual and proposed implementations of any() skip over false items. The proposal might be implemented something like this:

    def any(iterable):
        item = False
        for item in iterable:
            if item:
                return item  # currently: return True
        return item

The practical difference should be clear if you compare calling:

    next(iter([0]*1000 + [1]))

versus:

    any([0]*1000 + [1])

-- Steven
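[Steven's comparison, restated as runnable code; proposed_any is a name chosen here so the sketch doesn't shadow the builtin:]

```python
def proposed_any(iterable):
    # Sketch of the proposed semantics: skip false items and return
    # the first true-ish one (the current builtin returns True here).
    item = False
    for item in iterable:
        if item:
            return item
    return item

data = [0] * 1000 + [1]
assert next(iter(data)) == 0      # next() takes the first item, true or not
assert proposed_any(data) == 1    # the proposal skips over the false items
assert any(data) is True          # the current builtin collapses to a bool
```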

On 15/09/2011 00:15, Greg Ewing wrote:
Could it also have something like a 'key' argument, by default __bool__? It would return the first item for which key(item) returns True or something true-ish. There's also the option of a 'default' argument for when there are no true-ish items.

On 15/09/11 11:25, MRAB wrote:
Could it also have something like a 'key' argument, by default __bool__?
Or more generally, a function. Although that's not strictly necessary. If there were a first() function that simply returned the first item from an iterator, whether true or not, one could write first(itertools.ifilter(func, something)) -- Greg

On Wed, Sep 14, 2011 at 4:15 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Ruby calls it "detect": http://www.ruby-doc.org/core/classes/Enumerable.html#M001485 Cheers, Chris

On Thu, Sep 15, 2011 at 10:20 AM, Chris Rebert <pyideas@rebertia.com> wrote:
from itertools import dropwhile

    def _not(x):
        return not x

    def first(iterable):
        return next(dropwhile(_not, iterable))

Adjust implementation to taste/use case (e.g. make the predicate configurable, or accept a default value rather than throwing StopIteration for empty sequences). There comes a point where it is better to just let people compose existing tools according to their specific needs rather than providing yet another special-purpose tool. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 9/15/2011 8:40 AM, Sven Marnach wrote:
Out of context nonsense. In

    from itertools import dropwhile

    def _not(x):
        return not x

    def first(iterable):
        return next(dropwhile(_not, iterable))

replacing 'dropwhile(_not, iterable)' with 'chain(dropwhile(_not, iterable), [default])' will *always* return default if dropwhile is empty *and* guarantee that there is a first item to be returned instead of raising StopIteration, which should otherwise be trapped, as StopIteration should only be passed out by iterators. Actually, I would probably not bother with chain and instead write

    def first(iterable, default):
        try:
            return next(dropwhile(_not, iterable))
        except StopIteration:
            return default

-- Terry Jan Reedy

On 9/14/2011 7:30 AM, Ned Batchelder wrote:
For example, max([]) raises a ValueError, so why doesn't any([])?
You should turn the question around: any([]) returns the identity element for any(), so why doesn't max([])? Because Python does not have an artificial universal minimum object. Guido has rejected the idea, and it makes even less sense in Python, where cross-type comparisons are generally discouraged. If max and min were restricted to totally ordered numbers, then fhi = float('inf') and flo = -fhi would work. But they are not. -- Terry Jan Reedy

Terry Reedy wrote:
Even if it did, it wouldn't necessarily make sense for max() to return it. max(s) means "the largest member of set s", and for it to return something that wasn't a member of s would be perverse. In every case I've encountered where there's a possibility of min() or max() being applied to an empty collection, the empty state had to be treated as a special case anyway. So it's just as easy to test for the empty case before calling min() or max(). -- Greg

MRAB wrote:
What should any([]) return?
I'm not sure I understand the question, because any([]) already has an answer, and changing it would be a backwards-incompatible change. So I don't quite understand the point of the question: the answer is "exactly what it already returns".
    any([])
    False
Likewise for all:
    all([])
    True
This is known as "vacuous truth": https://secure.wikimedia.org/wikipedia/en/wiki/Vacuous_truth

As far as justifying the results returned by any() and all(), the decision to return False and True respectively seems to work well for me. I have rarely, if ever, needed the opposite behaviour:

    if not mylist or any(mylist): ...
    if mylist and all(mylist): ...

so, at least for me, the behaviour of all() and any() seems like the right behaviour. However, I point out that if the argument is an iterator instead of a sequence, it's significantly harder:

    try:
        flag = bool(next(it))
    except StopIteration:
        flag = False  # instead of True
    else:
        flag = flag and all(it)

Perhaps all() and any() should take an optional argument to return if the iterable is empty? -- Steven
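[A sketch of what such an optional argument could look like; any_empty is a hypothetical name invented here, not an API anyone in the thread proposed concretely:]

```python
def any_empty(iterable, empty=False):
    # Hypothetical any() with a configurable result for the empty case,
    # as suggested above; non-empty iterables behave exactly as today.
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return empty
    return bool(first) or any(it)

assert any_empty([]) is False            # default matches the builtin
assert any_empty([], empty=None) is None # but the empty case is overridable
assert any_empty([0, 2]) is True
assert any_empty([0, 0]) is False
```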

On Tue, Sep 13, 2011 at 8:59 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
.. Why shouldn't any() also return Sx?
What is the use case for returning the first "trueish" value? In most cases, any() is used in boolean context, say

    if any(S):
        do_something()

In this case, returning Sx would result in Sx.__bool__() being called twice. It is also rare that any() is called on the raw sequence of objects that are of interest. More often it is used to express predicates such as any(x < 0 for x in S).

On 14/09/2011 17:17, Alexander Belopolsky wrote:
I had a use-case recently. I was looking for an entry in a pseudo-dict, where the key didn't have to be an exact match, so I called the .get method on the pseudo-dict until it returned a true-ish value (the value would never be false-ish). I just wrote a short function to do what I wanted.

MRAB schrieb am Mi, 14. Sep 2011, um 18:19:45 +0100:
FWIW, I just came to python-ideas to propose the same change to any() for the same use case (look-up in a pseudo dict). I needed this several times now, and I would consider the change to any() worthwhile. (For consistency, all() would also have to be changed, though I didn't have a use case yet.) -- Sven

On Thu, Sep 15, 2011 at 8:30 AM, Sven Marnach <sven@marnach.net> wrote:
As someone suggested earlier in this thread, this can be achieved with a very simple expression (not even a function!): next(filter(None, S)) will return the first item x from S with bool(x) evaluating to True. Further simplifying this is not worth slowing down all current uses of any().

On 15 September 2011 14:40, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
And if the special-case behaviour of filter(None, ...) upsets you, a generator expression is just as good: next(x for x in S if x) That's simple and readable enough (to my mind). In fact, it's more obvious to me than any(S) returning a non-boolean result would be... Paul.

Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Yeah, but you've got to think up that expression. I agree that slowing down any() is not a good thing, but there's surely a place for this functionality -- probably more useful than any() itself. I've wished several times that any() just returned the first hit. Personally, I'd add a function some() that either returns a matching value or raises an exception. Bill

On Thu, Sep 15, 2011 at 10:13 AM, Bill Janssen <janssen@parc.com> wrote:
I recall we discussed this, but I don't recall the exact reasons why we (I?) decided that any() and all() should always return True/False. I can guess that it was because of consistency with the return values for any([]) and all([]), which have to be False and True, respectively. Considering the use cases, and Python's pervasive use of any-object-as-Boolean, I think it would have been fine if they had been defined as follows:

    def any(xs):
        x = False
        for x in xs:
            if x:
                break
        return x

    def all(xs):
        x = True
        for x in xs:
            if not x:
                break
        return x

(Relying on two Pythonic properties of the for-loop: if the sequence xs is empty, x is not set by the for-loop at all, and keeps its previous value; if xs is not empty and the for-loop runs until xs is exhausted, x keeps the last value assigned to it in the loop.) I am not so sure that it is safe to change them now, however. Maybe these upgraded versions can be added to some other module, e.g. functools or itertools? -- --Guido van Rossum (python.org/~guido)
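[Restating Guido's definitions as runnable code, renamed so they don't shadow the builtins, makes the two for-loop properties visible:]

```python
def any_(xs):
    x = False
    for x in xs:
        if x:
            break
    return x

def all_(xs):
    x = True
    for x in xs:
        if not x:
            break
    return x

assert any_([]) is False               # empty: the loop never binds x
assert any_([0, '', 'spam']) == 'spam' # break: first true-ish item
assert any_([0, None]) is None         # exhausted: x keeps the last value
assert all_([]) is True
assert all_([1, 2, 0]) == 0            # break: first false-ish item
assert all_([1, 2, 3]) == 3            # exhausted: last (true-ish) value
```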

On Thu, Sep 15, 2011 at 1:45 PM, Guido van Rossum <guido@python.org> wrote: ..
Did you consider that in a typical "if any(S):" construct, x.__bool__() will be called twice on the found object? It is not unheard of to have expensive __bool__(). For example in a vector library a vector may be considered "false" if all its components are zero.
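[Alexander's double-evaluation concern can be made concrete with a counting __bool__; Vector and any_item are names invented here for illustration:]

```python
class Vector:
    # Illustrative vector type: "false" iff all components are zero,
    # so the truth test has real work to do each time.
    def __init__(self, *components):
        self.components = components
        self.bool_calls = 0

    def __bool__(self):
        self.bool_calls += 1
        return any(c != 0 for c in self.components)

def any_item(xs):
    # Sketch of the proposed any(): return the first true-ish item itself.
    for x in xs:
        if x:                  # first truth test happens here...
            return x
    return False

v = Vector(0, 0, 1)
if any_item([v]):              # ...and the 'if' tests the same object again
    pass
assert v.bool_calls == 2       # the current bool-returning any() tests once
```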

Amaury Forgeot d'Arc schrieb am Fr, 16. Sep 2011, um 08:16:58 +0200:
It's not the same -- the statement "if a or b:" will check the truth values of "a" and "b" at most once in CPython, while an updated version of "any()" would check the truth value of the found object twice -- since "any()" is an ordinary function, it can't easily be special-cased in the compiler. This seems like a good reason to make "any()" behave differently than "or" (apart from this behaviour being the current status quo anyway). -- Sven

On Mon, Sep 12, 2011 at 4:20 PM, Lukas Lueg <lukas.lueg@googlemail.com> wrote: ..
You may want to take a look at PEP 335: http://www.python.org/dev/peps/pep-0335/

On Mon, Sep 12, 2011 at 4:20 PM, Lukas Lueg <lukas.lueg@googlemail.com> wrote:
Your basic idea seems like a good one to me at an abstract level -- methods like __contains__ have no good design reason to typecheck. However, I don't think any of the concrete changes here serve to make Python a nicer or saner language. There is no reason why set should be changed either of the ways you discuss, and there is no excuse I can think of to implement something that works like mycontainer. Mike

On 9/12/2011 4:20 PM, Lukas Lueg wrote:
However the "or"-operator here grinds them both down to being just True or False.
Both 'and' and 'or' return one of their inputs unchanged and unground.
If both inputs are True or False, then or course you always get one of them back. But again, no 'grind'ing.
In Python, 'and' and 'or' are *not* (functional) boolean operations. They are flow-control keywords that abbreviate in expression form an if-else statement pattern. They are compiled to similar bytecode.

    def f1(a, b):
        return a and b

    def f2(a, b):
        if not a:
            return a
        else:
            return b

    from dis import dis
    dis(f1)
    dis(f2)

###

     0 LOAD_FAST                0 (a)
     3 JUMP_IF_FALSE_OR_POP     9
     6 LOAD_FAST                1 (b)
     9 RETURN_VALUE

     0 LOAD_FAST                0 (a)
     3 POP_JUMP_IF_TRUE        10
     6 LOAD_FAST                0 (a)
     9 RETURN_VALUE
    10 LOAD_FAST                1 (b)
    13 RETURN_VALUE

The difference is that the compiler does not notice that the 'a' to be returned is the same 'a' that it just tested and which is still sitting on the stack, so it reloads it or, if it were an expression, would recalculate it. There is no internal function that *could* be overloaded. In this sense, they are similar to lambda expressions, which abbreviate a simple def statement pattern and which compile to essentially the same bytecode as the def statement. The xor bitwise int operator '^' can be overloaded: __xor__ and __rxor__. Logical xor can be implemented as bool(a)^bool(b). Before there was a bool class, 'not not a' was the direct way to coerce a to a boolean value. Anything that could be done with a hidden .__not__ method could also be done, and I would say, better done, with an overt .invert method. For instance, a mutable binary array class should likely have .invert to flip each member in-place. As it is now, anyone reading 'not a' can depend on the result being a bool. And I think that is good. -- Terry Jan Reedy
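[Terry's xor and coercion remarks are easy to verify with plain values:]

```python
# Logical xor via the overloadable bitwise operator, as described above:
assert (bool(0) ^ bool('spam')) is True
assert (bool(1) ^ bool('spam')) is False

# The pre-bool idiom for coercing any object to a truth value:
assert (not not 'spam') is True
assert (not not '') is False
```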

Eric Snow wrote:
So you want "give me the item 'foo' in mycontainer" rather than "is item 'foo' in mycontainer"? Isn't that what __getitem__ does already?
The canonical use case for this kind of thing is numpy arrays. To be consistent with the numpy philosophy, 'a in b' where b is an array should give you an array of values indicating whether a is in each element of b. -- Greg
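[A rough sketch of what element-wise membership could mean, using a toy stand-in for a numpy array; ElementwiseArray and its method are invented for illustration, and real numpy does not define 'in' this way today:]

```python
class ElementwiseArray:
    # Toy stand-in for a numpy-style array whose elements are containers.
    def __init__(self, *items):
        self.items = items

    def elementwise_contains(self, value):
        # What an un-coerced __contains__ could return: one truth
        # value per element, packaged as another array.
        return ElementwiseArray(*[value in item for item in self.items])

b = ElementwiseArray([1, 2], [3, 4], [2, 5])
result = b.elementwise_contains(2)
assert result.items == (True, False, True)
```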

Lukas Lueg wrote:
This scheme would be of limited usefulness. It still assumes that the operands *can* be treated as boolean values, which is not always the case -- numpy arrays can't, for example. Also, the short-circuiting behaviour is still defined by their boolean values. I suggested a more complete way of addressing this in PEP 335. -- Greg

Philosophically, the idea that boolean operations like x in y should be able to return any value for True makes sense. The details leave much to be desired. It wouldn't make sense for it to return x, since 0 in [0,1] and '' in 'string' should both return a true value, yet 0 and '' are themselves false. On Mon, Sep 12, 2011 at 1:20 PM, Lukas Lueg <lukas.lueg@googlemail.com> wrote:
You've lost me where you suggest changing the behavior of == True. And where you propose that x > y should return x - y. You also don't mention what x < y should return. The real problem is the lack of a compelling scenario why you'd want to make this change. Aside from that, checking to see if one set is contained in another does not require computation of the full difference. Why should very_big_set > (1,2) go to the overhead of constructing a still very big set that's going to be immediately discarded?
I don't know the scenario where 3 > 2 returning 1 is useful. Would 2 < 3 also return 1? The Icon programming language (and others) has 2 < 3 return 3. This makes expressions like 2 < 3 < 4 work. And in fact Python's and/or operators work in exactly the same way, returning the value that determines the answer: (2 and 3) equals 3. Icon can do this because it has a special result 'fail' which blocks further evaluation, roughly equivalent to 1 < 0 throwing an exception. It wouldn't work in Python for lots of reasons.
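[The parallel with Python's existing short-circuit operators is easy to check:]

```python
# 'and'/'or' already return an operand, not a coerced bool:
assert (2 and 3) == 3   # 2 is true-ish, so the result is the second operand
assert (0 and 3) == 0   # 0 short-circuits and is returned as-is
assert (2 or 3) == 2    # 2 short-circuits and is returned as-is
assert (0 or 3) == 3
```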
So what? This isn't a general principle or anything. All you've done is say that if you define set<set to return non-bool values and __band__ and __bor__ preserve the invariants bool(x.__band__(y)) is bool(x and y) bool(x.__bor__(y)) is bool(x or y) This doesn't prove that "(setA > setB) or (setA < setB)" is always true or anything (because it isn't of course).
We can't do this with boolean operations being part of the language only.
Nor do we need to. Figuring out the value of "(setA > setB) or (setA < setB)" is much less convoluted than that. --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com

Can we not allow things like `a < b` to return non-boolean values, without altering the behaviour of existing Python types? There _are_ use cases (numpy, ORMs, DSLs, and numpy). These don't really fit the bill. Devin On Mon, Sep 12, 2011 at 4:20 PM, Lukas Lueg <lukas.lueg@googlemail.com> wrote:

On Mon, 2011-09-12 at 21:40 -0400, Devin Jeanpierre wrote:
Can we not allow things like `a < b` to return non-boolean values, without altering the behaviour of existing Python types?
Would that return 'a' or 'b', or something else?

Would that return 'a' or 'b', or something else?
Usually it'd return True or False. Same with things like `foo in bar`. I just mean that we can extend these to return other objects without insisting that builtin types (floats, sets, etc.) do weird things with them. Devin On Mon, Sep 12, 2011 at 10:13 PM, Ron Adam <ron3200@gmail.com> wrote:

On 13 September 2011 16:05, Lukas Lueg <lukas.lueg@googlemail.com> wrote:
Maybe I'm being dense, but why would I read it like that? The operation a < b means "is a less than b?" That's not a matter of opinion, as far as I can see, that's basically the definition of "<". You could certainly overload < to mean something else for particular types, but that's not something to do lightly (look at C++'s use of << for IO, for a relatively benign example of both the benefits and problems with such an approach...) Paul.

On Tue, Sep 13, 2011 at 8:05 AM, Lukas Lueg <lukas.lueg@googlemail.com>wrote:
Then what would a <= b or a == b or a != b return? This idea of returning 'how far apart are these objects' doesn't generalize for integers much less more complex types. You can't design a new feature by giving one example ... and no use cases. --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com

On Tue, Sep 13, 2011 at 11:40 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Can we not allow things like `a < b` to return non-boolean values, without altering the behaviour of existing Python types?
We already do. As far as I am aware, the only holdouts are:

    x in a  (coerces to bool)
    not a   (coerces to bool)
    a and b (doesn't coerce as such, but roughly equivalent to "a if not a else b")
    a or b  (doesn't coerce as such, but roughly equivalent to "a if a else b")

The first case can already be overridden (via __contains__) and I don't believe there's any specific reason for retaining the coercion to bool (aside from nobody writing and championing a patch that eliminates the restriction). PEP 207 (which added rich comparisons in the first place) is silent on the matter - it only talks about the comparison and ordering operations. The latter 3 cases are discussed in depth in PEP 335 (Overloadable Boolean Operators). Again, I don't believe there's anything fundamentally wrong with the idea, but it requires both an up to date reference implementation and a champion willing to gather use cases, assess the performance impact and generally argue in its favour. We don't have the numpy folks knocking down our door asking for distributable versions of these operations, after all. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
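[A small parse-tree sketch shows the asymmetry Nick lists: rich comparisons already hand back arbitrary objects, while 'in' coerces its result; Expr is a name invented here for illustration:]

```python
class Expr:
    # Minimal symbolic-expression node.
    def __init__(self, op, *args):
        self.op = op
        self.args = args

    def __lt__(self, other):
        return Expr('<', self, other)    # survives as an object

    def __contains__(self, item):
        return Expr('in', item, self)    # will be coerced to bool by 'in'

a, b = Expr('a'), Expr('b')
assert isinstance(a < b, Expr)           # rich comparison: object preserved
assert (a in b) is True                  # containment: ground down to a bool
```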

On Mon, Sep 12, 2011 at 8:39 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
But does the C code wrapping it preserve the non-bool value? (IIRC originally many of these were constrained by the C implementation more than by any philosophical desire to keep bool pure.)
Oh I see, you are saying that there is no need *in principle*. Agreed for __contains__. (BTW there is also a need for the reverse operator. Maybe it could be called __in__?)
It wasn't a priority at the time.
I'm skeptical. It would mean that the compiler no longer has the option to translate "if not" into a reversal of the control flow; it must generate code to execute the "not" operator. That seems like a pessimization. Here's some disassembler output to prove it:
It looks like the same argument cannot be made for 'and' / 'or', so perhaps those are fine -- though I also expect that there aren't many use cases. (For all these, I suspect the most common use case is actually symbolic algebra packages which want to overload operators so they can return a parse tree instead of an outcome. This works for regular operators and for comparisons, but not for in, not, and, or. But I'm not sure I want to pay the price for this flexibility everywhere those operators are used in their traditional meanings.) Admittedly I haven't read PEP 335 thoroughly. -- --Guido van Rossum (python.org/~guido)

On Tue, Sep 13, 2011 at 2:09 PM, Guido van Rossum <guido@python.org> wrote:
The two issues seem somewhat orthogonal to me, but yes, the general idea would be to make 'in' behave more like a rich comparison operator rather than an explicitly boolean operation as it does now. It occurs to me that adding __in__ could also address a slight performance oddity with 3.2 range objects: the __contains__ check in 3.2 has an O(1) fast path for containment checks on actual Python integers, but falls back to the O(n) sequential search for objects that only implement __index__(). If __in__() was available, such objects could conceivably map containment tests to checks against the corresponding real Python integer (although doing that carelessly would do weird things to other containers, such as identity-keyed dictionaries. That would be a quality of implementation issue on __in__ methods, though).
I think that's a pretty good explanation for why PEP 335 hasn't been pushed too hard - there's a lot of value in having not/and/or behave more like control flow structures rather than ordinary operations. PEP 335 does show that it's theoretically possible to make them behave more like operators, but that doesn't make it a good idea. To be honest, I don't think anyone would cry too much if you decided to explicitly reject it on the basis of continuing to allow control flow optimisations for code involving not/and/or. While CPython doesn't do it, I believe there *are* control flow transformations that the current semantics permit that PEP 335 would disallow, such as automatically applying De Morgan's Law (I don't actually have a use case for doing that, I'm just mentioning it as a consequence of the semantics change proposed by the PEP). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

The question is if we want to treat bools more special than any other type. Right now bools are the only place where this is done. Do we really want to specialize the language in order to favour control flow in the implementation? Forever? That's a bold statement.
Performance is always an argument. Python as a language, and most especially CPython itself, has a history of favouring design over application; the same should be true for bools. Concerning De Morgan's Law, I think in Python this should be treated as a convention in the implementation, not a rule of the language. After all, we also have [1,2,3] + [4,5,6] = [1,2,3,4,5,6] not by rule but by simple reasoning. Also: If an object does not override a certain boolean operation, the compiler can still take advantage of optimisation. Progress on the AST may pave the road for that.

On Tue, Sep 13, 2011 at 7:01 AM, Lukas Lueg <lukas.lueg@googlemail.com> wrote:
I have no problem with this statement and don't see it as particularly bold or controversial. Control flow just *is* special. FWIW It is not bool values that are being treated special; it is certain bool operators (not, and, or). I see nothing wrong with that; you can think of them as an extension of the repertoire of 'if', 'while' etc.
De Morgan's law (and similar transformation) seem to me out of scope. The would violate gut feelings about the "natural" execution order of Python expressions.
No, because the compiler does not have the runtime types of the object available. If you don't know how Python really works you shouldn't be making language change proposals. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
Seems to me they're something of a mixture of computation and control flow. The computation consists of calling the __nonzero__ methods of the operands to find out whether they should be considered true or false. PEP 335 doesn't change that, it just shifts the emphasis slightly. It provides hooks allowing you to do more computation before deciding on the control flow.
I don't see how. If e.g. you transform if not (a and b): into if not a or not b: then the evaluation order doesn't change -- you still evaluate a first, and then evaluate b if needed. It's just an extension of the existing rewriting of 'not' deeper into the expression. -- Greg
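[Greg's claim that the De Morgan rewrite preserves evaluation order can be checked with operands that record their truth tests; Probe is an invented helper, written with the Python 3 __bool__ spelling of the thread's __nonzero__:]

```python
calls = []

class Probe:
    # Records every truth test so the evaluation order is observable.
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __bool__(self):
        calls.append(self.name)
        return self.value

a, b = Probe('a', True), Probe('b', False)

del calls[:]
r1 = not (a and b)
order1 = list(calls)

del calls[:]
r2 = not a or not b
order2 = list(calls)

assert r1 is True and r2 is True
assert order1 == order2 == ['a', 'b']   # same operands, same order, same result
```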

On Tue, Sep 13, 2011 at 5:46 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Yeah, and the extra computation (however slight) bothers me.
Fair enough, especially since the bytecode compiler already gladly interchanges JUMP_IF_FALSE and JUMP_IF_TRUE (swapping the targets). -- --Guido van Rossum (python.org/~guido)

On Fri, Sep 16, 2011 at 10:39 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
abstract.c says "Hi!". The memory layout of type objects means the dance to handle binary operators correctly is generally a little more complicated than that. The short circuiting support in PEP 335 will only make it worse, since there would be two slots to check on each operand. In this particular case though, the additional problem is that the interpreter currently treats 'and' and 'or' as flow control structures, not operators: # python 3.2
That means making it possible to overload them is actually a fairly deep change to the expression semantics as far as the compiler is concerned. The control flow would need to be something like:

    LOAD_NAME x
    JUMP_IF_LOGICAL_AND <target>   (<-- handle short-circuiting cases)
    LOAD_NAME y
    LOGICAL_AND                    (<-- handle the binary version, potentially doing
                                        stack twiddling to ditch the LHS operand if
                                        there are no slots to call)
    <target>
    RETURN_VALUE

That's why my suggestion to consider '&&' and '||' as alternate, lower precedence, spellings of '&' and '|' wasn't entirely in jest. Their overloading semantics could be identical to the existing bitwise operators, but their precedence would be just above that of the control flow operations. Having operators that are equivalent aside from subtle details of precedence would be odd, but it strikes me as being less problematic overall than messing with the control flow semantics of 'and' and 'or'. Reusing the '&' and '|' typeslots would also mean that existing libraries like SQLAlchemy that already use '&' and '|' for boolean logic would just work with the new operators. C/C++ programmers would get rather confused when they found out that '&&' and '||' on numeric types still resulted in bitwise operations, though. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Sep 16, 2011 at 11:53 AM, Guido van Rossum <guido@python.org> wrote:
But didn't Greg have a working patch? (For Python 2.3, but still...)
It isn't that I think the code changes are going to be huge, just that even in the basic case they're not as trivial as "reduces to noticing that a particular type slot is NULL at the C level" makes them sound. The shift from 'control flow operation' to 'binary operator' is fairly substantial, even if the changes are isolated to a couple of new opcodes and the code generation for a couple of AST nodes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Sep 15, 2011 at 8:14 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Yeah, I'd like to see the new generated code. If it's even a single opcode longer than the old code, that's a big cost that everybody pays -- decoding opcodes is a lot more expensive than checking a slot at the C level. I also worry that those 5 extra slots cost a lot of memory (and cache lines), given that a typical program has 1000s of type objects lying around. (If you don't believe me, instrument the type initialization function.) -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
Yeah, I'd like to see the new generated code. If it's even a single opcode longer than the old code, that's a big cost that everybody pays
In the prototype implementation it is slightly longer in some cases, but if we're willing to introduce a few more bytecode instructions it could be made the same length. I'll see about adding some discussion on this to the PEP. Concerning examples, there's already one included with the patch file, but it's not very visible anywhere. Would you like the examples included in-line in the PEP (they will be about 2-3 screenfuls each or so) or would it be better to host them elsewhere and just put links in the PEP? -- Greg

On Mon, Sep 12, 2011 at 9:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I guess I might understand this paragraph if you pointed me to the code. :-(
I think I just mentioned one (turning 'if not' into a jump). Anyway, I'm glad to reject the PEP for the reason that I like the status quo just fine. (But relaxing __contains__ and adding __in__ as its reverse have my blessing.) Also, after reading the PEP from beginning to end, and downloading and skimming the patch (but failing to actually compile a patched version of Python 2.3), I think the offered API is too complicated to be of much use. Certainly the NumPy folks have repeatedly claimed that they are fine with the status quo. -- --Guido van Rossum (python.org/~guido)

On Wed, Sep 14, 2011 at 3:56 AM, Guido van Rossum <guido@python.org> wrote:
http://hg.python.org/cpython/file/default/Objects/rangeobject.c#l603 (scroll up a bit from where that link lands for the definition of the O(1) check)

For the bit about breaking identity-keyed dictionaries, consider a hypothetical naive containment implementation like this:

    def __in__(self, container):
        return int(self) in container

That code involves an implicit assumption that *all* containers are equivalence based, and that simply isn't true - it is sometimes useful to create a container that relies on object identity rather than value (e.g. to store additional metadata about arbitrary objects, even mutable ones). So a more correct implementation would have to look something like:

    def __in__(self, container):
        if isinstance(container, range):
            # We know range containment is equivalence based
            return int(self) in container
        return NotImplemented

One additional complication that Alex Gaynor pointed out is that there are potentially *two* distinct operations that would need to be covered if "in" were to become a rich comparison operation - 'in' and 'not in'. Currently "not x in y" and "x not in y" generate identical bytecode, and the control flow optimisations that apply to the former can also be applied to the latter. Making 'in' a rich comparison operation would thus require choosing one of the following behaviours:

- making 'not' itself avoid coercing to a boolean value (which you've already said you don't want to do)
- retaining the equivalence between "x not in y" and "not x in y" and simply accepting that "x in y" is a rich comparison while "x not in y" is not
- adding two additional special methods to cover the 'not in' case, defining a method precedence that covers the various combinations of methods on the operands for both 'in' and 'not in', and disentangling all the parts of the code generator that assume the equivalence of "x not in y" and "not x in y"

None of those options sound particularly appealing.
A rich 'not' would probably be the cleanest solution, but you've already given valid reasons for not wanting to do that.
OK, I'll add a rejection notice to PEP 335 with a link to this thread. Given the point above regarding the problems with "x not in y" vs "not x in y", do you want me to include something saying that rich containment checks are also rejected? Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 13, 2011 at 3:26 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Gotcha. I missed that this was just about range objects. :-)
Hm, this reminds me of the thorny issue about 'is' implying '==' for certain container checks... But really, there are many other ways of making similar mistakes with other operators, and IMO a good __in__ method should always check if the RHS type is a known and supported type, and otherwise return NotImplemented like a good binary operator should.
- making 'not' itself avoid coercing to a boolean value (which you've already said you don't want to do)
Right.
I think this one is fine, actually.
Basically this would elevate 'not in' to a separate operator, just like '==' and '!=' are separate operators. Now, *if* we were to adopt PEP 335, this would be a reasonable approach. But since we're not, I think it's fine. If NumPy ever implements 'in' as returning an array, NumPy users might have to be warned that 'not in' doesn't work that way, and 'not (x in y)' doesn't work either. So they'll have to write something like '1 - (x in y)', assuming 'x in y' returns an array of bools. Big deal.
And I stick to them.
Let's mull that one over for a bit longer. It's not mentioned in PEP 335, is it? -- --Guido van Rossum (python.org/~guido)

On 14/09/11 05:56, Guido van Rossum wrote:
Did you see the comment I made concerning jump optimizations? (Briefly, I don't think it would do any harm to keep performing this optimization, since it wouldn't affect any of my intended use cases.)
Also, after reading the PEP from beginning to end ... I think the offered API is too complicated to be of much use.
There's a bit more machinery involved than there is with the other operators, but I believe that any given use cases will usually only *use* part of it. In particular, a system for building parse trees only needs to implement the 2-operand methods, which is no more complicated than overriding any of the existing operators. If it would help, I could provide a couple of fully-worked examples, so you can see how complicated (or not) it would be to use in practice. There's also the possibility of simplifying the API if it's considered that some of it will *never* be used, e.g. by eliminating the machinery for custom short-circuiting.
Certainly the NumPy folks have repeatedly claimed that they are fine with the status quo.
NumPy is not the only use case by a long shot. The main reason I want it is for expressing database queries. You only have to look at the contortions that things like Django have to go through to wonder whether there's a better way. Apologies if I appear to be flogging a resting equine here. If you really don't want to think about it any more, I'll shut up. But if it's going to be rejected, I'd prefer it to be rejected for genuine reasons rather than FUD. -- Greg

On Thu, Sep 15, 2011 at 2:22 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Some concrete use cases might help a lot. While such use cases can also be addressed via AST transformations, that's a rather large sledgehammer to break out just to customise some boolean logic. I'll hold off on marking the PEP as rejected until you've had a chance to add (and Guido has had a chance to read) those examples. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Sep 15, 2011 at 3:30 PM, Chris Rebert <pyideas@rebertia.com> wrote:
It's actually:

    'x and y' vs 'x & y'
    'x or y' vs 'x | y'
    'not x' vs '~x'

(assuming you aren't already using the bitwise versions for bitwise operations) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 9/15/2011 2:00 AM, Nick Coghlan wrote:
The problem with this is the precedence difference between '&' and 'and', etc. sqlalchemy uses this approach, and my code contains many extra sets of unnatural looking parens to deal with this issue. It's not a huge deal, but I see newbies (and me, and oldie) leaving off the parens. It's not an easy bug to track down. Eric.

On Fri, Sep 16, 2011 at 6:18 AM, Eric V. Smith <eric@trueblade.com> wrote:
Yeah, Greg does mention that in the PEP. Some concrete examples showing:

- buggy code with bitwise operator overloading
- correct code with parens used to correct the bitwise precedence disparity
- correct code with boolean logic overloading (as proposed by the PEP)

would help make it crystal clear. Those would also help decide how important it is to preserve the short-circuiting semantics when overloading the operations - element-wise and DSL type use cases are always going to need the RHS, so there may be an opportunity there to simplify the proposed semantics to match those of other binary operations. And, of course, the final option is to go the C/C++ route: leave and/or/not as short-circuiting flow control statements and introduce && and || as low precedence binary logical operators. (I can't see Guido ever going for that idea, but it *does* sidestep the concerns about currently legal control flow transformations) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Chris Rebert wrote:
So, you want this in order to avoid (e.g.) `X & Y` and `not_(Z)`, in favor of `X and Y` and `not Z`?
Yes. The problem with & and | is that their precedence is all wrong in relation to the comparison operators. So instead of

    a == 17 and b == 42

you have to write

    (a == 17) & (b == 42)

And not_(x) is just plain ugly. -- Greg
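A quick illustration of that precedence trap in plain Python (nothing DSL-specific is needed): & binds tighter than ==, so dropping the parens silently changes what is compared:

```python
a, b = 17, 42

# Without parens this parses as a == (17 & b) == 42: a chained
# comparison against the bitwise AND of 17 and 42, which is 0.
print(a == 17 & b == 42)        # False

# With parens, each comparison is evaluated first, then & combines
# the two bools.
print((a == 17) & (b == 42))    # True
```

The unparenthesised version raises no error and produces a perfectly plausible-looking boolean, which is exactly why Eric describes it as a hard bug to track down.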

On Wed, Sep 14, 2011 at 9:22 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Actually, accepting or rejecting a PEP is always a matter of gut feelings. Usually there's no hard argument that can be made about the benefits for the language (a few use cases are easier to write, some bugs are avoided) vs. the downsides (some users will be confused, some bugs will occur, docs have to be updated, the compiler and runtime will be a little more complex, the generated code may be a little bit bulkier, etc.). It's hard to weigh these, so please don't call it FUD, and be glad it's not your responsibility to decide. That said, having just implemented a query engine (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/query.py) I appreciate the pain for that particular use case and will hold off until I've seen your examples. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
It would mean that the compiler no longer has the option to translate "if not" into a reversal of the control flow;
This only seems to happen when the expression is used directly as the subject of an if-test. For the use cases I had in mind when I wrote PEP 335, this wouldn't happen -- you're building a parse tree in order to perform some other processing on it, not to immediately use it in an if-branch. So I think it would be fine for the compiler to reserve the right to apply the laws of boolean algebra to simplify an expression that appears in a boolean context. -- Greg

On 9/12/2011 4:20 PM, Lukas Lueg wrote: the values it encounters.

Conceptually, these two are the same (where S is a sequence S0, S1, S2, ..., Sn):

    any(S)
    S0 or S1 or S2 or ... or Sn

They are equivalent except that if Sx is the first true-ish value, the first will return True while the second returns Sx. Why shouldn't any() also return Sx? --Ned.

On Wed, Sep 14, 2011 at 12:39 PM, MRAB <python@mrabarnett.plus.com> wrote:
If none are true-ish, should it return the final (false-ish) value?
By analogy with 'or', yes. (and ditto for 'all' vs 'and')
What should any([]) return?
That's the more valid rationale - any()/all() need to work with an empty iterable, so the most obvious solution is to limit the range to True/False rather than having the result be data dependent. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Ned Batchelder wrote:
Perhaps 'need' is putting it too strongly, but if any()/all() were changed to return the first true-ish/last false-ish value, then the extra parameter would be convenient in specifying the value desired if it wasn't True/False. Sort of like {}.get('nothere', '5'). ~Ethan~

Mike Graham wrote:
I don't see that. next() returns the next item from an iterator regardless of its boolean state. The actual and proposed implementations of any() skip over false items. The proposal might be implemented something like this:

    def any(iterable):
        item = False
        for item in iterable:
            if item:
                return item  # currently: return True
        return item

The practical difference should be clear if you compare calling:

    next(iter([0]*1000 + [1]))

versus:

    any([0]*1000 + [1])

-- Steven
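The difference Steven describes can be checked directly with the builtins as they stand today:

```python
data = [0] * 1000 + [1]

# next() hands back the very first item, truthy or not:
print(next(iter(data)))   # 0

# any() skips the false items and reports whether a true one exists:
print(any(data))          # True
```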

On 15/09/2011 00:15, Greg Ewing wrote:
Could it also have something like a 'key' argument, by default __bool__? It would return the first item for which key(item) returns True or something true-ish. There's also the option of a 'default' argument for when there are no true-ish items.

On 15/09/11 11:25, MRAB wrote:
Could it also have something like a 'key' argument, by default __bool__?
Or more generally, a function. Although that's not strictly necessary. If there were a first() function that simply returned the first item from an iterator, whether true or not, one could write first(itertools.ifilter(func, something)) -- Greg

On Wed, Sep 14, 2011 at 4:15 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Ruby calls it "detect": http://www.ruby-doc.org/core/classes/Enumerable.html#M001485 Cheers, Chris

On Thu, Sep 15, 2011 at 10:20 AM, Chris Rebert <pyideas@rebertia.com> wrote:
    from itertools import dropwhile

    def _not(x):
        return not x

    def first(iterable):
        return next(dropwhile(_not, iterable))

Adjust implementation to taste/use case (e.g. make the predicate configurable or accept a default value rather than throwing StopIteration for empty sequences). There comes a point where it is better to just let people compose existing tools according to their specific needs rather than providing yet another special purpose tool. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Terry Reedy schrieb am Mi, 14. Sep 2011, um 22:52:47 -0400:
That is what itertool is designed for. I believe chain(dropwhile(...), [default]) will add a default.
... unless the default is falsy. -- Sven

On 9/15/2011 8:40 AM, Sven Marnach wrote:
Out of context nonsense. In

    from itertools import dropwhile

    def _not(x):
        return not x

    def first(iterable):
        return next(dropwhile(_not, iterable))

replacing 'dropwhile(_not, iterable)' with 'chain(dropwhile(_not, iterable), [default])' will *always* return default if dropwhile is empty *and* guarantee that there is a first to be returned instead of raising StopIteration, which should otherwise be trapped, as StopIteration should only be passed out by iterators. Actually, I would probably not bother with chain and instead write

    def first(iterable, default):
        try:
            return next(dropwhile(_not, iterable))
        except StopIteration:
            return default

-- Terry Jan Reedy
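For completeness, a usage sketch of that last variant (reusing the same hypothetical _not helper from earlier in the thread); note that, unlike the chain() approach, this one works fine with falsy defaults:

```python
from itertools import dropwhile

def _not(x):
    return not x

def first(iterable, default):
    # Return the first truthy item, or `default` if there is none.
    try:
        return next(dropwhile(_not, iterable))
    except StopIteration:
        return default

print(first([0, '', None, 42, 7], 'nothing'))  # 42
print(first([0, '', None], 'nothing'))         # nothing
print(first([], False))                        # False - falsy defaults work too
```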

On 9/14/2011 7:30 AM, Ned Batchelder wrote:
For example, max([]) raises a ValueError, so why doesn't any([])?
You should turn the question around: any([]) returns the identity element for any(), so why doesn't max([])? Because Python does not have an artificial universal minimum object. Guido has rejected the idea and it makes even less sense in Python where cross-type comparisons are generally discouraged. If max and min were restricted to totally ordered numbers, then fhi=float('inf') and flo=-fhi would work. But they are not. -- Terry Jan Reedy

Terry Reedy wrote:
Even if it did, it wouldn't necessarily make sense for max() to return it. max(s) means "the largest member of set s", and for it to return something that wasn't a member of s would be perverse. In every case I've encountered where there's a possibility of min() or max() being applied to an empty collection, the empty state had to be treated as a special case anyway. So it's just as easy to test for the empty case before calling min() or max(). -- Greg

MRAB wrote:
What should any([]) return?
I'm not sure I understand the question, because any([]) already has an answer, and changing it would be a backwards-incompatible change. So I don't quite understand the point of the question: the answer is "exactly what it already returns".
    >>> any([])
    False
Likewise for all:
    >>> all([])
    True
This is known as "vacuous truth": https://secure.wikimedia.org/wikipedia/en/wiki/Vacuous_truth

As far as justifying the results returned by any() and all(), the decision to return False and True respectively seems to work well for me. I have rarely, if ever, needed the opposite behaviour:

    if not mylist or any(mylist): ...
    if mylist and all(mylist): ...

so, at least for me, the behaviour of all() and any() seems like the right behaviour. However, I point out that if the argument is an iterator instead of a sequence, it's significantly harder:

    try:
        flag = bool(next(it))
    except StopIteration:
        flag = False  # instead of True
    else:
        flag = all(it)

Perhaps all() and any() should take an optional argument to return if the iterable is empty? -- Steven
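Steven's optional-argument suggestion could be sketched as simple wrappers (any_or and all_or are hypothetical names, not proposed stdlib APIs): the caller chooses the result for an empty iterable, and everything else defers to the existing builtins:

```python
def any_or(iterable, empty=False):
    """Like any(), but return `empty` when the iterable yields nothing."""
    it = iter(iterable)
    try:
        head = next(it)
    except StopIteration:
        return empty
    # Non-empty: fall back to the normal any() semantics.
    return bool(head) or any(it)

def all_or(iterable, empty=True):
    """Like all(), but return `empty` when the iterable yields nothing."""
    it = iter(iterable)
    try:
        head = next(it)
    except StopIteration:
        return empty
    return bool(head) and all(it)

print(any_or(iter([]), 'no items'))   # no items
print(all_or(iter([]), False))        # False - the opposite of vacuous truth
```

This keeps the builtins untouched while handling the iterator case that is awkward to special-case by hand.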

On Tue, Sep 13, 2011 at 8:59 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
.. Why shouldn't any() also return Sx?
What is the use case for returning the first "trueish" value? In most cases, any() is used in boolean context, say

    if any(S):
        do_something()

In this case, returning Sx would result in Sx.__bool__() being called twice. It is also rare that any() is called on the raw sequence of objects that are of interest. More often it is used to express predicates such as any(x < 0 for x in S).

On 14/09/2011 17:17, Alexander Belopolsky wrote:
I had a use-case recently. I was looking for an entry in a pseudo-dict, where the key didn't have to be an exact match, so I called the .get method on the pseudo-dict until it returned a true-ish value (the value would never be false-ish). I just wrote a short function to do what I wanted.

MRAB schrieb am Mi, 14. Sep 2011, um 18:19:45 +0100:
FWIW, I just came to python-ideas to propose the same change to any() for the same use case (look-up in a pseudo dict). I needed this several times now, and I would consider the change to any() worthwhile. (For consistency, all() would also have to be changed, though I didn't have a use case yet.) -- Sven

On Thu, Sep 15, 2011 at 8:30 AM, Sven Marnach <sven@marnach.net> wrote:
As someone suggested earlier in this thread, this can be achieved with a very simple expression (not even a function!): next(filter(None, S)) will return the first item x from S with bool(x) evaluating to True. Further simplifying this is not worth slowing down all current uses of any().

On 15 September 2011 14:40, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
And if the special-case behaviour of filter(None, ...) upsets you, a generator expression is just as good: next(x for x in S if x) That's simple and readable enough (to my mind). In fact, it's more obvious to me than any(S) returning a non-boolean result would be... Paul.
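Worth noting: next() itself accepts a default, so the empty-sequence case that Paul's expression leaves open (StopIteration) can be handled without any try/except:

```python
S = [0, '', None, 'found', 'later']

# First truthy item, or a fallback when there is none:
print(next((x for x in S if x), 'no match'))    # found
print(next((x for x in [] if x), 'no match'))   # no match
```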

Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Yeah, but you've got to think up that expression. I agree that slowing down any() is not a good thing, but there's surely a place for this functionality -- probably more useful than any() itself. I've wished several times that any() just returned the first hit. Personally, I'd add a function some() that either returns a matching value or raises an exception. Bill

On Thu, Sep 15, 2011 at 10:13 AM, Bill Janssen <janssen@parc.com> wrote:
I recall we discussed this, but I don't recall the exact reasons why we (I?) decided that any() and all() should always return True/False. I can guess that it was because of consistency with the return values for any([]) and all([]), which have to be False and True, respectively. Considering the use cases, and Python's pervasive use of any-object-as-Boolean, I think it would have been fine if they had been defined as follows:

    def any(xs):
        x = False
        for x in xs:
            if x:
                break
        return x

    def all(xs):
        x = True
        for x in xs:
            if not x:
                break
        return x

(Relying on two Pythonic properties of the for-loop: if the sequence xs is empty, x is not set by the for-loop at all, and keeps its previous value; if xs is not empty and the for-loop runs until xs is exhausted, x keeps the last value assigned to it in the loop.) I am not so sure that it is safe to change them now, however. Maybe these upgraded versions can be added to some other module, e.g. functools or itertools? -- --Guido van Rossum (python.org/~guido)

On Thu, Sep 15, 2011 at 1:45 PM, Guido van Rossum <guido@python.org> wrote: ..
Did you consider that in a typical "if any(S):" construct, x.__bool__() will be called twice on the found object? It is not unheard of to have expensive __bool__(). For example in a vector library a vector may be considered "false" if all its components are zero.
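The double-evaluation concern can be demonstrated with a counting __bool__ and a relaxed any() along the lines of Guido's sketch (loose_any is a hypothetical name used here to avoid shadowing the builtin):

```python
class Vector:
    """Toy vector: 'true' iff any component is nonzero."""
    bool_calls = 0

    def __init__(self, *components):
        self.components = components

    def __bool__(self):
        Vector.bool_calls += 1  # count how often truth is evaluated
        return any(c != 0 for c in self.components)


def loose_any(xs):
    # Relaxed any(): return the first truthy item (or the last/False).
    x = False
    for x in xs:
        if x:
            break
    return x


v = Vector(0, 0, 3)
if loose_any([v]):          # __bool__ once inside loose_any, once for 'if'
    pass
print(Vector.bool_calls)    # 2
```

With the builtin any(), the `if` would test a plain True and __bool__ would run only once; with the relaxed version, an expensive truth computation runs twice.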

Amaury Forgeot d'Arc schrieb am Fr, 16. Sep 2011, um 08:16:58 +0200:
It's not the same -- the statement "if a or b:" will check the truth values of "a" and "b" at most once in CPython, while an updated version of "any()" would check the truth value of the found object twice -- since "any()" is an ordinary function, it can't easily be special-cased in the compiler. This seems like a good reason to make "any()" behave differently than "or" (apart from this behaviour being the current status quo anyway). -- Sven
participants (25)
- Alexander Belopolsky
- Amaury Forgeot d'Arc
- Bill Janssen
- Bruce Leban
- Carl Matthew Johnson
- Chris Rebert
- Christopher King
- Devin Jeanpierre
- Eric Snow
- Eric V. Smith
- Ethan Furman
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Jacob Holm
- Lukas Lueg
- Mike Graham
- MRAB
- Ned Batchelder
- Nick Coghlan
- Paul Moore
- Ron Adam
- Steven D'Aprano
- Sven Marnach
- Terry Reedy