Re: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators
Message: 4 Date: Fri, 10 Sep 2004 12:58:23 +1200 From: Greg Ewing <greg@cosc.canterbury.ac.nz> Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators To: python-dev@python.org Message-ID: <200409100058.i8A0wNIV002743@cosc353.cosc.canterbury.ac.nz>
Python does not currently provide any '__xxx__' special methods corresponding to the 'and', 'or' and 'not' boolean operators.
I like the PEP with 'and' and 'or', but isn't the 'not' special method essentially the inverse of __nonzero__? -Michel
At 12:46 AM 9/9/04 -0700, Michel Pelletier wrote:
Python does not currently provide any '__xxx__' special methods corresponding to the 'and', 'or' and 'not' boolean operators.
I like the PEP with 'and' and 'or', but isn't the 'not' special method essentially the inverse of __nonzero__?
There isn't such a method currently. Also, note that the expression 'not x' is currently guaranteed to return a boolean value. The purpose of the PEP is to allow 'not x' to potentially return an arbitrary object, for use in algebraic and query systems that want to use Python code as their syntax. Such systems currently use e.g. '~x' instead of 'not x' because the former allows return of arbitrary objects.

IMO, the algebraic/query use cases would be better served by some sort of "code literal" or "AST literal" syntax, rather than adding more special methods. The reason is that all too often you want to include "normal" Python values in such an expression, but still manipulate them symbolically, or have some other sort of special treatment. A literal syntax for Python expressions is more useful for this, which is why I've moved to using strings and the parser module to accomplish such processing. At that level, boolean operator methods are moot.

(Code literals would be useful primarily for the ability to have them parsed and syntax checked at import time, rather than waiting until runtime. This consideration also applies to PEP 335, but PEP 335 may consume all of its compilation performance gains by losing runtime performance at all boolean operation sites.)

But anyway, I digress. Since PEP 335 doesn't significantly help (IMO) with algebraic and query systems, that leaves the numeric use cases, which I don't have enough experience to comment on.
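A minimal Python 3 sketch of the '~x' workaround described above. The `Sym` class is hypothetical, not taken from any real query library: `__invert__` may return any object, so '~x' can build a symbolic node, while 'not x' only ever yields True or False.

```python
class Sym:
    """Hypothetical symbolic variable for a query/algebra DSL."""

    def __init__(self, text):
        self.text = text

    def __invert__(self):
        # '~x' may return any object, so it can build a symbolic node
        return Sym(f"NOT ({self.text})")

    def __repr__(self):
        return f"Sym({self.text!r})"


x = Sym("x")
print(~x)      # Sym('NOT (x)')
print(not x)   # False: 'not' only ever yields a bool
```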
Phillip J. Eby wrote:
I like the PEP with 'and' and 'or', but isn't the 'not' special method essentially the inverse of __nonzero__?
There isn't such a method currently.
Did you mean to say that there is currently no method named __nonzero__? This is not true:
>>> class X:
...     def __nonzero__(self):
...         print "Called"
...         return 13
...
>>> not X()
Called
False
Regards, Martin
At 10:59 PM 9/10/04 +0200, Martin v. Löwis wrote:
Phillip J. Eby wrote:
I like the PEP with 'and' and 'or', but isn't the 'not' special method essentially the inverse of __nonzero__?
There isn't such a method currently.
Did you mean to say that there is currently no method named __nonzero__?
No; that there was no method named '__not__'.
IMO, the algebraic/query use cases would be better served by some sort of "code literal" or "AST literal" syntax
You may be right about the symbolic algebra case, if the intent is to be able to write code that manipulates expressions, in which case writing the expressions to be manipulated as literals of some kind may make sense.

But I don't agree in the SQL case, where my intent is for the user to simply write Python code that performs database queries, not write Python code that constructs trees of SQL expressions that perform database queries. The fact that expression manipulation is going on should be an implementation detail that the user doesn't need to be aware of. Having to write the query expressions using some special syntax would interfere with that.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
At 02:58 PM 9/14/04 +1200, Greg Ewing wrote:
IMO, the algebraic/query use cases would be better served by some sort of "code literal" or "AST literal" syntax
You may be right about the symbolic algebra case, if the intent is to be able to write code that manipulates expressions, in which case writing the expressions to be manipulated as literals of some kind may make sense.
But I don't agree in the SQL case, where my intent is for the user to simply write Python code that performs database queries, not write Python code that constructs trees of SQL expressions that perform database queries.
So, something like this:

query("x and y or z")

isn't "code that performs database queries"?

My main concern about the PEP is that it adds overhead to *all* logical operations, but the feature will only benefit code that hasn't yet been written. I also fear that as a result, people will start writing complex if-then blocks to "optimize" performance of conditionals to get them back to where they were before the facility was added. Also, it considerably expands the scope of understanding that someone needs in order to grasp the meaning of a logical expression.

For these reasons, I'd feel more comfortable with either a literal syntax (to address algebra, SQL, etc.) or some type of special infix notation to allow new operators to be defined in Python, so that it isn't necessary to use prefix or method notation to perform operations like these. Neither of these solutions burdens applications that don't need the feature(s).
Phillip J. Eby wrote:
My main concern about the PEP is that it adds overhead to *all* logical operations, but the feature will only benefit code that hasn't yet been written.
Actually, there are several packages that implement ugly workarounds for exactly this issue. So, in a sense, there is a significant amount of code that exists that will benefit from this feature. Some that come to mind are my own SQL ADT library, SQLObject, and several parser tools.
For these reasons, I'd feel more comfortable with either a literal syntax (to address algebra, SQL, etc.) or some type of special infix notation to allow new operators to be defined in Python, so that it isn't necessary to use prefix or method notation to perform operations like these. Neither of these solutions burdens applications that don't need the feature(s).
Both of your alternatives are being used in some form and neither is really satisfactory. Literal representations require complex parsers, when the Python parser is really what is desired. The infix notation idea is interesting; however, the operators desired are usually 'logical and' and 'logical or', which are clearly spelled 'and' and 'or' in Python. I see it as a semantic limitation that Python does not allow overriding these operators. Adding extra indirection (i.e., extra byte codes) _will_ affect performance, but my view is that correctness and completeness are more important than performance. -Kevin
On Tue, 14 Sep 2004 08:04:45 -0400, Kevin Jacobs <jacobs@theopalgroup.com> wrote:
Phillip J. Eby wrote:
For these reasons, I'd feel more comfortable with either a literal syntax (to address algebra, SQL, etc.) or some type of special infix notation to allow new operators to be defined in Python, so that it isn't necessary to use prefix or method notation to perform operations like these. Neither of these solutions burdens applications that don't need the feature(s).
Both of your alternatives are being used in some form and neither is really satisfactory. Literal representations require complex parsers, when the Python parser is really what is desired.
Python's parser is already available, through the compiler module. The example given earlier, query("x and y or z"), is relatively straightforward to implement as a set of AST manipulations.

Jp
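The suggestion above can be sketched with the standard `ast` module, which in Python 3 takes the place of the old `compiler` module. The `query` and `to_sql` helpers and the table name `t` are hypothetical; only `and`, `or`, `not` and bare names are handled, and the string is still parsed at runtime rather than at compile time.

```python
import ast


def to_sql(node):
    """Translate a small subset of a Python expression AST into SQL text."""
    if isinstance(node, ast.Expression):
        return to_sql(node.body)
    if isinstance(node, ast.BoolOp):
        op = " AND " if isinstance(node.op, ast.And) else " OR "
        return "(" + op.join(to_sql(v) for v in node.values) + ")"
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return "(NOT " + to_sql(node.operand) + ")"
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def query(source):
    # Python's own parser does the work; syntax errors surface here, at runtime
    tree = ast.parse(source, mode="eval")
    return "SELECT * FROM t WHERE " + to_sql(tree)


print(query("x and y or z"))   # SELECT * FROM t WHERE ((x AND y) OR z)
```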
exarkun@divmod.com wrote:
On Tue, 14 Sep 2004 08:04:45 -0400, Kevin Jacobs <jacobs@theopalgroup.com> wrote:
Phillip J. Eby wrote:
For these reasons, I'd feel more comfortable with either a literal syntax (to address algebra, SQL, etc.) or some type of special infix notation to allow new operators to be defined in Python, so that it isn't necessary to use prefix or method notation to perform operations like these. Neither of these solutions burdens applications that don't need the feature(s).
Both of your alternatives are being used in some form and neither is really satisfactory. Literal representations require complex parsers, when the Python parser is really what is desired.
Python's parser is already available, through the compiler module. The example given earlier, query("x and y or z"), is relatively straightforward to implement as a set of AST manipulations.
While strictly true, your suggestion still requires two distinct parsers (although one implementation) and two distinct parsing contexts (one embedded in a literal string). The use cases I care about involve minimizing the difference between evaluating regular Python expressions and ADT instances, plus the ability to mix constructs from both in a seamless way. If Python didn't support any overloadable ADT methods, then this wouldn't be an issue. However, the problem is that virtually all ADT methods _are_ defined _except_ logical conjunction and disjunction. Thus, I am more concerned with correcting this oversight than I am with a slowdown of a fraction of a percent in real applications (or at least micro-benchmarks are _not_ representative of any real-world situations I've ever cared about). -Kevin
exarkun@divmod.com:
Python's parser is already available, through the compiler module. The example given earlier, query("x and y or z"), is relatively straightforward to implement as a set of AST manipulations.
But that misses the point, which is to have the expression blend in seamlessly with the rest of the Python code. Anything which requires the explicit invocation of a separate parsing phase prevents that.

Greg Ewing
At 08:04 AM 9/14/04 -0400, Kevin Jacobs wrote:
For these reasons, I'd feel more comfortable with either a literal syntax (to address algebra, SQL, etc.) or some type of special infix notation to allow new operators to be defined in Python, so that it isn't necessary to use prefix or method notation to perform operations like these. Neither of these solutions burdens applications that don't need the feature(s).
Both of your alternatives are being used in some form and neither is really satisfactory. Literal representations require complex parsers, when the Python parser is really what is desired.
Maybe you missed the earlier part of the thread, where I was suggesting that a Python "code literal" or "AST literal" syntax would be helpful. For example, if backquotes didn't already have a use, one might say something like:

db.query(`x.y==z and foo*bar<27`)

to pass an AST object to the db.query() method. The advantage would be that the AST would be parsed and syntax checked at compile time, rather than runtime.

After several experiments with using &, |, and ~ for query expressions, I've pretty much quit and gone to using string literals, since AST literals don't exist. But if AST literals *did* exist, I'd certainly use them in preference to strings.

But, even if PEP 335 *were* implemented, creating a query system using Python expressions would *still* be kludgy, because you still need "seed variables" in the current scope to write a query expression. In my example above, I didn't need to bind 'x' or 'y' or 'z' or 'foo' or 'bar', because the db.query() method is going to interpret those in some context. If I were using a PEP 335-based query system, I'd have to initialize those variables to special querying objects first.

From my POV, the use of &, |, and ~ was a very minor issue. Being able to use 'and', 'or', and 'not' would provide some minor syntactic sugar at best. Trying to implement every *other* Python operator correctly, and having to have seed variables, is IMO where the bulk of the complexity comes from when trying to use Python syntax as a query language. That's why I say that an AST literal syntax would be much more useful to me than PEP 335 for this type of use case.

As for the numeric use cases, I'm not at all clear why &, |, and ~ (or special methods/functions) aren't suitable.
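A sketch of the operator-overloading query style described above, using a hypothetical `Field` class (not any real ORM's API). It shows both costs: the seed variables `x` and `y` must be bound before a query can be written, and `&`/`~` have to stand in for `and`/`not`, because a real `and` only tests truth and returns one of its operands.

```python
class Field:
    """Hypothetical 'seed variable' standing for a database column."""

    def __init__(self, sql):
        self.sql = sql

    def _wrap(self, other):
        # Plain Python values become SQL literals via repr()
        return other if isinstance(other, Field) else Field(repr(other))

    def __eq__(self, other):
        return Field(f"({self.sql} = {self._wrap(other).sql})")

    def __lt__(self, other):
        return Field(f"({self.sql} < {self._wrap(other).sql})")

    def __and__(self, other):
        return Field(f"({self.sql} AND {self._wrap(other).sql})")

    def __or__(self, other):
        return Field(f"({self.sql} OR {self._wrap(other).sql})")

    def __invert__(self):
        return Field(f"(NOT {self.sql})")


# The seed variables must be bound before any query can be written.
x, y = Field("x"), Field("y")

cond = (x == 1) & ~(y < 27)
print(cond.sql)   # ((x = 1) AND (NOT (y < 27)))

lost = (x == 1) and (y < 27)
print(lost.sql)   # (y < 27): 'and' silently discarded the first clause
```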
The infix notation idea is interesting, however the operators desired are usually 'logical and' and 'logical or', which are clearly spelled 'and' and 'or' in Python.
Actually, from a pure functionality perspective, the logical operators are shortcuts for writing if-then-else blocks, and they compile to almost the same bytecode as if-then-else blocks.
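To make the bytecode claim concrete, a sketch using the standard `dis` module compares `x and y` with an equivalent if-then-else function (both function names are illustrative). Exact opcode names differ between Python versions, so no output is shown; in each case a conditional jump, not a dedicated 'and' instruction, does the work.

```python
import dis


def and_op(x, y):
    return x and y


def if_else(x, y):
    # The same logic spelled as an if-then-else; x is evaluated only once
    if x:
        return y
    return x


for f in (and_op, if_else):
    print(f.__name__, [ins.opname for ins in dis.get_instructions(f)])
```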
I see it as a semantic limitation that Python does not allow overriding these operators.
Python doesn't allow overriding 'is' or 'type()' either. I see the logical operators as being on the same plane of fundamentals.
Phillip J. Eby wrote: [CHOP]
As for the numeric use cases, I'm not at all clear why &, |, and ~ (or special methods/functions) aren't suitable.
They often are, but sometimes you want a logical and/or/not, and &/|/~ are mapped to bitwise and/or/not, which isn't always what you want. Presumably, if Greg's proposal were adopted, and/or/not would get mapped to numarray.logical_and/or/not.

What I find more interesting about this proposal is that one could probably finagle it so that (A < B < C) worked correctly for arrays. It can't work now since it is equivalent to ((A < B) and (B < C)), and 'and' doesn't do anything sensible for arrays at present. This is one I always expect to work even though I know that and/or/not don't work for arrays. -tim
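A minimal pure-Python sketch of the chained-comparison point, with a hypothetical `Vec` class standing in for a numarray array (numarray itself is not assumed to be installed). Like numarray, `Vec` refuses to produce a single truth value, so the hidden 'and' in `A < B < C` fails while the explicit `&` form works.

```python
class Vec:
    """Toy stand-in for a numeric array with element-wise comparisons."""

    def __init__(self, items):
        self.items = list(items)

    def __lt__(self, other):
        other = other.items if isinstance(other, Vec) else [other] * len(self.items)
        return Vec(a < b for a, b in zip(self.items, other))

    def __and__(self, other):
        return Vec(a and b for a, b in zip(self.items, other.items))

    def __bool__(self):
        # Mirrors numarray's refusal to give an array a single truth value
        raise ValueError("An array doesn't make sense as a truth value.")


A, B, C = Vec([1, 5, 2]), Vec([2, 6, 1]), Vec([3, 4, 9])

# A < B < C means (A < B) and (B < C), so 'and' calls Vec.__bool__ and fails.
try:
    A < B < C
except ValueError as exc:
    print("chained:", exc)

# The explicit element-wise spelling works, but needs parentheses and '&'.
print(((A < B) & (B < C)).items)   # [True, False, False]
```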
What I find more interesting about this proposal is that one could probably finagle it so that (A < B < C) worked correctly for arrays.
Yes. Despite what I said earlier, I've now decided that the new semantics should be extended to A < B < C as well. I'll update the pep & patch at some point to reflect this.

Greg Ewing
"Phillip J. Eby" <pje@telecommunity.com>:
So, something like this:
query("x and y or z")
isn't "code that performs database queries"?
Yes, but it's not Python code - it's SQL code wrapped in a string wrapped in Python code. I want just Python code.
My main concern about the PEP is that it adds overhead to *all* logical operations, but the feature will only benefit code that hasn't yet been written.
The overhead shouldn't be substantially worse than that already incurred by all the other operators being overloadable. Also, realistically, how much code do you think has boolean operations as a speed bottleneck? I find it hard to imagine what such code would be like.
I also fear that as a result, people will start writing complex if-then blocks to "optimize" performance of conditionals to get them back to where they were before the facility was added.
If people do that, they're guilty of premature optimisation if they haven't actually measured the speed of their code and found an actual problem with it. I expect such cases will be extremely rare if they occur at all.

Greg Ewing
At 01:15 PM 9/15/04 +1200, Greg Ewing wrote:
"Phillip J. Eby" <pje@telecommunity.com>:
So, something like this:
query("x and y or z")
isn't "code that performs database queries"?
Yes, but it's not Python code - it's SQL code wrapped in a string wrapped in Python code. I want just Python code.
But if this were possible: query(``x and y or z``) such that the expression ``x and y or z`` results in a Python AST for that expression, then you'd be able to do whatever you want with it.
My main concern about the PEP is that it adds overhead to *all* logical operations, but the feature will only benefit code that hasn't yet been written.
The overhead shouldn't be substantially worse than that already incurred by all the other operators being overloadable. Also, realistically, how much code do you think has boolean operations as a speed bottleneck? I find it hard to imagine what such code would be like.
So it's acceptable to slow down all logical operations, add new byte codes, and expand the size of the eval loop, all to support a niche usage? That doesn't make sense to me.

Again, I'm not familiar with the numeric use cases, but I am familiar with algebraic manipulation of Python code for SQL generation and other purposes, and I honestly don't see any benefit to the PEP for those purposes. ASTs are more useful, and I'd support a PEP to make code expressible as literals, because that wouldn't impose overhead on systems that don't use them. (For one thing, they could be expressed as constants in code objects, so the bytecode would just be LOAD_CONST.)

For the numeric use cases, frankly I don't see why one would want to apply short-circuiting boolean operators to arrays, since presumably the values in them have already been evaluated. And if the idea is to make them *not* be short-circuiting operators, that seems to me to corrupt the whole point of the logical operators versus their bitwise counterparts.
"Phillip J. Eby" <pje@telecommunity.com>:
For the numeric use cases, frankly I don't see why one would want to apply short-circuiting boolean operators to arrays, since presumably the values in them have already been evaluated. And if the idea is to make them *not* be short-circuiting operators, that seems to me to corrupt the whole point of the logical operators versus their bitwise counterparts.
There's more to it than short-circuiting. Consider

a = array([42, ""])
b = array([(), "spam"])

One might reasonably expect the result of 'a or b' to be

array([42, "spam"])

which is considerably different from a bitwise operation.

Greg Ewing
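The expected result above can be reproduced by applying Python's own 'or' to each pair of elements. Plain lists stand in for arrays here, and the helper names are hypothetical. A bitwise `|` is not even defined for most of these pairs (for example, `42 | ()` raises TypeError).

```python
def elementwise_or(a, b):
    # Apply Python's own 'or' to each pair: it returns an operand, not a bit pattern
    return [x or y for x, y in zip(a, b)]


def elementwise_and(a, b):
    return [x and y for x, y in zip(a, b)]


a = [42, ""]
b = [(), "spam"]
print(elementwise_or(a, b))    # [42, 'spam']
print(elementwise_and(a, b))   # [(), '']
```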
On Sep 15, 2004, at 12:34 AM, Greg Ewing wrote:
There's more to it than short-circuiting. Consider
a = array([42, ""])
b = array([(), "spam"])
One might reasonably expect the result of 'a or b' to be
array([42, "spam"])
which is considerably different from a bitwise operation.
One might, but *I* would reasonably expect it to give me array a, by extrapolation from every other data type in Python. Consider also this:

x and 4 or 5

which is of course a common idiom to work around the lack of an if-then-else expression. So, try with

x = array([42, 0])

Currently, doing this with numarray raises an exception "An array doesn't make sense as a truth value. Use sometrue(a) or alltrue(a).". Odd, since nearly all Python objects can somehow be turned into a truth value, but OK. [Forbidding __nonzero__ prevents horrible mistakes from occurring because of the misuse of the comparison operators as element-wise comparison. "if array([1,2,3]) == array([3,2,1]): print 'Bad'" of course oughtn't print 'Bad'.]

However, with this change, it may instead return:

array([4, 5])

and that's nothing like what was meant. The idiom would change to:

bool(x) and 4 or 5

I suppose...

James

PS: Perl6 has distinct element-wise operators ("hyper" operators). I find that less distasteful than misusing regular operators as element-wise operators, when they really have vastly different semantics.
Consider also this: x and 4 or 5 which is of course a common idiom to workaround the lack of an if-then-else expression.
Actually, I hope it isn't common, because it's flawed. It doesn't always work properly even with current Python semantics.

Greg Ewing
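A short illustration of the flaw in the idiom: `cond and a or b` returns `b` whenever `a` is itself false, even when `cond` is true. The conditional expression `a if cond else b`, added later in Python 2.5 by PEP 308, does not have this problem. The function names are illustrative only.

```python
def and_or(cond, a, b):
    # The idiom under discussion: 'cond and a or b'
    return cond and a or b


def conditional(cond, a, b):
    # The conditional expression added in Python 2.5 (PEP 308)
    return a if cond else b


print(and_or(True, 4, 5), conditional(True, 4, 5))   # 4 4
print(and_or(True, 0, 5), conditional(True, 0, 5))   # 5 0  <- the idiom is wrong
```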
PS: Perl6 has distinct element-wise operators ("hyper" operators). I find that less distasteful than misusing regular operators as element-wise operators, when they really have vastly different semantics.
There was a huge discussion about that a while back. I don't think anything came of it, though.

Greg Ewing
Greg Ewing wrote:
"Phillip J. Eby" <pje@telecommunity.com>:
For the numeric use cases, frankly I don't see why one would want to apply short-circuiting boolean operators to arrays, since presumably the values in them have already been evaluated. And if the idea is to make them *not* be short-circuiting operators, that seems to me to corrupt the whole point of the logical operators versus their bitwise counterparts.
There's more to it than short-circuiting. Consider
a = array([42, ""])
b = array([(), "spam"])
One might reasonably expect the result of 'a or b' to be
array([42, "spam"])
which is considerably different from a bitwise operation.
Another example from numarray land. You can pick out subarrays, by indexing with an array of booleans, which can be pretty slick.
>>> import numarray as na
>>> a = na.arange(9)
>>> a[a < 4]
array([0, 1, 2, 3])
You would like a[2 < a < 4] to work, but instead you need:
a[(2 < a) & (a < 4)]
Greg's proposal could fix this. Or suppose you want to find the logical and of a, b. Consider trying to use bitwise ops:
>>> a = na.array([1,1,1,1])  # all true
>>> b = na.array([2,2,2,2])  # all true
>>> a & b
array([0, 0, 0, 0])  # oops, that's why there's logical_and
>>> na.logical_and(a,b)
array([1, 1, 1, 1], type=Bool)
>>> (a!=0) & (b!=0)  # this also works, but it does 3x as much work
array([1, 1, 1, 1], type=Bool)
Again with Greg's proposal one could write 'a and b' for this. Much nicer. It's not that you couldn't make numarrays short circuit. In the expression "a and b", if all the elements of a are false, then we can skip evaluating b. I'm just not sure that this is a good idea. -tim
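The numarray session above can be mirrored with plain Python lists (numarray was later superseded by NumPy and is not assumed here). Since `1 & 2 == 0`, the bitwise form reports every pair as false, while an element-wise logical 'and' reports every pair as true.

```python
a = [1, 1, 1, 1]   # all true
b = [2, 2, 2, 2]   # all true

bitwise = [x & y for x, y in zip(a, b)]                 # 1 & 2 == 0 for every pair
logical = [bool(x) and bool(y) for x, y in zip(a, b)]   # truth of each element

print(bitwise)   # [0, 0, 0, 0]
print(logical)   # [True, True, True, True]
```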
At 11:48 PM 9/14/04 -0700, Tim Hochberg wrote:
Again with Greg's proposal one could write 'a and b' for this. Much nicer.
It's not that you couldn't make numarrays short circuit. In the expression "a and b", if all the elements of a are false, then we can skip evaluating b. I'm just not sure that this is a good idea.
My point is that the idea of using 'and' in order to implement something that's *not* short-circuiting seems like a bad idea. I'd rather see array-specific operators added, or some sort of infix notation for functions so that you can define custom operators for such specialized usages.
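One way to approximate an infix notation for functions without any language change is the commonly circulated `|op|` recipe sketched below. The `Infix`, `land` and `lor` names are illustrative, not a standard library API. The trick relies on the left operand not claiming `|` for an unknown right-hand type (built-in lists and ints don't); an array type whose `|` accepts any operand would break it.

```python
class Infix:
    """Wrap a two-argument function so it can be written as  left |op| right."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # 'left | op' binds the left operand and returns a new Infix
        return Infix(lambda right: self.func(left, right))

    def __or__(self, right):
        # '(left | op) | right' supplies the right operand and calls the function
        return self.func(right)


# Hypothetical element-wise logical operators for list-based "arrays"
land = Infix(lambda a, b: [bool(x) and bool(y) for x, y in zip(a, b)])
lor = Infix(lambda a, b: [x or y for x, y in zip(a, b)])

print([1, 1, 0] |land| [2, 0, 2])    # [True, False, False]
print([42, ""] |lor| [(), "spam"])   # [42, 'spam']
```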
It's not that you couldn't make numarrays short circuit. In the expression "a and b", if all the elements of a are false, then we can skip evaluating b. I'm just not sure that this is a good idea.
Whether it would be worth it would be application-dependent, i.e. it would only help if pre-scanning all the elements of a were sufficiently cheaper than evaluating b. Probably not a good idea to make it the default behaviour.

Greg Ewing
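A sketch of the optional array short-circuit discussed above, written as a plain function because no language hook for it exists. The right-hand operand is passed as a zero-argument callable (an assumption standing in for a lazily evaluated operand), and `array_and` is a hypothetical name. The `any(a)` pre-scan is the extra cost that makes the benefit application-dependent.

```python
def array_and(a, make_b):
    """Element-wise logical 'and' that may skip evaluating the second operand.

    make_b is a zero-argument callable standing in for the lazily evaluated
    right-hand side that a short-circuiting operator would receive.
    """
    if not any(a):
        # Every element of a is false, so the result is all false whatever b holds
        return [bool(x) for x in a]
    b = make_b()
    return [bool(x) and bool(y) for x, y in zip(a, b)]


calls = []


def expensive_b():
    calls.append(1)
    return [1, 1, 1]


print(array_and([0, 0, 0], expensive_b), len(calls))   # [False, False, False] 0
print(array_and([1, 0, 1], expensive_b), len(calls))   # [True, False, True] 1
```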
I like the PEP with 'and' and 'or', but isn't the 'not' special method essentially the inverse of __nonzero__?
No, because:

(1) __nonzero__ is restricted to returning a boolean result.
(2) There are other contexts besides 'not' in which __nonzero__ gets called.

Greg Ewing
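Both points can be checked in Python 3, where `__nonzero__` is spelled `__bool__`. The sketch uses hypothetical `Traced` and `Bad` classes: `Traced` counts how often Python asks for its truth value (once each for `if`, `while`, `bool()`, `and` and `not` below), and `Bad` shows that a non-bool return value is rejected with TypeError. Python 2 also accepted an int here, as the earlier `return 13` example shows.

```python
class Traced:
    """Records every time Python asks for its truth value."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def __bool__(self):   # spelled __nonzero__ in Python 2
        self.calls += 1
        return bool(self.value)


t = Traced(1)
results = []
if t:
    results.append("if")
while t:
    results.append("while")
    break
results.append(bool(t))
results.append(t and "and")
results.append(not t)
print(t.calls)   # 5


class Bad:
    def __bool__(self):
        return 13   # a non-bool result: Python 2 accepted an int, Python 3 does not


try:
    flag = not Bad()
except TypeError as exc:
    print("TypeError:", exc)
```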
participants (8)
- Martin v. Löwis
- exarkun@divmod.com
- Greg Ewing
- James Y Knight
- Kevin Jacobs
- Michel Pelletier
- Phillip J. Eby
- Tim Hochberg