[Python-ideas] Break the dominance of boolean values in boolean context

Mon Sep 12 22:20:11 CEST 2011

Hi,

the proposal is to reduce the number of occurrences where Python
automatically and inevitably uses pure boolean objects (bool()),
instead of relying on rich object behaviour. My goal is to give
objects finer control over how they want to be interpreted in a
boolean context. This reaches down as far as implementing the boolean
operations as an object-protocol.

In Python, no kind of object is in any way special. Instead of relying
on what an object actually IS, we rely on what an object can DO. This
approach of anti-discrimination is the source of great confidence
about the language and its aversion to black magic. The bool()-type
however is a primus inter pares: In numerous locations, Python
strongly favors this type of object either explicitly by casting to it
or implicitly through the compiled code.

Here are three examples (protocol-, object- and code-based) where
Python grinds a boolean context down to pure boolean values:

In protocol behaviour: The documentation about how to implement
__contains__ vaguely states that the function "should return true,
false otherwise". In fact we may return any object that can be
interpreted as a boolean; this is intended behaviour. However, the
implementation of COMPARE_OP in Python/ceval.c inevitably casts
the result of "x in y" to Py_True or Py_False, throwing away our object
and leaving us with this bool only.
This kills any chance of producing useful objects in __contains__,
even objects that may have a meaning outside the pure boolean context.
For example (desired behaviour, not what Python does today):

x = mycontainer(['foo', 'bar', 'bar', 'foo', 'foo'])
y = 'foo' in x
print y == True
>> True
print y
>> mycontainer('foo', 'foo', 'foo')
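
The coercion is easy to observe in current CPython. The following is a
minimal sketch, assuming a hypothetical class MyContainer (not part of
any library) whose __contains__ returns the list of matching items.
Calling __contains__ directly keeps the rich object, while the "in"
operator hands back a plain bool:

```python
class MyContainer(object):
    """Hypothetical container whose __contains__ returns the matching items."""

    def __init__(self, items):
        self.items = list(items)

    def __contains__(self, item):
        # Return a rich object instead of a bool: every matching element.
        return [i for i in self.items if i == item]


x = MyContainer(['foo', 'bar', 'bar', 'foo', 'foo'])

direct = x.__contains__('foo')   # the rich object survives a direct call
via_in = 'foo' in x              # the "in" operator coerces it to a bool

print(direct)
print(via_in)
```

The list is only recoverable by bypassing the operator, which is exactly
the limitation described above.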

This has been discussed on python-ideas in mid-2010
(http://mail.python.org/pipermail/python-ideas/2010-July/007733.html)
without a clear outcome.

In object behaviour: Rich Comparison allows numeric- or set-like
comparison in any way we like, returning any kind of object. The
documentation clearly states that the returned object is only(!)
interpreted as a boolean when used in a boolean context (__contains__
falls short here).
For example, comparisons could behave like this (hypothetical; the
built-in types return plain booleans today):

x = set((1,2,3,4,5))
y = set((2,3,4))
z = x > y
print z == True
>> True
print z
>> set([1,5]) # equivalent to x - y

z = x < y
print z == True
>> False
print z
>> set([]) # equivalent to y - x

print 3 > 2
>> 1
print (3 > 2) == True
>> True
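
Something close to this can already be written by hand, since rich
comparison methods may return any object. The following is only a
sketch: DiffSet is a hypothetical subclass invented for illustration,
and returning the set difference when the superset/subset relation
holds (and an empty set otherwise) is my reading of the example above.

```python
class DiffSet(set):
    """Hypothetical set whose ordering comparisons return a set, not a bool."""

    def __gt__(self, other):
        # Proper superset: return the surplus elements (truthy).
        if set(self) > set(other):
            return set(self) - set(other)
        return set()  # relation does not hold: empty set (falsy)

    def __lt__(self, other):
        # Proper subset: return the elements missing from self (truthy).
        if set(self) < set(other):
            return set(other) - set(self)
        return set()


x = DiffSet((1, 2, 3, 4, 5))
y = DiffSet((2, 3, 4))

z = x > y          # a set, not a bool
print(z)
print(bool(z))     # only a boolean context reduces it to True/False
print(x < y)
```

The comparison result stays a rich object until something like "if" or
bool() asks for its truth value.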

In code behaviour: Why is it that we can override the behaviour of
arithmetic operations like "+, -, *, /", but not the boolean
operations "and, or, not"? Why do we force any value being
generated in a boolean context like "if x == 1 or y == 1" to actually
be of boolean type? The result of "x == 1" or of "y == 1" may be any
kind of object coming from __eq__. However, the "or"-operator here
grinds them both down to being just True or False. This is due to the
fact that the generated bytecode evaluates one expression at a time and
does not keep track of the objects it got while doing so. The pure
boolean behaviour arises from the use of JUMP_IF_FALSE/TRUE after each
comparison.
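
Which hooks the interpreter consults today can be checked with a small
experiment. This is a sketch using a hypothetical helper class Traced
that logs every truth-value test. It suggests that "or" asks each
operand only for its truth value and never calls a binary hook such as
__or__. As a bare expression, "or" does hand back one of its operands
unchanged; inside an "if", only the truth values are consumed.

```python
class Traced(object):
    """Hypothetical helper that logs which hooks get called on it."""

    def __init__(self, name, truth, log):
        self.name, self.truth, self.log = name, truth, log

    def __bool__(self):
        self.log.append('bool(%s)' % self.name)
        return self.truth

    __nonzero__ = __bool__  # Python 2 spelling of the same hook

    def __or__(self, other):
        # Hook for the '|' operator; the 'or' keyword never reaches it.
        self.log.append('__or__')
        return self


log = []
a = Traced('a', False, log)
b = Traced('b', True, log)

result = a or b            # tests a, then returns b untested
log_expr = list(log)

del log[:]
taken = False
if a or b:                 # tests a, then tests b to decide the branch
    taken = True
log_if = list(log)

print(log_expr, result is b)
print(log_if, taken)
```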
Instead of having boolean operations being a part of the language, we
could implement them as an extension to the Rich Comparison
Protocol, giving rise to functions like object.__bor__,
object.__bnot__ or object.__band__:

The expression "if x == 1 or y == 1" would then become equivalent to

tmp1 = x.__eq__(1)
if tmp1:
 return tmp1
tmp2 = y.__eq__(1)
if tmp2:
 return tmp2
return tmp1.__bor__(tmp2)

Likewise, the expression "if x == 1 and y == 1" would become

tmp1 = x.__eq__(1)
if not tmp1:
 return tmp1
tmp2 = y.__eq__(1)
if not tmp2:
 return tmp2
return tmp1.__band__(tmp2)
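
The two expansions above can be emulated in today's Python to try the
idea out. This is a rough sketch under stated assumptions: __bor__ and
__band__ do not exist in Python, BSet is a hypothetical set subclass
that maps them to union and intersection, and proposed_or/proposed_and
are made-up helper names. The right operand is passed as a callable so
that it is still evaluated lazily.

```python
class BSet(set):
    """Hypothetical set implementing the proposed __bor__/__band__ hooks."""

    def __bor__(self, other):
        return BSet(set(self) | set(other))

    def __band__(self, other):
        return BSet(set(self) & set(other))


def proposed_or(left, right_thunk):
    """Emulate the proposed expansion of 'left or right'."""
    if left:
        return left
    right = right_thunk()          # evaluated only if left is falsy
    if right:
        return right
    return left.__bor__(right)     # reached when both operands are falsy


def proposed_and(left, right_thunk):
    """Emulate the proposed expansion of 'left and right'."""
    if not left:
        return left
    right = right_thunk()          # evaluated only if left is truthy
    if not right:
        return right
    return left.__band__(right)    # reached when both operands are truthy


a = BSet([1, 5])
empty = BSet()

print(proposed_or(a, lambda: empty))
print(proposed_and(a, lambda: empty))
print(proposed_and(a, lambda: BSet([5, 7])))
```

Following the expansions literally, __bor__ is only reached when both
operands are falsy and __band__ only when both are truthy; in every
other case one of the operands is returned as it is.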

The object example from above (with setA = x and setB = y) now tells us
how boolean behaviour and arithmetic behaviour go hand in hand:
"(setA > setB) or (setA < setB)" is True because
"set([1,5]).__bor__(set([]))" is the same as "set([1,5]) | set([])" and
equivalent to True in a boolean context. Likewise,
"(setA > setB) and (setA < setB)" is False because
"set([1,5]).__band__(set([]))" is just "set([])". It follows that "(setA
> setB) == (setA - setB) == (setA & setB)". We can't do this with
boolean operations being part of the language only.

To sum up, I really think that we should break the dominance of
bool() and take a look at how we can implement boolean contexts
without relying on boolean values all the time.

None of this can be implemented without breaking at least the CPython
API. For example, the behaviour of __contains__ can't be changed in
the proposed way without changing the signature of "int
PySequence_Contains()" to "PyObject* PySequence_Contains()".