[Python-ideas] Break the dominance of boolean values in boolean context

Wed Sep 14 01:55:42 CEST 2011

On Tue, Sep 13, 2011 at 3:26 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Wed, Sep 14, 2011 at 3:56 AM, Guido van Rossum <guido at python.org> wrote:
>> On Mon, Sep 12, 2011 at 9:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> The two issues seem somewhat orthogonal to me, but yes, the general
>>> idea would be to make 'in' behave more like a rich comparison operator
>>> rather than an explicitly boolean operation as it does now.
>>>
>>> It occurs to me that adding __in__ could also address a slight
>>> performance oddity with 3.2 range objects: the __contains__ check in
>>> 3.2 has an O(1) fast path for containment checks on actual Python
>>> integers, but falls back to the O(n) sequential search for objects
>>> that only implement __index__(). If __in__() were available, such
>>> objects could conceivably map containment tests to checks against the
>>> corresponding real Python integer (although doing that carelessly
>>> would do weird things to other containers, such as identity-keyed
>>> dictionaries; that would be a quality of implementation issue on
>>> __in__ methods, though).
>>
>> I guess I might understand this paragraph if you pointed me to the code. :-(
>
> http://hg.python.org/cpython/file/default/Objects/rangeobject.c#l603
> (scroll up a bit from where that link lands for the definition of the
> O(1) check)

Gotcha. I missed that this was just about range objects. :-)
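
For anyone else following along, the asymmetry can be observed from
pure Python. This is only a minimal sketch, assuming CPython 3.2 or
later. The class name Indexable and its call counter are made up for
illustration, and it defines __eq__ as well as __index__ because the
fallback search compares elements with '==':

```python
import operator


class Indexable:
    """Hypothetical integer-like object that is not an int subclass."""

    eq_calls = 0  # how many times the sequential search compared against us

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

    def __eq__(self, other):
        Indexable.eq_calls += 1
        if isinstance(other, int):
            return self.value == other
        if isinstance(other, Indexable):
            return self.value == other.value
        return NotImplemented

    def __hash__(self):
        return hash(self.value)


big = range(0, 1000000)
needle = Indexable(5000)

# Real int: answered arithmetically, no element comparisons against needle.
fast = operator.index(needle) in big
calls_after_fast = Indexable.eq_calls

# Non-int object: range iterates and compares each element with ==.
slow = needle in big
calls_after_slow = Indexable.eq_calls
```

Both checks give the same answer, but only the second one should bump
the counter, roughly once per element visited before the match.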

> For the bit about breaking identity-keyed dictionaries, consider a
> hypothetical naive containment implementation like this:
>
>     def __in__(self, container):
>         return int(self) in container
>
> That code involves an implicit assumption that *all* containers are
> equivalence based, and that simply isn't true - it is sometimes useful
> to create a container that relies on object identity rather than value
> (e.g. to store additional metadata about arbitrary objects, even
> mutable ones). So a more correct implementation would have to look
> something like:
>
>     def __in__(self, container):
>         # We know range containment is equivalence based
>         if isinstance(container, range):
>             return int(self) in container
>         return NotImplemented

Hm, this reminds me of the thorny issue about 'is' implying '==' for
certain container checks... But really, there are many other ways of
making similar mistakes with other operators, and IMO a good __in__
method should always check if the RHS type is a known and supported
type, and otherwise return NotImplemented like a good binary operator
should.
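
To make the difference concrete, here is a rough sketch of how the two
variants above would behave against an identity-keyed container.
Nothing here exists in Python today: __in__ is the hypothetical method
under discussion, rich_in() is a made-up helper that emulates what the
'in' operator might do, and trying the left operand's method first is
purely an assumption, since the thread has not settled any precedence
rules. IdentitySet, Index and NaiveIndex are illustrative names too.

```python
class IdentitySet:
    """Container keyed on object identity rather than equality."""

    def __init__(self, items=()):
        self._items = {id(obj): obj for obj in items}

    def __contains__(self, obj):
        return id(obj) in self._items


class Index:
    """Hypothetical integer-like object with a careful __in__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

    def __in__(self, container):
        # Only take over for container types known to be equivalence based.
        if isinstance(container, range):
            return self.value in container
        return NotImplemented


class NaiveIndex(Index):
    def __in__(self, container):
        # Assumes every container compares by value, which is wrong here.
        return self.value in container


def rich_in(item, container):
    """Emulate 'item in container' under the hypothetical protocol."""
    hook = getattr(type(item), "__in__", None)
    if hook is not None:
        result = hook(item, container)
        if result is not NotImplemented:
            return result
    return item in container  # existing __contains__ path


careful = Index(3)
naive = NaiveIndex(3)
ids = IdentitySet([careful, naive])

in_range = rich_in(careful, range(10))   # handled by __in__
careful_found = rich_in(careful, ids)    # falls back to __contains__
naive_found = rich_in(naive, ids)        # asks about the int 3 instead
```

The careful version defers to the container and is found. The naive
version ends up asking whether the integer 3 is in the set, so it
reports the object as missing even though it was stored.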

> One additional complication that Alex Gaynor pointed out is that there
> are potentially *two* distinct operations that would need to be
> covered if "in" were to become a rich comparison operation - 'in' and
> 'not in'. Currently "not x in y" and "x not in y" generate identical
> bytecode, and the control flow optimisations that apply to the former
> can also be applied to the latter. Making 'in' a rich comparison
> operation would thus require choosing one of the following behaviours:

>  - making 'not' itself avoid coercing to a boolean value (which
> you've already said you don't want to do)

Right.

>  - retaining the equivalence between "x not in y" and "not x in y"
> and simply accepting that "x in y" is a rich comparison while "x not
> in y" is not

I think this one is fine, actually.

>  - adding two additional special methods to cover the 'not in' case,
> defining a method precedence that covers the various combinations of
> methods on the operands for both 'in' and 'not in' and disentangling
> all the parts of the code generator that assume the equivalence of "x
> not in y" and "not x in y"

Basically this would elevate 'not in' to a separate operator, just
like '==' and '!=' are separate operators. Now, *if* we were to adopt
PEP 335, this would be a reasonable approach. But since we're not, I
think the previous option (leaving 'not in' as plain negation of 'in')
is fine. If NumPy ever implements 'in' as returning an array,
NumPy users might have to be warned that 'not in' doesn't work that
way, and 'not (x in y)' doesn't work either. So they'll have to write
something like '1 - (x in y)', assuming 'x in y' returns an array of
bools. Big deal.
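
As a quick check of the equivalence Nick describes between
"not x in y" and "x not in y", something like the following can be
run. It is only a sketch: the helper names are made up, and whether
the raw bytecode matches depends on the interpreter and version, so
the code records that comparison without relying on it.

```python
def compiled(expr):
    """Compile a single expression and return its code object."""
    return compile(expr, "<example>", "eval")


negated = compiled("not x in y")
not_in = compiled("x not in y")

# May differ between interpreters and versions; informational only.
same_bytecode = negated.co_code == not_in.co_code


def same_result(x, y):
    """Evaluate both spellings with the same operands and compare."""
    env = {"x": x, "y": y}
    return eval(negated, env) == eval(not_in, env)


checks = [same_result(1, [1, 2]), same_result(5, [1, 2]), same_result("a", "abc")]
```

With today's boolean semantics every entry in checks is True. Under the
second option above that would stay the case, with only plain "x in y"
able to return something richer than a bool.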

> None of those options sound particularly appealing. A rich 'not' would
> probably be the cleanest solution, but you've already given valid
> reasons for not wanting to do that.

And I stick to them.

>>> To be honest, I don't think anyone would cry too much if you decided
>>> to explicitly reject it on the basis of continuing to allow control
>>> flow optimisations for code involving not/and/or. While CPython
>>> doesn't do it, I believe there *are* control flow transformations that
>>> the current semantics permit that PEP 335 would disallow, such as
>>> automatically applying De Morgan's Law (I don't actually have a use
>>> case for doing that, I'm just mentioning it as a consequence of the
>>> semantics change proposed by the PEP).
>>
>> I think I just mentioned one (turning 'if not' into a jump). Anyway,
>> I'm glad to reject the PEP for the reason that I like the status quo
>> just fine. (But relaxing __contains__ and adding __in__ as its reverse
>> have my blessing.) Also, after reading the PEP from beginning to end,
>> and downloading and skimming the patch (but failing to actually
>> compile a patched version of Python 2.3), I think the offered API is
>> too complicated to be of much use. Certainly the NumPy folks have
>> repeatedly claimed that they are fine with the status quo.
>
> OK, I'll add a rejection notice to PEP 335 with a link to this thread.
> Given the point above regarding the problems with "x not in y" vs "not
> x in y", do you want me to include something saying that rich
> containment checks are also rejected?

Let's mull that one over for a bit longer. It's not mentioned in PEP 335, is it?

-- 
--Guido van Rossum (python.org/~guido)