[Python-ideas] Break the dominance of boolean values in boolean context

Wed Sep 14 00:26:14 CEST 2011

On Wed, Sep 14, 2011 at 3:56 AM, Guido van Rossum <guido at python.org> wrote:
> On Mon, Sep 12, 2011 at 9:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> The two issues seem somewhat orthogonal to me, but yes, the general
>> idea would be to make 'in' behave more like a rich comparison operator
>> rather than an explicitly boolean operation as it does now.
>>
>> It occurs to me that adding __in__ could also address a slight
>> performance oddity with 3.2 range objects: the __contains__ check in
>> 3.2 has an O(1) fast path for containment checks on actual Python
>> integers, but falls back to the O(n) sequential search for objects
>> that only implement __index__(). If __in__() was available, such
>> objects could conceivably map containment tests to checks against the
>> corresponding real Python integer (although doing that carelessly
>> would do weird things to other containers, such as identity-keyed
>> dictionaries. That would be a quality of implementation issue on
>> __in__ methods, though).
>
> I guess I might understand this paragraph if you pointed me to the code. :-(

http://hg.python.org/cpython/file/default/Objects/rangeobject.c#l603
(scroll up a bit from where that link lands for the definition of the
O(1) check)
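
As a rough illustration of the difference between the two paths (a
sketch only: IndexOnly is a made-up class for this example, and the
comparison counter is just there to make the sequential search
visible):

```python
class IndexOnly:
    """Hypothetical integer-like type: not an int, but defines __index__."""

    def __init__(self, value):
        self.value = value
        self.eq_calls = 0  # counts equality comparisons made against us

    def __index__(self):
        return self.value

    def __eq__(self, other):
        self.eq_calls += 1
        if isinstance(other, int):
            return self.value == other
        return NotImplemented


# Real Python integers take the arithmetic fast path, so even a huge
# range answers without iterating over its elements.
big = range(10**12)
fast_result = (10**12 - 1) in big

# Anything else falls back to comparing against the elements one by
# one, so our __eq__ is invoked repeatedly until a match is found.
probe = IndexOnly(7)
slow_result = probe in range(10)
```

Both checks succeed, but the second one only does so because the
object also defines __eq__, and it gets there by walking the range.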

For the bit about breaking identity-keyed dictionaries, consider a
hypothetical naive containment implementation like this:

  def __in__(self, container):
    return int(self) in container

That code involves an implicit assumption that *all* containers are
equivalence based, and that simply isn't true - it is sometimes useful
to create a container that relies on object identity rather than value
(e.g. to store additional metadata about arbitrary objects, even
mutable ones). So a more correct implementation would have to look
something like:

  def __in__(self, container):
    # We know range containment is equivalence based
    if isinstance(container, range):
      return int(self) in container
    return NotImplemented
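
Since __in__ doesn't exist, the difference can only be shown by
emulating the dispatch. In the sketch below, rich_in(), IdentitySet,
NaiveIndex and CarefulIndex are all invented for the illustration, and
the lookup order (try the item's __in__ first, then fall back to the
container) is an assumption rather than a settled design:

```python
import operator


class IdentitySet:
    """Container keyed on object identity rather than equality."""

    def __init__(self, items=()):
        self._items = {id(obj): obj for obj in items}

    def __contains__(self, obj):
        return id(obj) in self._items


def rich_in(item, container):
    """Emulate a hypothetical __in__ hook with fallback to __contains__."""
    method = getattr(type(item), "__in__", None)
    if method is not None:
        result = method(item, container)
        if result is not NotImplemented:
            return result
    return item in container


class NaiveIndex:
    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

    def __in__(self, container):
        # Assumes every container is equivalence based
        return operator.index(self) in container


class CarefulIndex(NaiveIndex):
    def __in__(self, container):
        # Only take over for containers known to be equivalence based
        if isinstance(container, range):
            return operator.index(self) in container
        return NotImplemented


naive = NaiveIndex(3)
careful = CarefulIndex(3)

naive_in_range = rich_in(naive, range(10))
careful_in_range = rich_in(careful, range(10))

# The naive version asks about the int 3, not about the stored object,
# so the identity-keyed container reports it as missing.
naive_in_idset = rich_in(naive, IdentitySet([naive]))
careful_in_idset = rich_in(careful, IdentitySet([careful]))
```

Both versions work for range, but only the careful one gives the right
answer for the identity-keyed container.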

One additional complication that Alex Gaynor pointed out is that there
are potentially *two* distinct operations that would need to be
covered if "in" were to become a rich comparison operation - 'in' and
'not in'. Currently "not x in y" and "x not in y" generate identical
bytecode, and the control flow optimisations that apply to the former
can also be applied to the latter. Making 'in' a rich comparison
operation would thus require choosing one of the following behaviours:
  - making 'not' itself avoid coercing to a boolean value (which
you've already said you don't want to do)
  - retaining the equivalence between "x not in y" and "not x in y"
and simply accepting that "x in y" is a rich comparison while "x not
in y" is not
  - adding two additional special methods to cover the 'not in' case,
defining a method precedence that covers the various combinations of
methods on the operands for both 'in' and 'not in' and disentangling
all the parts of the code generator that assume the equivalence of "x
not in y" and "not x in y"

None of those options sounds particularly appealing. A rich 'not' would
probably be the cleanest solution, but you've already given valid
reasons for not wanting to do that.
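
For reference, the current behaviour that any of those options would
have to change can be seen with a container that returns a non-boolean
value from __contains__ (Chatty is a throwaway class made up for this
example):

```python
class Chatty:
    """Container whose __contains__ returns a non-boolean object."""

    def __contains__(self, item):
        return ["match", item]  # a truthy list, not a bool


box = Chatty()

# 'in' coerces whatever __contains__ returns to a bool...
result_in = 1 in box

# ...and both spellings of the negated test give the same answer.
result_not_in = 1 not in box
result_not_x_in = not 1 in box
```

The list returned by __contains__ never reaches the caller: the first
expression evaluates to True and the other two to False.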

>> To be honest, I don't think anyone would cry too much if you decided
>> to explicitly reject it on the basis of continuing to allow control
>> flow optimisations for code involving not/and/or. While CPython
>> doesn't do it, I believe there *are* control flow transformations that
>> the current semantics permit that PEP 335 would disallow, such as
>> automatically applying De Morgan's Law (I don't actually have a use
>> case for doing that, I'm just mentioning it as a consequence of the
>> semantics change proposed by the PEP).
>
> I think I just mentioned one (turning 'if not' into a jump). Anyway,
> I'm glad to reject the PEP for the reason that I like the status quo
> just fine. (But relaxing __contains__ and adding __in__ as its reverse
> have my blessing.) Also, after reading the PEP from beginning to end,
> and downloading and skimming the patch (but failing to actually
> compile a patched version of Python 2.3), I think the offered API is
> too complicated to be of much use. Certainly the NumPy folks have
> repeatedly claimed that they are fine with the status quo.

OK, I'll add a rejection notice to PEP 335 with a link to this thread.
Given the point above regarding the problems with "x not in y" vs "not
x in y", do you want me to include something saying that rich
containment checks are also rejected?

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia