New subject: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

3 Feb 2020

      ...
PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut: if x and y are the same object, then equality comparison returns True and inequality False. No attempt is made to execute __eq__ or __ne__ methods in those cases.
This has visible consequences all over the place, but they don't appear to be documented. For example,
...
despite that math.nan == math.nan is False.
It's usually clear which methods will be called, and when, but not really here. Any _context_ that calls PyObject_RichCompareBool() under the covers, for an equality or inequality test, may or may not invoke __eq__ or __ne__, depending on whether the comparands are the same object. The same goes for any context that inlines these special cases to avoid the overhead of calling PyObject_RichCompareBool() at all.
If it's intended that Python-the-language requires this, that needs to be documented.
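
To make the "may or may not invoke __eq__" point concrete, here is a minimal sketch as it behaves on CPython. The Tracked class is hypothetical, invented only to count __eq__ calls; whether other implementations must behave the same way is exactly the question being asked.

```python
class Tracked:
    """Hypothetical class that records every __eq__ call."""
    calls = 0

    def __eq__(self, other):
        Tracked.calls += 1
        return False              # claims to be unequal to everything, itself included

    __hash__ = object.__hash__    # defining __eq__ alone would make instances unhashable


x = Tracked()

direct = (x == x)                 # the == operator itself always calls __eq__
calls_after_operator = Tracked.calls

in_list = x in [x]                # membership test: identity is checked first
counted = [x].count(x)            # same shortcut inside list.count()
lists_equal = ([x] == [x])        # and inside list comparison
calls_after_containers = Tracked.calls

print(direct, in_list, counted, lists_equal)
print(calls_after_operator, calls_after_containers)
```

On CPython the call counter does not move during the three container operations, even though x == x is False.
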
This has been slowly, but perhaps incompletely, documented over the years and has become baked into some of the collections ABCs as well.  For example, Sequence.__contains__() is defined as:

    def __contains__(self, value):
        for v in self:
            if v is value or v == value:          # note the identity test
                return True
        return False
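
A small sketch of what that identity test buys, assuming a hypothetical Row class that supplies only __getitem__ and __len__ and inherits __contains__ from the ABC:

```python
from collections.abc import Sequence


class Row(Sequence):
    """Hypothetical minimal Sequence relying on the inherited __contains__."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


nan = float('nan')
row = Row([1.0, nan])

same_object_found = nan in row            # found through "v is value"
other_nan_found = float('nan') in row     # a distinct NaN object never compares equal

print(same_object_found, other_nan_found)
```

The stored NaN is found only because it is the very same object; a freshly created NaN is not.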

Various collections need to assume reflexivity, not just for speed, but so that we can reason about them and so that they can maintain internal consistency. For example, MutableSet defines pop() as:

    def pop(self):
        """Return the popped value.  Raise KeyError if empty."""
        it = iter(self)
        try:
            value = next(it)
        except StopIteration:
            raise KeyError from None
        self.discard(value)
        return value

That pop() logic implicitly assumes an invariant between membership and iteration:

       assert all(x in collection for x in collection)

We really don't want to pop() a value *x* and then find that *x* is still in the container.   This would happen if iter() found the *x*, but discard() couldn't find the object because the object can't or won't recognize itself:

     s = {float('NaN')}
     s.pop()
     assert not s                  # Do we want the language to guarantee that s is now empty?  I think we must.
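
To show what goes wrong without the assumption, here is a sketch built on a hypothetical list-backed ListSet. Its trust_identity flag is my own invention for illustration, not anything in the standard library; it switches the "v is value or v == value" shortcut on and off while pop() is inherited unchanged from MutableSet.

```python
from collections.abc import MutableSet


class ListSet(MutableSet):
    """Hypothetical list-backed set; trust_identity toggles the identity shortcut."""

    def __init__(self, items=(), trust_identity=True):
        self._items = []
        self._trust_identity = trust_identity
        for item in items:
            self.add(item)

    def _index(self, value):
        # Return the position of value, or -1 if it cannot be found.
        for i, v in enumerate(self._items):
            if (self._trust_identity and v is value) or v == value:
                return i
        return -1

    def __contains__(self, value):
        return self._index(value) >= 0

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, value):
        if self._index(value) < 0:
            self._items.append(value)

    def discard(self, value):
        i = self._index(value)
        if i >= 0:
            del self._items[i]


nan = float('nan')

consistent = ListSet([nan], trust_identity=True)
consistent.pop()                       # inherited MutableSet.pop()
left_after_pop = len(consistent)       # the NaN was found and removed

strict = ListSet([nan], trust_identity=False)
popped = strict.pop()                  # returns the NaN ...
stuck_after_pop = len(strict)          # ... but discard() could not find it

print(left_after_pop, stuck_after_pop)
```

With trust_identity=False the inherited clear() would never terminate on this set, since pop() keeps returning the same NaN without ever removing it, so the sketch deliberately does not call it.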

The code for clear() depends on pop() working:

    def clear(self):
        """This is slow (creates N new iterators!) but effective."""
        try:
            while True:
                self.pop()
        except KeyError:
            pass

It would be unfortunate if clear() could not guarantee a post-condition that the container is empty:

     s = {float('NaN')}
     s.clear()
     assert not s           # Can this be allowed to fail?

The case of count() is less clear-cut, but even there identity-implies-equality improves our ability to reason about code.  Given some list, *s*, possibly already populated, would you want the following code to always work?

     c = s.count(x)
     s.append(x)
     assert s.count(x) == c + 1         # To me, this is fundamental to what the word "count" means.
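
Here is that check run with a NaN on CPython, next to a hypothetical count_by_eq_only() helper (not a real API) that trusts only == and so shows what the alternative would look like:

```python
nan = float('nan')
s = [1.0, nan]

c = s.count(nan)
s.append(nan)
builtin_ok = (s.count(nan) == c + 1)    # list.count() treats identical objects as equal


def count_by_eq_only(seq, x):
    """Hypothetical counter that ignores identity and trusts only ==."""
    return sum(1 for v in seq if v == x)


t = [1.0, nan]
c2 = count_by_eq_only(t, nan)
t.append(nan)
eq_only_ok = (count_by_eq_only(t, nan) == c2 + 1)   # the NaN is never counted

print(builtin_ok, eq_only_ok)
```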

I can't find it now, but I remember a possibly related discussion where we collectively rejected a proposal for an __is__() method.  IIRC, the reasoning was that our ability to think about code correctly depended on this being true:

    a = b
    assert a is b

Back to the discussion at hand, I had thought our position was roughly:

* __eq__ can return anything it wants.

* Containers are allowed but not required to assume that identity-implies-equality.

* Python's core containers make that assumption so that we can keep
  the containers internally consistent and so that we can reason about
  the results of operations.

Also, I believe that even very early dict code (at least as far back as Py 1.5.2) had logic for "v is value or v == value".

As far as NaNs go, the only question is how far to propagate their notion of irreflexivity. Should "x == x" return False for them? We've decided yes.  When it comes to containers, who makes the rules, the containers or their elements?  Mostly, we let the elements rule, but containers are allowed to make useful assumptions about the elements when necessary.  This isn't much different from the rules for the "==" operator, where __eq__() can return whatever it wants, but functions are still allowed to write "if x == y: ..." and assume that a meaningful boolean value has been returned (even if it wasn't).  Likewise, the rule for "<" is that it can return whatever it wants, but sorted() and min() are allowed to assume a meaningful total ordering (which might or might not be true).  In other words, containers and functions are allowed, when necessary or useful, to override the decisions made by their data.  This seems like a reasonable state of affairs.
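
Both halves of that analogy can be sketched briefly. The Vague class is hypothetical, and the min() results are what I'd expect from CPython, where min() keeps the earlier argument unless a later one compares as strictly less.

```python
class Vague:
    """Hypothetical class whose __eq__ returns a non-boolean object."""

    def __eq__(self, other):
        return "maybe"            # __eq__ may return any object at all

    __hash__ = object.__hash__


a, b = Vague(), Vague()
raw = (a == b)                    # the string returned by __eq__, not a bool
branch_taken = False
if a == b:                        # the if statement simply truth-tests the result
    branch_taken = True

# Sets define "<" as proper subset, which is only a partial order.
p, q = {1, 2}, {3}
first = min(p, q)                 # neither set is "less", so the first argument wins
second = min(q, p)                # swapping the arguments swaps the answer

print(repr(raw), branch_taken, first, second)
```

In both cases the caller imposes its own assumption (a truth value, a total order) on whatever the data chose to return.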

The current docs make an effort to describe what we have now: https://docs.python.org/3/reference/expressions.html#value-comparisons 

Sorry for the lack of concision.  I'm posting on borrowed time,

Raymond
