[Python-Dev] PyObject_RichCompareBool identity shortcut

Thu Apr 28 12:11:16 CEST 2011

On Thu, Apr 28, 2011 at 6:30 PM, Alexander Belopolsky
<alexander.belopolsky at gmail.com> wrote:
> On Thu, Apr 28, 2011 at 3:57 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> ..
>>> It is an interesting question of what "sane invariants" are.  Why you
>>> consider the invariants that you listed essential while say
>>>
>>> if c1 == c2:
>>>   assert all(x == y for x,y in zip(c1, c2))
>>>
>>> optional?
>>
>> Because this assertion is an assertion about the behaviour of
>> comparisons that violates IEEE754, while the assertions I list are all
>> assertions about the behaviour of containers that can be made true
>> *regardless* of IEEE754 by checking identity explicitly.
>>
>
> AFAIK, IEEE754 says nothing about comparison of containers, so my
> invariant cannot violate it.  What you probably wanted to say is that
> my invariant cannot be achieved in the presence of IEEE754 conforming
> floats, but this observation by itself does not make my invariant less
> important than yours.  It just makes yours easier to maintain.

No, I meant what I said. Your assertion includes a direct comparison
between values (the "x == y" part) which means that IEEE754 has a
bearing on whether or not it is a valid assertion. Every single one of
my stated invariants consists solely of relationships between
containers, or between a container and its contents. This keeps them
all out of the domain of IEEE754 since the *container implementers*
get to decide whether or not to factor object identity into the
management of the container contents.

The core containment invariant is really only this one:

    for x in c:
        assert x in c

That is, if we iterate over a container, all entries returned should
be in the container. Hopefully it is non-controversial that this is a
sane and reasonable invariant for a container *user* to expect.
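As a minimal sketch of that invariant in the NaN case (the names below
are illustrative only), the builtin containers satisfy it even though
the item is not equal to itself:

```python
nan = float("nan")

# IEEE754: NaN compares unequal to everything, including itself.
assert nan != nan

# The builtin containers check identity before equality, so every item
# produced by iteration is still reported as being in the container.
containers = [[nan], (nan,), {nan}, frozenset([nan]), {nan: "value"}]
for c in containers:
    for x in c:
        assert x in c
```

Note that the loop never compares two items with each other; it only
asks the container about its own contents.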

The comparison invariants follow from the definition of set equivalence as:

  set1 == set2 iff all(x in set2 for x in set1) and all(y in set1 for y in set2)

Again, notice that there is no comparison of items here - merely a
consideration of the way items relate to containers.
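The definition above can be written out directly. This is only a
sketch: sets_equivalent() is a hypothetical helper, not how the set
type is actually implemented.

```python
def sets_equivalent(set1, set2):
    """Set equivalence expressed purely through containment checks."""
    return (all(x in set2 for x in set1)
            and all(y in set1 for y in set2))

nan = float("nan")
a = {nan, 1.0}
b = {1.0, nan}              # holds the *same* NaN object as a
other = {float("nan"), 1.0}  # holds a *different* NaN object

same = sets_equivalent(a, b)           # True: b contains a's NaN object
builtin_same = (a == b)                # True: agrees with the builtin
different = sets_equivalent(a, other)  # False: neither identical nor equal
```

The third result shows the limit of the identity check: a distinct NaN
object is neither identical nor equal to the one in the set, so the
sets differ.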

The rationale behind the count() and index() assertions is harder to
define in implementation-neutral terms, but their behaviour does
follow naturally from the internal enforcement of reflexivity needed
to guarantee that core invariant.
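A rough pure-Python picture of that internal enforcement follows. The
helpers same_item(), count() and index() are hypothetical stand-ins
written for illustration; same_item() only mimics the "identity
implies equality" shortcut of PyObject_RichCompareBool for Py_EQ.

```python
def same_item(x, y):
    # Identity is checked first, so an object always matches itself
    # even if its own == says otherwise.
    return x is y or x == y

def count(seq, value):
    return sum(1 for item in seq if same_item(item, value))

def index(seq, value):
    for i, item in enumerate(seq):
        if same_item(item, value):
            return i
    raise ValueError("value is not in sequence")

nan = float("nan")
items = [1.0, nan, 2.0, nan]

# The sketch agrees with the builtin list methods.
assert count(items, nan) == items.count(nan) == 2
assert index(items, nan) == items.index(nan) == 1
```

With a plain "x == y" in same_item(), both results would contradict
the fact that "nan in items" is true.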

In mathematics, this is all quite straightforward and
non-controversial, since it can be taken for granted that equality is
reflexive. That is part of the definition of what equality *means*:
equivalence relations *are* relations that are symmetric, transitive
and reflexive. Lose any one of those three properties and it isn't an
equivalence relation any more.

However, when we confront the practical reality of IEEE754 floating
point values and the lack of reflexivity in the presence of NaN, we're
faced with a choice of (at least) 4 alternatives:

1. Deny it. Say equality is reflexive at the language level, and we
don't care that it makes it impossible to fully implement IEEE754
semantics. This is what Eiffel does, and if you don't care about
interoperability and the possibility of algorithmic equivalence with
hardware implementations, it's probably not a bad idea. After all, why
discard centuries of mathematical experience based on a decision that
the IEEE754 committee can't clearly recall the rationale for, and
didn't clearly document?

2. Tolerate it, but attempt to confine the breakage of mathematical
guarantees to the arithmetic operations actually covered by the
relevant standards. This is what CPython currently does by enforcing
the container invariants at an implementation level, and, as I think
it's a good way to handle the situation, this is what I am advocating
lifting up to the language level through appropriate updates to the
library and language reference. (Note that even changing the behaviour
of float() leaves Python in this situation, since third party types
will still be free to follow IEEE754. Given that, it seems relatively
pointless to change the behaviour of builtin floats after all the
effort that has gone into bringing them ever closer to IEEE754).

3. Signal it. We already do this in some cases (e.g. by raising
ZeroDivisionError), and I'm personally quite happy with the idea of
raising ValueError in other cases, such as when attempting to perform
ordering comparisons on NaN values.

4. Embrace it. Promote NaN to a language level construct, define
semantics allowing it to propagate through assorted comparison and
other operations (including short-circuiting logic operators) without
being coerced to True as it is now.
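To make options 3 and 4 concrete, here is a small sketch of where the
current behaviour sits. checked_less_than() is a hypothetical helper
showing what signalling an ordering comparison could look like; it is
not an existing or proposed API.

```python
import math

nan = float("nan")

# Option 3, as it exists today: float division by zero is signalled.
try:
    1.0 / 0.0
except ZeroDivisionError:
    signalled = True
else:
    signalled = False

# Ordering comparisons on NaN are *not* signalled; they all quietly
# evaluate to False.
quiet = [nan < 1.0, nan > 1.0, nan <= nan, nan >= nan]

def checked_less_than(a, b):
    """Hypothetical: raise instead of quietly returning False."""
    if math.isnan(a) or math.isnan(b):
        raise ValueError("cannot order NaN")
    return a < b

# Option 4 is not the status quo: NaN is simply truthy, like any
# other non-zero float.
nan_is_truthy = bool(nan)
```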

Documenting the status quo is the *only* necessary step in all of this
(and Raymond has already adopted the relevant tracker issue). There
are tweaks to the current semantics that may be useful (specifically
ValueError when attempting to order NaN), but changing the meaning of
equality for floats probably isn't one of them (since that only fixes
one type, while fixing the affected algorithms fixes *all* types).
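The "all types" point can be illustrated with a sketch. NeverEqual is
a made-up third-party type with non-reflexive equality; the builtin
containers keep their invariants for it without knowing anything about
floats.

```python
class NeverEqual:
    """Hypothetical type whose == is never true, even for itself."""

    def __eq__(self, other):
        return False

    def __hash__(self):
        return id(self)

obj = NeverEqual()
assert not (obj == obj)

# The containment invariant holds because the containers check
# identity themselves.
found_everywhere = all(
    x in c
    for c in ([obj], (obj,), {obj}, {obj: None})
    for x in c
)

# Container equality, count() and index() follow from the same check.
lists_equal = ([obj] == [obj])
occurrences = [obj, obj].count(obj)
position = [None, obj].index(obj)
```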

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia