[Python-ideas] checking for identity before comparing built-in objects

Tue Oct 9 10:22:47 CEST 2012

On Tue, Oct 9, 2012 at 12:43 PM, Guido van Rossum <guido at python.org> wrote:
> No, that's not what I meant -- maybe my turn of phrase "invoking IEEE"
> was confusing. The first part is what I meant: "Python cannot have a
> rule that x is y implies x == y because that would preclude
> implementing float.__eq__ as IEEE 754 equality comparison." The second
> half should be: "And we have already (independently from all this)
> decided that we want to implement float.__eq__ as IEEE 754 equality
> comparison." I'm sure a logician could rearrange the words a bit and
> make it look more logical.

I'll have a go. It's a lot longer, though :)

When designing their floating point support, language designers must
choose between two mutually exclusive options:
1. IEEE 754 compliant floating point comparison where NaN != NaN, *even
if* they're the same object
2. The invariant that "x is y" implies "x == y"
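A minimal illustration of why the two options can't coexist: a single NaN object is trivially identical to itself, yet IEEE 754 equality says it is not equal to itself, so option 1 rules out option 2.

```python
import math

x = float("nan")

print(x is x)         # True: an object is always identical to itself
print(x == x)         # False: IEEE 754 says NaN compares unequal to everything
print(x != x)         # True
print(math.isnan(x))  # True: the portable way to test for NaN
```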

The idea behind following the IEEE 754 model is that mathematics is a
*value based system*. There is only really one NaN, just as there is
only one 4 (or 5, or any other specific value). The idea of a number
having an identity distinct from its value simply doesn't exist. Thus,
when modelling mathematics in an object system, it makes sense to say
that *object identity is irrelevant, and only value matters*.

This is the approach Python has chosen: for *numeric* operations,
including comparisons, object identity is irrelevant to the maximum
extent that is practical. Thus "x = float('nan'); assert x != x" holds
for *exactly the same reason* that "x = 10e50; y = 10e50; assert x ==
y" holds.
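A small sketch of "identity is irrelevant for numeric comparisons" as it behaves in CPython. The floats are built with float() calls so they are likely (though not guaranteed) to be distinct objects; either way, the results depend only on the values.

```python
# Equal values compare equal, whether or not they are the same object.
x = float("10e50")
y = float("10e50")
print(x == y)  # True

# NaN compares unequal, whether it is a different NaN object or the same one.
a = float("nan")
b = float("nan")
print(a == b)  # False
print(a == a)  # False: sharing an identity changes nothing
```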

However, when it comes to containers, being able to assume that "x is
y" implies "x == y" has an immense practical benefit in terms of being
able to implement a large number of non-trivial optimisations. Thus
the Python language definition explicitly allows containers to make
that assumption, *even though it is known not to be universally true*.
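Here is how that assumption shows up in CPython's built-in containers, which check identity before falling back to ==. The outputs below are CPython behaviour; other implementations may differ in details.

```python
nan = float("nan")

# Membership: the very same NaN object is found via the identity check...
print(nan in [nan])                      # True
# ...but a distinct NaN object is not, because == is then consulted and fails.
print(float("nan") in [nan])             # False

# List equality applies the same identity-first rule element by element.
print([nan] == [nan])                    # True
print([float("nan")] == [float("nan")])  # False

# Dict lookup: matching hash, then identity, then ==.
d = {nan: "found"}
print(d[nan])                            # found
print(float("nan") in d)                 # False
```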

This hybrid model means that even though "'x is y' implies 'x == y'"
is not true in the general case, container implementations may still
*assume it to be true*. In particular, the containers defined in the
standard library reference are *required* to make this assumption.
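The rule containers are allowed to assume can be written as a one-line predicate. The helper names container_eq and contains below are hypothetical, used only to spell out the idea; they are not part of any standard library API.

```python
def container_eq(x, y):
    """Hypothetical helper: identity first, then ==, as containers may assume."""
    return x is y or bool(x == y)


def contains(items, value):
    """Hypothetical re-implementation of `value in items` using that rule."""
    return any(container_eq(item, value) for item in items)


nan = float("nan")
print(container_eq(nan, nan))              # True, even though nan == nan is False
print(contains([1.0, nan], nan))           # True, like `nan in [1.0, nan]`
print(contains([1.0, nan], float("nan")))  # False: a different NaN object
```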

This does mean that certain invariants about containers don't hold in
the presence of NaN values (for instance, a list can contain x even
though no element of it compares equal to x). This is mostly a
theoretical concern, but in those cases where it *does* matter, the
appropriate solution is to implement a custom container type that
handles NaN values correctly.
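As a rough sketch of such a custom container, assuming "correctly" means strict IEEE 754 semantics for membership, one could override __contains__ so it uses plain == with no identity shortcut. StrictList is a hypothetical name; only `in` is changed here, so index(), count(), remove() and list equality still use the built-in identity-first comparison.

```python
class StrictList(list):
    """Hypothetical list whose `in` test skips the identity shortcut."""

    def __contains__(self, value):
        # Plain == never treats "same object" as "equal", so NaN is never found.
        return any(item == value for item in self)


nan = float("nan")
print(nan in [nan])                   # True: built-in list, identity shortcut
print(nan in StrictList([nan]))       # False: IEEE 754 semantics preserved
print(1.0 in StrictList([nan, 1.0]))  # True: ordinary values behave as usual
```

A fuller version would need to override the other comparison-based methods in the same way.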

It's perhaps worth including a section explaining this somewhere in
the language reference. Python's behaviour here is not an accident,
and spelling out the rationale would help implementors correctly
interpret the rest of the language spec.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia