[Python-Dev] PyObject_RichCompareBool identity shortcut

Nick Coghlan ncoghlan at gmail.com
Fri Apr 29 01:40:40 CEST 2011


On Fri, Apr 29, 2011 at 9:13 AM, Guido van Rossum <guido at python.org> wrote:
>> I hadn't really thought about it that way before this discussion - it
>> is the identity checking behaviour of the builtin containers that lets
>> us sensibly handle cases like sets of NumPy arrays.
>
> But do they? For non-empty arrays, __eq__ will always return something
> that is considered true, so any hash collisions will cause false
> positives. And look at this simple example:
>
> >>> class C(list):
> ...   def __eq__(self, other):
> ...     if isinstance(other, C):
> ...       return [x == y for x, y in zip(self, other)]
> ...
> >>> a = C([1,2,3])
> >>> b = C([2,1,3])
> >>> a == b
> [False, False, True]
> >>> x = [a, a]
> >>> b in x
> True

Hmm, true. And things like count() and index() would still be
thoroughly broken for sequences. OK, so scratch that idea - there's
simply no sane way to handle such objects without using an
identity-based container that ignores equality definitions altogether.
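
For illustration, here is a minimal sketch of what such an
identity-based container might look like. IdentitySet is a hypothetical
name rather than an existing class, and C is adapted from the example
quoted above (with an added NotImplemented fallback):

```python
class IdentitySet:
    """Hypothetical set-like container keyed on id(), ignoring __eq__ and __hash__."""

    def __init__(self, items=()):
        # Map id(obj) -> obj. Holding a reference keeps each object alive,
        # so its id cannot be reused while it is stored here.
        self._items = {}
        for item in items:
            self.add(item)

    def add(self, item):
        self._items[id(item)] = item

    def __contains__(self, item):
        # Membership is decided purely by identity; __eq__ is never called.
        return id(item) in self._items

    def __len__(self):
        return len(self._items)


class C(list):
    # Element-wise comparison that returns a list instead of a bool.
    def __eq__(self, other):
        if isinstance(other, C):
            return [x == y for x, y in zip(self, other)]
        return NotImplemented


a = C([1, 2, 3])
b = C([2, 1, 3])
s = IdentitySet([a, a])

in_list = b in [a, a]   # the non-empty result of a == b is treated as true
in_idset = b in s       # identity only, so b is not found
```

With this sketch, in_list is a false positive while in_idset is not,
and adding a twice stores it only once.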

Pondering the NaN problem further, I think we can relatively easily
argue that reflexive behaviour at the object level fits within the
scope of IEEE 754.

1. IEEE 754 is a value-based system, with a finite number of distinct
NaN payloads.
2. Python is an object-based system. In addition to their payload, NaN
objects are further distinguished by their identity (the number of
distinct identities is unbounded in theory, and limited in practice by
available memory).
3. We can still technically be conformant with IEEE 754 even if we say
that a given NaN object is equivalent to itself, but not to other NaN
objects with the same payload.
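
The builtin containers already behave this way because of the identity
shortcut this thread is about. A small sketch of the distinction,
assuming CPython, where each float('nan') call returns a new object
(the variable names are just for illustration):

```python
nan = float('nan')
other_nan = float('nan')  # assumed to be a distinct object with the same value

# Value-level comparison follows IEEE 754: a NaN never equals anything.
value_equal = (nan == nan)

# Container operations check identity before falling back to ==.
found_same = nan in [nan]          # same object, found via identity
found_other = other_nan in [nan]   # different object, == says no
count_same = [nan].count(nan)
lists_equal = [nan] == [nan]       # two lists holding the same NaN object
```

Here value_equal and found_other come out false, while found_same and
lists_equal come out true and count_same is 1.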

Unfortunately, this still doesn't play well with serialisation, which
assumes that the identity of float objects doesn't matter:

>>> import pickle
>>> nan = float('nan')
>>> x = [nan, nan]
>>> x[0] is x[1]
True
>>> y = pickle.loads(pickle.dumps(x))
>>> y
[nan, nan]
>>> y[0] is y[1]
False

Contrast that with the handling of lists, where identity is known to
be significant:

>>> x = [[]]*2
>>> x[0] is x[1]
True
>>> y = pickle.loads(pickle.dumps(x))
>>> y
[[], []]
>>> y[0] is y[1]
True

I'd say I've definitely come around to being +0 on the idea of making
the float and decimal.Decimal __eq__ definitions reflexive, but
doing so has implications for the ability to accurately save and
restore application state: two references to one NaN object would
compare equal before serialisation, but not after it. It isn't as
simple as just adding "if self is other: return True" to the respective
__eq__ implementations.
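
To make the concern concrete, here is a rough sketch that models the
proposed rule with a helper function instead of changing float itself.
reflexive_eq is a hypothetical stand-in for a reflexive __eq__, not an
existing API:

```python
import pickle


def reflexive_eq(a, b):
    # Hypothetical model of the proposal: identity implies equality,
    # otherwise fall back to the normal comparison.
    return a is b or a == b


nan = float('nan')
state = [nan, nan]  # both slots refer to the same NaN object

# pickle does not preserve float identity, so the restored list holds
# two separate NaN objects.
restored = pickle.loads(pickle.dumps(state))

before = reflexive_eq(state[0], state[1])
after = reflexive_eq(restored[0], restored[1])
```

Under this model, before is true and after is false, so the same
comparison gives different answers on either side of a save/restore
cycle.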

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

