[Python-Dev] PyObject_RichCompareBool identity shortcut

Thu Apr 28 09:27:26 CEST 2011

On 4/27/2011 11:54 PM, Nick Coghlan wrote:
> On Thu, Apr 28, 2011 at 4:20 PM, Glenn Linderman <v+python at g.nevcal.com> wrote:
>> In that bug, Nick, you mention that reflexive equality is something that
>> container classes rely on in their implementation.  Such reliance seems to
>> me to be a bug, or an inappropriate optimization, rather than a necessity.
>> I realize that classes that do not define equality use identity as their
>> default equality operator, and that is acceptable for items that do not or
>> cannot have any better equality operator.  It does lead to the situation
>> where two objects that are bit-for-bit clones get separate entries in a
>> set... exactly the same as how NaNs of different identity work... the
>> situation with a NaN of the same identity not being added to the set
>> multiple times seems to simply be a bug because of conflating identity and
>> equality, and should not be relied on in container implementations.
> No, as Raymond has articulated a number of times over the years, it's
> a property of the equivalence relation that is needed in order to
> present sane invariants to users of the container.
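
The set behaviour described in the quoted text can be checked directly.
This is a minimal sketch, assuming CPython, where each float("nan") call
returns a new object. The names nan_a, nan_b and s are illustrative only.

```python
# Two distinct float objects that both hold NaN.
nan_a = float("nan")
nan_b = float("nan")

assert nan_a != nan_a        # NaN never compares equal, even to itself
assert nan_a is not nan_b    # assumption: separate objects (true in CPython)

s = set()
s.add(nan_a)
s.add(nan_a)  # same object: the identity check treats it as already present
s.add(nan_b)  # different object: neither identical nor equal, so it is added

assert len(s) == 2
assert nan_a in s and nan_b in s
```

The second add is a no-op only because the container checks identity before
equality; by equality alone, every NaN would get its own entry.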

I probably wasn't around when Raymond did his articulation :)  Sorry for 
whatever amount of rehashing I'm doing here -- pointers to some of the 
articulation would be welcome, but perhaps the summary below is intended 
to recap the results of such discussions.  If my comments below seem to 
be grasping the essence of those discussions, then no need for the 
pointers... if I'm way off, I'd like to read a thread or two.

> I included in the
> bug report the critical invariants I am currently aware of that should
> hold, even when the container may hold types with a non-reflexive
> definition of equality:
>
>    assert [x] == [x]                     # Generalised to all container types
>    assert not [x] != [x]                 # Generalised to all container types
>    for x in c:
>        assert x in c
>        assert c.count(x) > 0             # If applicable
>        assert 0 <= c.index(x) < len(c)   # If applicable
>
> The builtin types all already work this way, and that's a deliberate
> choice - my proposal is simply to document the behaviour as
> intentional, and fix the one case I know of in the standard library
> where we don't implement these semantics correctly (i.e.
> collections.Sequence).
>
> The question of whether or not float and decimal.Decimal should be
> modified to have reflexive definitions of equality (even for NaN
> values) is actually orthogonal to the question of clarifying and
> documenting the expected semantics of containers in the face of
> non-reflexive definitions of equality.

Yes, I agree they are orthogonal questions... separate answers and
choices can be made for specific classes.  Just as some classes
implement equality using identity, it would also be possible to
implement identity using equality.  It is also possible to conflate the
two, as has apparently been done deliberately for Python containers,
without reflecting that in the documentation.
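
The conflation can be illustrated with a rough pure-Python approximation of
an "identity first, then equality" comparison. The helper name
identity_or_equal is hypothetical and is not the C implementation; it only
sketches the idea. The assertions after it exercise Nick's invariants on a
list holding a NaN.

```python
def identity_or_equal(a, b):
    """Hypothetical helper: treat identical objects as equal without calling ==."""
    return a is b or a == b


x = float("nan")
c = [x]

assert x != x                                # float equality is not reflexive for NaN
assert identity_or_equal(x, x)               # ...but the identity shortcut says "equal"
assert not identity_or_equal(x, float("nan"))  # a different NaN object is not matched

# The invariants from the bug report hold for the builtin list:
assert [x] == [x]
assert not [x] != [x]
for item in c:
    assert item in c
    assert c.count(item) > 0
    assert 0 <= c.index(item) < len(c)
```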

If the containers have been deliberately implemented in that way, and it 
is not appropriate to change them, then more work is needed in the 
documentation than just your proposed Glossary definition, as the very 
intuitive descriptions in the Comparisons section are quite at odds with 
the current implementation.

Without having read the original articulations by Raymond or any 
discussions of the pros and cons, it would appear that the above list of 
invariants, which you refer to as "sane", are derived from a "pre-NaN" 
or "reflexive equality" perspective; while some folk perhaps think the 
concept of NaN is a particular brand of insanity, it is a standard 
brand, and therefore worthy of understanding and discussion.  And 
clearly, if the NaN perspective is intentionally corralled in Python, 
then the documentation needs to be clarified.  On the other hand, the 
SQL language has embraced the same concept as NaN in its concept of 
NULL, and has pushed that concept (they call it three-valued logic, I 
think) clear through the language.  NULL == NULL is not True, and it is 
not False, but it is NULL.  Of course, the language is different in 
other ways than Python; values are not objects and have no identity, but 
they do have collections of values called tuples, columns, and tables, 
which are similar to lists and lists of lists.  And they have mappings 
called indexes.  And they've made it all work with the concept of NULL 
and three-valued logic.  And sane people work with database systems 
built around such concepts.  So I guess I reject the argument that the 
above invariants are required for sanity.
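
The NULL comparison can be observed from Python through the standard
sqlite3 module. This is a small sketch using SQLite as one example SQL
engine; other databases may report boolean results differently. The
variable names are illustrative.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Compare NULL with NULL, and 1 with 1, in a single row.
null_eq, one_eq = conn.execute("SELECT NULL = NULL, 1 = 1").fetchone()
conn.close()

assert null_eq is None   # NULL = NULL is NULL, surfaced in Python as None
assert one_eq == 1       # SQLite reports a true comparison as the integer 1
```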

On the other hand, not having much knowledge of Python internals as yet,
I'm in no position to know how seriously things would break internally
if a different set of invariants, one that embraces and extends the
concept of non-reflexive equality, were invented to replace the above,
nor whether there is a compatible migration path to achieve it in a
reasonable manner... from future import NaNsanity ... :)