[Python-Dev] == on object tests identity in 3.x

Fri Jul 11 16:04:35 CEST 2014

On 09.07.2014 03:48, Raymond Hettinger wrote:
>
> On Jul 7, 2014, at 4:37 PM, Andreas Maier <andreas.r.maier at gmx.de> wrote:
>
>> I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python.
>>
>> The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it.
>
> Once every few years, someone discovers IEEE-754, learns that NaNs
> aren't supposed to be equal to themselves and becomes inspired
> to open an old debate about whether to wreck Python in an effort
> to make the world safe for NaNs.  And somewhere along the way,
> people forget that practicality beats purity.
>
> Here are a few thoughts on the subject that may or may not add
> a little clarity ;-)
>
> * Python already has IEEE-754 compliant NaNs:
>
>         assert float('NaN') != float('NaN')
>
> * Python already has the ability to filter-out NaNs:
>
>         [x for x in container if not math.isnan(x)]
>
> * In the numeric world, the most common use of NaNs is for
>    missing data (much like we usually use None).  The property
>    of not being equal to itself is primarily useful in
>    low level code optimized to run a calculation to completion
>    without running frequent checks for invalid results
>    (much like #N/A is used in MS Excel).
>
> * Python also lets containers establish their own invariants
>    to establish correctness, improve performance, and make it
>    possible to reason about our programs:
>
>         for x in c:
>             assert x in c
>
> * Containers like dicts and sets have always used the rule
>    that identity implies equality.  That is central to their
>    implementation.  In particular, the check of interned
>    string keys relies on identity to bypass a slow
>    character-by-character comparison to verify equality.
>
> * Traditionally, a relation R is considered an equality
>    relation if it is reflexive, symmetric, and transitive:
>
>        R(x, x) -> True
>        R(x, y) -> R(y, x)
>        R(x, y) ^ R(y, z) -> R(x, z)
>
> * Knowingly or not, programs tend to assume that all of those
>    hold.  Test suites in particular assume that if you put
>    something in a container that assertIn() will pass.
>
> * Here are some examples of cases where non-reflexive objects
>    would jeopardize the pragmatism of being able to reason
>    about the correctness of programs:
>
>        s = SomeSet()
>        s.add(x)
>        assert x in s
>
>        s.remove(x)        # See collections.abc.Set.remove
>        assert not s
>
>        s.clear()          # See collections.abc.Set.clear
>        assert not s
>
> * What the above code does is up to the implementer of the
>    container.  If you use the Set ABC, you can choose to
>    implement __contains__() and discard() to use straight
>    equality or identity-implies-equality.  Nothing prevents
>    you from making containers that are hard to reason about.
>
> * The builtin containers make the choice for
>    identity-implies-equality so that it is easier to build fast,
>    correct code.  For the most part, this has worked out great
>    (dictionaries in particular have had identity checks built in
>    for almost twenty years).
>
> * Years ago, there was a debate about whether to add an __is__()
>    method to allow overriding the is-operator.  The push for the
>    change was the "pure" notion that "all operators should be
>    customizable".  However, the idea was rejected based on the
>    "practical" notions that it would wreck our ability to reason
>    about code, that it would slow down all code that used identity checks,
>    that library modules (ours and third-party) already made
>    deep assumptions about what "is" means, and that people would
>    shoot themselves in the foot with hard to find bugs.
>
> Personally, I see no need to make the same mistake by removing
> the identity-implies-equality rule from the built-in containers.
> There's no need to upset the apple cart for nearly zero benefit.
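
For readers following along, here is a small sketch of the behavior Raymond describes above, using only built-in types and the math module. It is meant as an illustration of how current CPython behaves, not as a statement of what the documentation promises.

```python
import math

nan = float("nan")

# IEEE-754: a NaN never compares equal, not even to itself.
assert nan != nan

# Built-in containers check identity before equality, so the very same
# NaN object is still found by membership tests.
assert nan in [nan]
assert nan in {nan}

# A second NaN object is neither identical nor equal, so it is not found.
other = float("nan")
assert other is not nan
assert other not in [nan]

# The loop invariant from the quoted mail holds even with a NaN inside.
container = [1.0, nan, 2.0]
for x in container:
    assert x in container

# NaNs can still be filtered out explicitly with math.isnan().
cleaned = [x for x in container if not math.isnan(x)]
```

The point of the sketch is only that `x in container` succeeds for the same NaN object because of the identity shortcut, while `==` on the floats themselves stays IEEE-754 compliant.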

Containers delegate the equality comparison on the container to their
elements; they do not apply identity-based comparison to their elements.
At least that is the externally visible behavior.

Only the default comparison behavior implemented on type object follows 
the identity-implies-equality rule.
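
A minimal sketch of that default, assuming two throwaway classes (Plain and Point are made-up names for illustration only): a class that defines no __eq__ inherits the identity-based default from object, while a class that defines __eq__ compares by value.

```python
class Plain:
    """Defines no __eq__, so it inherits the default from object."""


class Point:
    """Hypothetical value type: equality is based on the x attribute."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x


a, b = Plain(), Plain()
assert a == a          # default: an object is equal to itself
assert a != b          # default: distinct objects are unequal

p, q = Point(1), Point(1)
assert p is not q      # two different objects ...
assert p == q          # ... that compare equal by value
```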

As part of my doc patch, I will upload an extension to the
test_compare.py test suite, which tests all built-in containers with
values whose order differs from the identity order, and it shows that
the value order and equality win over identity, if implemented.
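
The following is not the test_compare.py extension itself, just a rough sketch of the kind of check meant here. The Value class is a made-up helper; it implements __eq__, __hash__ and __lt__ purely by value, so distinct objects with the same value should be treated as equal by the built-in containers.

```python
class Value:
    """Hypothetical element type that compares and orders by value."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        if not isinstance(other, Value):
            return NotImplemented
        return self.n == other.n

    def __lt__(self, other):
        if not isinstance(other, Value):
            return NotImplemented
        return self.n < other.n

    def __hash__(self):
        return hash(self.n)


# Equality of containers holding distinct but equal-valued objects.
assert [Value(1)] == [Value(1)]
assert (Value(1),) == (Value(1),)
assert {Value(1)} == {Value(1)}
assert {Value(1): "a"} == {Value(1): "a"}

# Membership tests find an equal-valued object that is not identical.
assert Value(1) in [Value(1)]
assert Value(1) in {Value(1)}

# Ordering of sequences follows the elements' value order.
assert [Value(1)] < [Value(2)]
ordered = [v.n for v in sorted([Value(3), Value(1), Value(2)])]
```

Where the objects happen to live in memory plays no role in any of these results; only the value-based methods on Value decide them.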

>
> IMO, the proposed quest for purity is misguided.
> There are many practical reasons to let the builtin
> containers continue to work as they do now.

As I said, I can accept compatibility reasons. I can also accept the
argument brought up by Benjamin about the desire for the
identity-implies-equality rule as a default, with no corresponding rule
for order comparison (and I added both to the doc patch).

Andy