[Python-Dev] == on object tests identity in 3.x
raymond.hettinger at gmail.com
Wed Jul 9 03:48:17 CEST 2014
On Jul 7, 2014, at 4:37 PM, Andreas Maier <andreas.r.maier at gmx.de> wrote:
> I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python.
> The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it.
Once every few years, someone discovers IEEE-754, learns that NaNs
aren't supposed to be equal to themselves, and becomes inspired
to open an old debate about whether to wreck Python in an effort
to make the world safe for NaNs. And somewhere along the way,
people forget that practicality beats purity.
Here are a few thoughts on the subject that may or may not add
a little clarity ;-)
* Python already has IEEE-754 compliant NaNs:
assert float('NaN') != float('NaN')
* Python already has the ability to filter-out NaNs:
[x for x in container if not math.isnan(x)]
* In the numeric world, the most common use of NaNs is for
missing data (much like we usually use None). The property
of not being equal to itself is primarily useful in
low level code optimized to run a calculation to completion
without running frequent checks for invalid results
(much like #N/A is used in MS Excel).
* Python also lets containers establish their own invariants
to establish correctness, improve performance, and make it
possible to reason about our programs:
for x in c:
assert x in c
* Containers like dicts and sets have always used the rule
that identity implies equality. That is central to their
implementation. In particular, the check of interned
string keys relies on identity to bypass a slow
character-by-character comparison to verify equality.
* Traditionally, a relation R is considered an equivalence
relation if it is reflexive, symmetric, and transitive:
R(x, x) -> True
R(x, y) -> R(y, x)
R(x, y) ^ R(y, z) -> R(x, z)
* Knowingly or not, programs tend to assume that all of those
hold. Test suites in particular assume that if you put
something in a container, assertIn() will pass.
* Here are some examples of cases where non-reflexive objects
would jeopardize our practical ability to reason about the
correctness of programs:
s = SomeSet()
s.add(x)
assert x in s
s.remove(x) # See collections.abc.MutableSet.remove
assert not s
s.clear() # See collections.abc.MutableSet.clear
assert not s
* What the above code does is up to the implementer of the
container. If you use the MutableSet ABC, you can choose to
implement __contains__() and discard() to use straight
equality or identity-implies-equality. Nothing prevents
you from making containers that are hard to reason about.
* The builtin containers make the choice for
identity-implies-equality so that it is easier to build fast,
correct code. For the most part, this has worked out great
(dictionaries in particular have had identity checks built in
from almost
* Years ago, there was a debate about whether to add an __is__()
method to allow overriding the is-operator. The push for the
change was the "pure" notion that "all operators should be
customizable". However, the idea was rejected based on the
"practical" notions that it would wreck our ability to reason
about code, that it would slow down all code that used identity
checks, that library modules (ours and third-party) already made
deep assumptions about what "is" means, and that people would
shoot themselves in the foot with hard-to-find bugs.
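To make the points in the list above concrete, here is a minimal sketch of how NaNs interact with the built-in containers. The variable names (nan, data, clean) are illustrative only and are not part of the original message.

```python
import math

nan = float('nan')

# IEEE-754 behaviour: a NaN never compares equal, not even to itself.
assert nan != nan
assert float('nan') != float('nan')

# Built-in containers check identity before equality, so the
# "for x in c: assert x in c" invariant still holds for a NaN.
for c in ([nan], (nan,), {nan}, {nan: 'missing'}):
    for x in c:
        assert x in c

# A different NaN object is not found: it is neither identical nor equal.
assert float('nan') not in [nan]

# NaNs can still be filtered out explicitly with math.isnan().
data = [1.0, nan, 2.5]
clean = [x for x in data if not math.isnan(x)]
assert clean == [1.0, 2.5]
```

The same object is always found in its own container, while value comparisons with == and != keep their IEEE-754 meaning.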
Personally, I see no need to make the same mistake by removing
the identity-implies-equality rule from the built-in containers.
There's no need to upset the apple cart for nearly zero benefit.
IMO, the proposed quest for purity is misguided.
There are many practical reasons to let the builtin
containers continue to work as they do now.
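For comparison, the following sketch shows what can happen when a container uses straight equality instead. StrictEqSet is a hypothetical class written for this illustration on top of collections.abc.MutableSet; it is not taken from the standard library or from the original message.

```python
from collections.abc import MutableSet


class StrictEqSet(MutableSet):
    """Hypothetical list-backed set that uses only ==, never 'is'."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            self.add(item)

    def __contains__(self, value):
        # Straight equality: no identity shortcut.
        for item in self._items:
            if item == value:
                return True
        return False

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, value):
        if value not in self:
            self._items.append(value)

    def discard(self, value):
        # Keep only the items that do not compare equal to value.
        self._items = [item for item in self._items if not (item == value)]


nan = float('nan')
s = StrictEqSet([1, nan])

assert 1 in s
assert nan not in s      # the element we just stored cannot be found
s.discard(nan)
assert len(s) == 2       # ... and cannot be discarded

try:
    s.remove(nan)        # mixin method built on __contains__ and discard
except KeyError:
    removed = False
else:
    removed = True
assert not removed

s.add(nan)
assert len(s) == 3       # ... but can be added a second time
```

The mixin methods that MutableSet provides, such as remove() and clear(), are built on __contains__() and discard(), so they inherit whatever choice those two methods make.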