[Python-Dev] Fighting the theoretical randomness of "is" on immutables

Tue May 7 16:33:07 CEST 2013

On Tue, May 7, 2013 at 6:27 PM, Armin Rigo <arigo at tunes.org> wrote:
> Hi Antoine,
>
> On Tue, May 7, 2013 at 8:25 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> For me, a patch that mandated general-purpose containers (list, dict,
>> etc.) respect object identity would be ok.
>
> Thanks, that's also my opinion.
>
> In PyPy's approach, in trying to emulate CPython vs. trying to
> convince users that "is" is sometimes a bad idea, we might eventually
> end up at the extreme side, which can be seen as where CPython would
> be if it cached *all* ints, longs, floats, complexes, strings,
> unicodes and tuples.
>
> The original question in this thread was about if it's ok for two
> objects x and y to satisfy "x is y" while at the same time "id(x) !=
> id(y)".  I think by now that it would only create more confusion (even
> if only in some very special cases).  We'll continue to maintain the
> invariant then, and if it requires creating extremely large values for
> id(), too bad. (1)

Yeah, I've been trying to come up with a way to phrase the end result
that doesn't make my brain hurt, but I've mostly failed. The details
below are the closest I've come to something that makes sense to me.

With equality, the concepts of hashing and value have a clear
relationship: x == y implies hash(x) == hash(y), but there's no
implication going in the other direction. Even if the hashes are the
same, the values may be different (you can have hash(x) == hash(y)
without having x == y).

NaN's aside, you also have the relationship that x is y implies x == y
(and the standard containers assume this). Again, there's no
implication in the other direction. Two objects may be equal, while
having different identities (such as 0 == 0.0 == 0j)

The definition of object identity is that x is y implies id(x) ==
id(y) *and* vice-versa.

The suggested change would actually involving defining a *new*
language concept, a "reference id", where ref_id(x) == ref_id(y)
implies x is y, but the reverse is not true. Thus, this is actually a
suggestion for two changes rolled into one:

1. Add the "reference id" concept
2. Change the id() builtin to return the reference id rather than the object id

I think the array.array solution is a more tolerable one: provide
explicit value based containers that are known not to be identity
preserving. If you want maximum speed, you have to be prepared to deal
with the difference in semantics.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia