[pypy-dev] x is y <=> id(x)==id(y)

Sun May 5 21:16:24 CEST 2013

On 06/05/13 03:35, Maciej Fijalkowski wrote:
> On Sun, May 5, 2013 at 1:20 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> On 05/05/13 19:59, Armin Rigo wrote:
>>>
>>> Hi all,
>>>
>>> I'm just wondering again about some "bug" reports that are not bugs,
>>> about people misusing "is" to compare two immutable objects.  The
>>> current situation in PyPy is that "is" works like "==" for ints,
>>> longs, floats or complexes.  It does not for strs or unicodes or
>>> tuples.
>>
>>
>> I don't understand why immutability comes into this. The `is` operator is
>> supposed to test whether the two operands are the same object, nothing more,
>> nothing less. Immutable, or mutable, it makes no difference.
>>
>> Now, it may be that *some* immutable objects may (implicitly, or explicitly)
>> promise that you will never have two objects with the same value. For
>> example, float might cache every object created, so that once you have
>> created a float 23.45910234718, it will *always* be reused whenever a float
>> with that value is needed. That would be allowed.
>>
>> But if float does not cache the value, and so you have two different float
>> objects, with different IDs, then it is absolutely wrong for PyPy to treat
>> `is` as == instead of testing object identity.
>>
>> Have I misunderstood what you are saying?
>
> Immutability is important because you can't cache mutable objects.

Yes, I know that :-) but that has nothing to do with the behaviour of `is`.

> It's true what you're saying, but we consistently see bug reports
> about people comparing ints or strings with is and complaining that
> they work fine on cpython, but not on pypy.

Then their code is buggy, not PyPy. But you know that :-)

I don't believe that PyPy should take extraordinary effort to protect people from the consequences of writing buggy code.

But putting that aside, I would expect that:

x is y <=> id(x) == id(y)

The docs say:

"The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value."

http://docs.python.org/2/reference/expressions.html#index-68

and

"id(object)
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value."

http://docs.python.org/2/library/functions.html#id

So each object has a single, unique, constant ID during its lifetime. If id(x) == id(y) and the lifetimes of x and y overlap, then x and y must be the same object. Conversely, if x and y are the same object, they must have the same ID.
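
A minimal sketch of that equivalence, assuming both objects are kept alive by a name for the whole comparison (so their lifetimes overlap). The helper name identity_checks_agree is made up for illustration and is not part of any library:

```python
def identity_checks_agree(x, y):
    """Return True if `x is y` and `id(x) == id(y)` give the same answer.

    Hypothetical helper: both arguments stay alive for the whole call,
    so the lifetimes of x and y overlap as the id() docs require.
    """
    return (x is y) == (id(x) == id(y))


a = [1, 2]
b = a            # second name for the same list object
c = [1, 2]       # equal value, but a distinct list object

results = [
    identity_checks_agree(a, b),      # same object
    identity_checks_agree(a, c),      # equal but distinct objects
    identity_checks_agree(1.5, 1.5),  # floats: may or may not be shared
    identity_checks_agree("ab", "ab"),
]
```

The overlap condition matters: comparing the IDs of two temporaries, as in id([]) == id([]), can report equal IDs even though the two lists were never alive at the same time.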

> Also, you expect to have
> the same identity if you store stuff in the list and then read out of
> it - which is impossible if you don't actually have any objects in the
> list, just store unwrapped ones.

Ah, now that is an interesting question! My lack of experience with PyPy is going to show now. I take it that PyPy might optimize away the objects inside a list, storing only unboxed values?

This is a really hard question. If I do this:

a = b = X   # regardless of what X is
mylist = [a, None]
assert mylist[0] is a
assert mylist[0] is b

both assertions must pass, no matter what X is, whether mutable or immutable.
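
As a rough sketch, the same check can be run over several kinds of X, mutable and immutable alike. The function name identity_survives_list and the choice of sample values are assumptions made for illustration only:

```python
def identity_survives_list(x):
    """Store x in a list, read it back, and compare identities.

    Hypothetical helper mirroring the snippet above: returns True only
    if the object read out of the list is the one that was put in.
    """
    a = b = x
    mylist = [a, None]
    return mylist[0] is a and mylist[0] is b


# Sample values chosen arbitrarily: ints, a long, a float, a complex,
# a str, a tuple, and a few mutable or singleton objects.
candidates = [
    42, 2 ** 100, 23.45910234718, 1 + 2j,
    "text", (1, 2), [1, 2], {"k": 1}, None, object(),
]

results = [identity_survives_list(x) for x in candidates]
```

If any entry of results were False, the implementation would be handing back a different object from the one that was stored in the list.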

But if the values in mylist get unwrapped, then you would have to reconstruct the object identities, and I imagine that this would be painful. But it would be a shame to give up the opportunity for optimizations that unboxing could give.

Have I understood the nature of your problem correctly?

-- 
Steven