[pypy-dev] x is y <=> id(x)==id(y)

Sun May 5 11:59:44 CEST 2013

Hi all,

I'm just wondering again about some "bug" reports that are not bugs,
about people misusing "is" to compare two immutable objects.  The
current situation in PyPy is that "is" works like "==" for ints,
longs, floats or complexes.  It does not for strs or unicodes or
tuples.  Now of course someone on python-dev was (indirectly)
complaining that in CPython you can write ``x is ' '``, which works
because single-character strings are cached there, but that it does
not work in PyPy.  I'm sure someone else has been bitten by writing
``x is ()`` in CPython, where the empty tuple is also cached.
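
A minimal sketch of the distinction, written for Python 3 as an
assumption (the message itself is about Python 2 types).  The helper
names are made up for illustration.  Only the "==" results are
guaranteed; the "is" results depend on what the implementation caches.

```python
def same_value(x, y):
    """Compare by value; well-defined for immutable objects."""
    return x == y


def same_object(x, y):
    """Compare by identity; for equal immutable values the result
    depends on what the implementation happens to cache."""
    return x is y


# Build the values at run time so that no sharing of compile-time
# constants is involved.
space = "a b"[1]          # a single-character string
empty_tuple = tuple([])   # an empty tuple

# Value comparison: always True True.
print(same_value(space, " "), same_value(empty_tuple, ()))

# Identity comparison: implementation-dependent, so no expected
# output is given here.
print(same_object(space, " "), same_object(empty_tuple, ()))
```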

(Fwiw I think that there is a design flaw somewhere in Python, to
allow "1 is 1" to be executed without any error but also without any
well-defined result...)

Can we fix it once and for all?  Solution 1 would be to make "is"
work like "==" for strs, unicodes and tuples too.  It's annoying
because of id: if we want ``x is y`` to hold for equal huge strings x
and y, and still want ``id(x)==id(y)``, then we have to compute
``id(some_string)`` in a rather slow way, producing a huge number.
The same for tuples: if we always want ``(1, 2) is (1, 2)`` then we
need to compute ``id(some_tuple)`` recursively, which can also lead to
huge numbers.  In fact such a definition can explode the memory:
``a = (); for i in range(100): a = (a, a); id(a)`` would likely need a
number with around 2**100 digits.
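
The blow-up can be seen on a small scale with a sketch like the one
below.  ``structural_key`` is a hypothetical stand-in for a
value-based id, not what PyPy does: it simply spells out the whole
tuple structure as a string, so equal tuples get equal keys.  The
depth is kept small on purpose, because the key roughly doubles in
length at every level.

```python
def nested_pair(depth):
    """Build a = (a, a) repeated `depth` times, starting from ()."""
    a = ()
    for _ in range(depth):
        a = (a, a)
    return a


def structural_key(t):
    """Hypothetical value-based identity for tuples of tuples:
    a string that encodes the complete nested structure."""
    return "(" + ",".join(structural_key(item) for item in t) + ")"


def key_length(depth):
    """Length of the structural key for a nesting of the given depth."""
    return len(structural_key(nested_pair(depth)))


# The object graph has only `depth` distinct tuples, but the key
# has to describe every path through it.
for depth in range(0, 12, 2):
    print(depth, key_length(depth))
```

With this particular encoding each level costs twice the previous
length plus three characters, so depth 100 is far out of reach even
though the tuple itself is tiny in memory.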

Solution 2 would be to add such hacks only for the cases that CPython
caches: I think by now we're only missing empty or single-character
strings or unicodes, and the empty tuple.
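
As a rough sketch of what Solution 2 would mean for "is", written at
application level in Python 3 (an assumption: ``str`` and ``bytes``
stand in for the unicodes and strs mentioned above).  Both function
names are hypothetical, and a real implementation would live inside
the interpreter rather than in a helper like this.

```python
def is_small_cached_value(obj):
    """True for the values named above: the empty tuple and
    empty or single-character strings."""
    if type(obj) is tuple:
        return len(obj) == 0
    if type(obj) in (str, bytes):
        return len(obj) <= 1
    return False


def is_under_solution_2(x, y):
    """Model of `x is y` if only the small cached cases were special."""
    if x is y:
        return True
    # Same exact type, one of the special small values, and equal:
    # treat the two objects as identical.
    return (type(x) is type(y)
            and is_small_cached_value(x)
            and x == y)
```

For example, ``is_under_solution_2("a b"[1], " ")`` is true under this
model, while two equal multi-character strings are only identical if
they already are the same object.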

Solution 3 would be to drop half of the rule, keeping only
``id(x)==id(y) => x is y``.  This would be the easiest, as we could
remove the complicated computations already done for longs or floats
or complexes.  We'd clearly document it as a difference from CPython.
The question is what kind of code might break if we drop the case ``x
is y => id(x)==id(y)``.
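
To make the two halves of the rule concrete, here is a small sketch.
``identity_rules`` checks both implications for a given pair, and
``visit_once`` is one guess at the kind of code that relies on the
half that would be dropped: it de-duplicates by ``id``, so it silently
assumes that objects which are "is"-identical share an id.  Both names
are made up for illustration; whether real programs do this with
ints, floats and so on is exactly the open question.

```python
def identity_rules(x, y):
    """Return (is_implies_id, id_implies_is) for the pair x, y."""
    same_object = x is y
    same_id = id(x) == id(y)
    is_implies_id = (not same_object) or same_id   # x is y => id(x)==id(y)
    id_implies_is = (not same_id) or same_object   # id(x)==id(y) => x is y
    return is_implies_id, id_implies_is


def visit_once(items):
    """Keep the first occurrence of each object, keyed by id().

    If `x is y` could be true while id(x) != id(y), this would keep
    both x and y even though they are "the same object"."""
    seen = set()
    unique = []
    for item in items:
        if id(item) not in seen:
            seen.add(id(item))
            unique.append(item)
    return unique
```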

See you soon,

Armin.