[Python-Dev] Fighting the theoretical randomness of "is" on immutables

Mon May 6 14:43:38 CEST 2013

On 5/6/2013 4:46 AM, Armin Rigo wrote:

'is' *is* well-defined. In production code, the main use of 'is' is for 
builtin singletons, the bool doubleton, and object instances used as 
sentinels. The most common use, in particular, is 'if a is None:'. For 
such code, the result must be independent of implementation.
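A minimal sketch of that first group of uses, for illustration only. The names _MISSING, lookup, and describe are hypothetical and not taken from any library:

```python
# Hypothetical sentinel: a unique object that no caller can pass by accident.
_MISSING = object()

def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; raise KeyError unless a default was supplied."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:            # identity test against the sentinel
        if default is _MISSING:      # caller gave no default at all
            raise KeyError(key)
        return default
    return value

def describe(a):
    """The most common use of 'is': testing for the None singleton."""
    if a is None:
        return "nothing"
    return "something"
```

Because None is a valid stored value here, comparing against a private sentinel object is what lets lookup() tell "key absent" apart from "key present with value None".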

For other immutable classes, for which 'is' is mostly irrelevant and 
useless, the result of some code is intentionally implementation 
dependent to allow optional optimizations. 'Implementation dependent' is 
different from 'random'. For such classes (int, tuple, frozenset, string), the 
main use of 'is' is to test if the intended optimization is being done. 
In other words, for these classes, the implementation dependence is a 
feature.
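A small sketch of using 'is' only as a probe for such optimizations. The helper name same_value_same_object is hypothetical. Equality is guaranteed by the language; the identity result is deliberately left unstated because it may differ between implementations and versions:

```python
def same_value_same_object(make):
    """Build a value twice and report (equal, identical)."""
    a, b = make(), make()
    return a == b, a is b

n = 1000  # a runtime value, so the results below are computed, not literals

for label, make in [
    ("int", lambda: n * n),
    ("tuple", lambda: tuple([n, n])),
    ("str", lambda: "ab" * n),
]:
    equal, identical = same_value_same_object(make)
    # 'equal' is always True; 'identical' only reveals whether this
    # implementation happens to share the object.
    print(label, equal, identical)
```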

The general advice given to newbies by python-list regulars is to limit 
the use of 'is' with immutables to the first group of classes and never 
use it for the second.

> In the context PyPy, we've recently seen again the issue of "x is y"
> not being well-defined on immutable constants.

Since immutable objects have a constant value by definition of 
immutable, I am not sure if you are trying to say anything more by 
adding the extra word.

>  I've tried to
> summarize the issues and possible solutions in a mail to pypy-dev [1]
> and got some answers already.  Having been convinced that the core is
> a language design issue, I'm asking for help from people on this list.
>   (Feel free to cross-post.)
>
> [1] http://mail.python.org/pipermail/pypy-dev/2013-May/011299.html
>
> To summarize: the issue is a combination of various optimizations that
> work great otherwise.  For example we can store integers directly in
> lists of integers, so when we read them back, we need to put them into
> fresh W_IntObjects (equivalent of PyIntObject).

Interesting. I presume you only do this when the ints all fit in a 
machine int so that all require the same number of bytes so you can 
efficiently index and slice.

This is sort of what strings do with characters, except for there being 
no char class. The similarity is that if you concatenate a string to 
another string and then slice it back out, you generally get a different 
object, but may get the same object if some optimization has that 
effect. For instance, in current CPython, s is ''+s is s+''. The details 
depend on the CPython version.
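The concatenate-then-slice case can be sketched as follows. Only the equality is asserted; the identity results are printed rather than promised, since they depend on the implementation and version:

```python
s = "implementation"
t = "-dependent"

joined = s + t
sliced = joined[:len(s)]     # slice the original text back out

assert sliced == s           # same value, guaranteed
print(sliced is s)           # same object? either answer is permitted

# Concatenating the empty string may or may not return the operand itself.
print(("" + s) is s, (s + "") is s)
```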

> We solved temporarily the issue of "I'm getting an object which isn't
> ``is``-identical to the one I put in!"

Does the definition of list operations guarantee preservation of object 
identity? After 'somelist.append(a)', must 'somelist.pop() is a' be 
true? I am not sure. For immutables, it could be an issue if someone 
stores the id. But I don't know why someone would do that for an int.
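The question can be made concrete with a short sketch. For an ordinary object, code would break badly if the list did not hand back the same object; for an int, the only thing a program should rely on is the value:

```python
somelist = []

a = object()                 # a plain object with no value, only identity
somelist.append(a)
popped = somelist.pop()
assert popped is a           # identity is the whole point of such an object

big = 10 ** 30
somelist.append(big)
popped_int = somelist.pop()
assert popped_int == big     # the value is what numerical code depends on
print(popped_int is big)     # may differ where ints are stored unboxed
```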

As I already said, we routinely tell people on python-list (c.l.p) that 
they shouldn't care about ids of ints. The identity of an int cannot 
(and should not) affect the result of numerical calculation.

> by making all equal integers ``is``-identical.

Which changes the definition of 'is', or rather, makes the definition 
implementation dependent.

> This required hacking at ``id(x)`` as well to keep the requirement ``x
> is y <=> id(x)==id(y)``.  This is getting annoying for strings, though
> -- how do you compute the id() of a long string?  Give a unique long
> integer?  And if we do the same for tuples, what about their id()?

The solution to the annoyance is to not do this ;-). More seriously, are 
you planning to unbox strings or tuples?

> The long-term solution that seems the most stable to me would be to
> relax the requirement ``x is y <=> id(x)==id(y)``.

I see this as a definition, not a requirement. Changing the definition 
would break any use that depends on the definition being what it is.
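As a sketch of what that definition says, the check below compares 'is' with id() equality for objects that are alive at the same time. The helper name is_matches_id is hypothetical:

```python
def is_matches_id(x, y):
    """True when 'x is y' agrees with 'id(x) == id(y)'."""
    return (x is y) == (id(x) == id(y))

first, second = [], []       # two distinct, simultaneously live lists
text = "abc"

pairs = [
    (None, None),
    (first, first),
    (first, second),
    (text, text),
    (10 ** 30, 10 ** 30),
]

# The individual answers may vary, but 'is' and id() must agree on each pair.
assert all(is_matches_id(x, y) for x, y in pairs)
```

The objects are kept referenced for the duration of the check because an id is only unique among objects whose lifetimes overlap.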

--
Terry Jan Reedy