[Python-Dev] Fighting the theoretical randomness of "is" on immutables
Terry Jan Reedy
tjreedy at udel.edu
Mon May 6 14:43:38 CEST 2013
On 5/6/2013 4:46 AM, Armin Rigo wrote:
'is' *is* well-defined. In production code, the main use of 'is' is for
builtin singletons, the bool doubleton, and object instances used as
sentinels. The most common use, in particular, is 'if a is None:'. For
such code, the result must be independent of implementation.
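
As a minimal sketch of that first group of uses (the names _MISSING, lookup and describe are made up for illustration, not taken from any library), a None test and a private sentinel might look like this:

```python
# Hypothetical module-level sentinel: a unique object that no caller can
# produce by accident, so an identity test against it is unambiguous.
_MISSING = object()


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; raise KeyError only if no default was given."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:          # identity test against the sentinel
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value


def describe(a):
    """Distinguish None from every other value, including 0 and ''."""
    if a is None:                  # the most common use of 'is'
        return "nothing"
    return "something"
```

Because None is a legal stored value here, 'is' against the sentinel is what lets lookup tell "key absent" from "key present with value None".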
For other immutable classes, for which 'is' is mostly irrelevant and
useless, the result of some code is intentionally implementation
dependent to allow optional optimizations. 'Implementation dependent' is
different from 'random'. For such classes (int, tuple, set, string), the
main use of 'is' is to test if the intended optimization is being done.
In other words, for these classes, the implementation dependence is a
feature.
The general advice given to newbies by python-list regulars is to limit
the use of 'is' with immutables to the first group of classes and never
use it for the second.
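
To illustrate the second group, here is a small sketch (compare is a hypothetical helper) that reports both equality and identity for two values. Only the equality result is something a program should depend on:

```python
def compare(a, b):
    """Return (equal, identical) for two objects.

    'equal' is defined by the language for ints, strings and tuples.
    'identical' is left to the implementation and may differ between
    interpreters, versions, and even ways of constructing the values.
    """
    return (a == b, a is b)


# Build the ints at run time so no compile-time constant sharing applies.
x = int("1000")
y = int("1000")
equal, identical = compare(x, y)
print(equal)      # True on every implementation
print(identical)  # implementation dependent; do not rely on either answer
```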
> In the context of PyPy, we've recently seen again the issue of "x is y"
> not being well-defined on immutable constants.
Since immutable objects have a constant value by definition of
immutable, I am not sure if you are trying to say anything more by
adding the extra word.
> I've tried to
> summarize the issues and possible solutions in a mail to pypy-dev [1]
> and got some answers already. Having been convinced that the core is
> a language design issue, I'm asking for help from people on this list.
> (Feel free to cross-post.)
>
> [1] http://mail.python.org/pipermail/pypy-dev/2013-May/011299.html
>
> To summarize: the issue is a combination of various optimizations that
> work great otherwise. For example we can store integers directly in
> lists of integers, so when we read them back, we need to put them into
> fresh W_IntObjects (equivalent of PyIntObject).
Interesting. I presume you only do this when the ints all fit in a
machine int so that all require the same number of bytes so you can
efficiently index and slice.
This is sort of what strings do with characters, except for there being
no char class. The similarity is that if you concatenate a string to
another string and then slice it back out, you generally get a different
object, but may get the same object if some optimization has that
effect. For instance, in current CPython, s is ''+s is s+''. The details
depend on the CPython version.
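
A rough sketch of the string case, with arbitrary example values. The identity results are computed but deliberately not asserted, since they are exactly the implementation-dependent part:

```python
s = "spam"
t = "eggs"

joined = s + t
piece = joined[:len(s)]          # slice the first operand back out

# Guaranteed by the language: the slice has the same value as s.
values_match = (piece == s)

# Not guaranteed: whether the slice is the same object as s.
slice_is_same_object = (piece is s)

# Concatenating an empty string may or may not return the operand itself;
# that is an optimization, not part of the definition of '+'.
empty_concat_is_same_object = (("" + s) is s)

print(values_match, slice_is_same_object, empty_concat_is_same_object)
```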
> We temporarily solved the issue of "I'm getting an object which isn't
> ``is``-identical to the one I put in!"
Does the definition of list operations guarantee preservation of object
identity? After 'somelist.append(a)', must 'somelist.pop() is a' be
true? I am not sure. For immutables, it could be an issue if someone
stores the id. But I don't know why someone would do that for an int.
As I already said, we routinely tell people on python-list (c.l.p) that
they shouldn't care about ids of ints. The identity of an int cannot
(and should not) affect the result of numerical calculation.
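
The question can be put as a small experiment (a sketch only; the variable names are arbitrary). CPython lists store references, so the popped object is the one that was appended. A list that stores unboxed ints, as described for PyPy, would have to rebuild an object on the way out:

```python
a = int("123456789")      # built at run time, outside any small-int cache

somelist = []
somelist.append(a)
b = somelist.pop()

# Holds on every implementation: the value survives the round trip.
value_preserved = (b == a)

# Holds in CPython; whether it must hold everywhere is the open question.
identity_preserved = (b is a)

# Numerical results cannot depend on which of the two objects is used.
same_arithmetic = (b + 1 == a + 1)

print(value_preserved, identity_preserved, same_arithmetic)
```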
> by making all equal integers ``is``-identical.
Which changes the definition of 'is', or rather, makes the definition
implementation dependent.
> This required hacking at ``id(x)`` as well to keep the requirement ``x
> is y <=> id(x)==id(y)``. This is getting annoying for strings, though
> -- how do you compute the id() of a long string? Give a unique long
> integer? And if we do the same for tuples, what about their id()?
The solution to the annoyance is to not do this ;-). More seriously, are
you planning to unbox strings or tuples?
> The long-term solution that seems the most stable to me would be to
> relax the requirement ``x is y <=> id(x)==id(y)``.
I see this as a definition, not a requirement. Changing the definition
would break any use that depends on the definition being what it is.
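
For reference, the relation under discussion can be written as a check (identity_consistent is a hypothetical helper). It is only meaningful while both objects are alive, since an id may be reused after an object is garbage collected:

```python
def identity_consistent(x, y):
    """Return True if 'x is y' agrees with 'id(x) == id(y)'.

    Both x and y are kept alive by the caller's references for the
    duration of the call, so their ids cannot be recycled meanwhile.
    """
    return (x is y) == (id(x) == id(y))


samples = [None, True, 0, int("1000"), "spam", ("a", 1), object()]

# Check every ordered pair, including each object against itself.
all_consistent = all(
    identity_consistent(x, y) for x in samples for y in samples
)
print(all_consistent)
```

Code that stores id(x) as a stand-in for the object, for example as a dictionary key in a memo table, relies on this equivalence and is the kind of use that a relaxed definition would break.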
--
Terry Jan Reedy