[Python-Dev] Fighting the theoretical randomness of "is" on immutables

Gregory P. Smith greg at krypto.org
Tue May 7 06:28:52 CEST 2013

On Mon, May 6, 2013 at 7:50 PM, Terry Jan Reedy <tjreedy at udel.edu> wrote:

> On 5/6/2013 6:34 PM, Antoine Pitrou wrote:
>> On Mon, 06 May 2013 18:23:02 -0400
>> Terry Jan Reedy <tjreedy at udel.edu> wrote:
>>> 'Item' is necessarily left vague for mutable sequences as bytearrays
>>> also store values. The fact that Antoine's example 'works' for
>>> bytearrays is an artifact of the caching, not a language-mandated
>>> necessity.
>> No, it isn't.
> Yes it is. Look again at the array example.
> >>> from array import array
> >>> x = 1001
> >>> myray = array('i', [x])
> >>> myray[0] is x
> False
> Change 1001 to a cached int value such as 98 and the result is True
> instead of False. For the equivalent bytearray example
> >>> b = bytearray()
> >>> b.append(98)
> >>> b[0] is 98
> True
> the result is always True *because*, and only because, all byte values are
> (now) cached. I believe the test for that is marked as CPython-specific.
>> You are mixing up values and references.
> No I am not. My whole post was about being careful not to confuse the
> two. I noted, however, that the Python *docs* use 'item' to mean either or
> both. If you do not like the *doc* being unclear, clarify it.
>> A bytearray or an array.array may indeed store values, but a list stores
>> references to objects.
> I said exactly that in reference to CPython. As far as I know, the same is
> true of lists in every other implementation up until Pypy decided to
> optimize that away. What I also said is that I cannot read the *current*
> doc as guaranteeing that characterization. The reason is that the members
> of sequences, mutable sequences, and lists are all described as 'items'. In
> the first two cases, 'item' means 'value or object reference'. I see
> nothing in the doc to force a reader to change or particularize the
> meaning of 'item' in the third case. If I missed something *in the
> specification*, please point it out to me.
>> I'm pretty sure that not respecting identity of objects stored in
>> general-purpose containers would break a *lot* of code out there.
> Me too. Hence I suggested that if lists, etc, are intended to respect
> identity, with 'is' as currently defined, in any implementation, then the
> docs should say so and end the discussion. I would be happy to commit an
> approved patch, but I am not in a position to decide the substantive
> content. Hence, I tried to provide a neutral analysis that avoided
> confusing the CPython implementation with the Python specification.
> In my final paragraph, however, I did suggest that Pypy respect precedent,
> to avoid breaking existing code and expectations, and call their mutable
> sequences something other than 'list'.

Wouldn't the entire point of such things existing in pypy be that the
implementation is irrelevant to the user and used behind the scenes
automatically in the common case when a container is determined to fit the
special constraint?

I personally do not think we should guarantee that "mylist[0] = x; assert x
is mylist[0]" succeeds when x is an immutable type other than None.  If
something is immutable and is not intended to be a singleton or an
equality-less sentinel (like None, or values commonly tested using "is"
such as arbitrary object() instances), it needs to be up to the language VM
to determine when to copy or not in most situations.

You already gave the example of the interned small integers in CPython.
String constants and names used in code are also interned in today's
CPython implementation.  This doesn't tend to trip any real code up.
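
A rough way to observe this, assuming Python 3 (sys.intern is the Python 3
spelling). The helper build is hypothetical. The identity results are printed
rather than asserted because they are implementation details and may differ
between CPython versions and other VMs; only the equality checks and the
explicit sys.intern call are relied upon.

```python
import sys


def build(n):
    """Hypothetical helper: create an int at runtime so that the compiler
    cannot simply share one constant object."""
    return int(str(n))


a, b = build(98), build(98)
c, d = build(1001), build(1001)

# Equality is guaranteed by the language.
assert a == b and c == d

# Identity is up to the implementation: report it, do not rely on it.
print("small ints identical:", a is b)
print("large ints identical:", c is d)

# Two equal strings built at runtime.
s1 = "".join(["python", "_", "dev"])
s2 = "".join(["python", "_", "dev"])
assert s1 == s2
print("runtime strings identical:", s1 is s2)

# sys.intern explicitly asks for one shared object per distinct string.
assert sys.intern(s1) is sys.intern(s2)
```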
