[pypy-dev] Object identity and dict strategies
exarkun at twistedmatrix.com
Fri Jul 8 17:04:50 CEST 2011
On 02:17 pm, fijall at gmail.com wrote:
>On Fri, Jul 8, 2011 at 4:14 PM, Amaury Forgeot d'Arc
><amauryfa at gmail.com> wrote:
>>2011/7/8 Cesare Di Mauro <cesare.di.mauro at gmail.com>:
>>>I fully agree. It's not an issue, but an implementation-specific
>>>detail which programmers shouldn't assume is always true.
>>>
>>>CPython can be compiled without "smallints" (-5..256, if I remember
>>>correctly) caching. There's a #define that can be disabled, so EVERY
>>>int (or long) will be allocated, so using the is operator will return
>>>False most of the time (unless you have just copied exactly the same
>>>object).
>>>
>>>The same applies for 1 character strings, which are USUALLY cached by
>>>CPython.
>>
>>But the problem here is not object cache, but preservation of object
>>identity,
>>which is quite different.
>>Python containers are supposed to keep the objects you put inside:
>
>[citation needed] array.array does not for one
Yes, and array.array is weird. :) It either exists as a memory
optimization (i.e., I don't want objects) or a way to directly lay out
memory (to pass to a C API). Either way, you can't put arbitrary
objects into it, so it's already a little special, even if you
disregard the fact that it doesn't preserve the identity of the objects
you can put into it.
However, you're right. It exists, and it has this non-identity-
preserving behavior. Is it a good thing, though? Or just an accident
of how someone tried to let CPython be faster for some types of
problems?
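To make the contrast concrete, here is a small sketch (my own
illustration, not something from the thread; the variable names are
made up). A list hands back the very object that was stored, while
array.array stores a raw C double and builds a fresh float on each
read. Whether the last identity check comes out False depends on the
interpreter, so the script reports it rather than assuming it:

```python
from array import array

x = 1003.5                 # arbitrary float value, chosen only for illustration

items = [x]                # a list stores a reference to x itself
packed = array("d", [x])   # typecode "d" stores raw C doubles, not references

list_same = items[-1] is x     # the list returns the stored object
array_equal = packed[0] == x   # the value survives the round trip
array_same = packed[0] is x    # identity is implementation-dependent here

print("list preserves identity: ", list_same)
print("array preserves value:   ", array_equal)
print("array preserves identity:", array_same)
```

On an interpreter where "is" on floats compares values rather than
addresses, the last line may well print True.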
>>
>>myList.append(x)
>>assert myList[-1] is x
>>
>>myDict[x] = 1
>>for key in myDict:
>>    if key is x:
>>        ...
>
>also dict doesn't work if you overwrite the key:
>
>d = {1003: None}
>x = 1003
>d[x] = None
>d.keys()[0] is x
This doesn't invalidate the original point, as far as I can tell. It
just demonstrates again that you can have two instances of 1003.
Whether dict guarantees to always use the new key or the old key when an
update is made is a separate question.
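A hedged sketch of how one might check both questions separately (again
my own illustration; the names are hypothetical). The two keys are
built at run time with int("1003") so that the compiler cannot fold
them into a single constant, and next(iter(d)) stands in for the
Python 2 spelling d.keys()[0]. The identity results are printed rather
than assumed, since they are exactly the implementation-specific part:

```python
# Two equal ints created at run time, so they are not one shared constant.
old_key = int("1003")
new_key = int("1003")

d = {old_key: None}
d[new_key] = None            # update the entry through an equal key

stored_key = next(iter(d))   # the single key the dict currently holds

print("keys are equal:          ", old_key == new_key)
print("keys are distinct objects:", old_key is not new_key)
print("dict holds the old key:  ", stored_key is old_key)
print("dict holds the new key:  ", stored_key is new_key)
```

The first line is guaranteed by the language; the other three describe
what a particular interpreter happens to do.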
I think it would be better if object identity didn't depend on this
mysterious quality of "immutability". The language is easier to
understand (particularly for new programmers) if one can talk about
objects and references without having to also explain that _some_ data
types are represented using things that are sort of like objects but not
quite (and worse if it depends on what types the JIT feels like playing
with in any particular version of the interpreter).
Jean-Paul