[pypy-dev] Object identity and dict strategies

Fri Jul 8 12:04:18 CEST 2011

2011/7/8 Armin Rigo <arigo at tunes.org>

> Hi William,
>
> On Fri, Jul 8, 2011 at 10:31 AM, William ML Leslie
> <william.leslie.ttg at gmail.com> wrote:
> > On another note: what Alex talks about as being two different cases
> > are just one with the small int optimisation - all references can be
> > compared by value in the C backend with small ints enabled, if the
> > object space doesn't provide alternative behaviour.
>
> No, because of longs, for example.  You don't have the same issue if you use
> longs instead of ints in this particular example, but more generally
> the issue exists too.
>
> A first note is that it's impossible to satisfy all three of Alex's
> criteria in general: we would like id(x) to be a unique word-sized
> number determined only by the value of 'x', and different for every
> value of 'x' and for every object of a different type too; but if the
> possible 'x'es are all possible word-sized integers, then it's
> impossible to satisfy this, just because there are too many of them.
> The problem only gets "more impossible" if we include all long objects
> in 'x'.
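
Armin's counting argument is the pigeonhole principle. Below is a toy Python 3 sketch that assumes a hypothetical 4-bit machine word so that every possible value can be enumerated. `value_based_id` is an invented id scheme (tag ints with a low 1 bit and everything else with a 0 bit, then truncate to the word size). It is not anything PyPy actually does. Whatever scheme is plugged in, 16 integers plus one non-integer object cannot receive 17 different 4-bit ids, so `find_collision` is bound to find a pair.

```python
# Hypothetical toy model: a machine word has only WORD_BITS bits,
# so every possible word-sized integer can be enumerated.
WORD_BITS = 4
WORD_VALUES = range(2 ** WORD_BITS)


def value_based_id(obj):
    """Hypothetical id() computed only from the value and type of obj."""
    mask = 2 ** WORD_BITS - 1
    if isinstance(obj, int):
        return ((obj << 1) | 1) & mask  # ints get a low tag bit of 1
    return (hash(obj) << 1) & mask      # everything else gets a low bit of 0


def find_collision(objects):
    """Return the first pair of distinct objects sharing an id, or None."""
    seen = {}
    for obj in objects:
        key = value_based_id(obj)
        if key in seen:
            return seen[key], obj
        seen[key] = obj
    return None


# 2**WORD_BITS integers plus one string cannot fit into 2**WORD_BITS ids.
candidates = list(WORD_VALUES) + ["foobar"]
print(find_collision(candidates))
```

With this particular scheme the clash appears before the string is even reached: 0 and 8 both map to id 1, because the top bit shifted out by the tag is lost.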
>
> The problem is not new; it is just a bit more apparent.  For example,
> already in PyPy 1.5 we have:
>
> >>>> a = A()
> >>>> d = a.__dict__
> >>>> s = 'foobar'
> >>>> d[s] = 5
> >>>> id(s)
> 163588812
> >>>> id(d.keys()[0])
> 163609508
> >>>> id(d.keys()[0])
> 163609520
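
One way to see why the ids in that session keep changing is a toy model of a dict strategy that stores string keys unboxed and re-wraps them on every `keys()` call. This is a hypothetical Python 3 sketch: `WrappedStr` and `UnboxedStrKeys` are invented names, not PyPy's real classes, and the only assumption is that the strategy keeps raw strings and drops the original wrapper objects.

```python
class WrappedStr:
    """Hypothetical boxed string, standing in for an interpreter-level object."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, WrappedStr) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


class UnboxedStrKeys:
    """Hypothetical dict strategy that keeps string keys unboxed."""

    def __init__(self):
        self._storage = {}  # raw str -> value; the original wrappers are dropped

    def setitem(self, w_key, value):
        self._storage[w_key.value] = value  # unbox on the way in

    def keys(self):
        # Re-box on the way out: each call builds brand-new wrapper objects.
        return [WrappedStr(raw) for raw in self._storage]


w_s = WrappedStr("foobar")
d = UnboxedStrKeys()
d.setitem(w_s, 5)
k1 = d.keys()[0]
k2 = d.keys()[0]
print(k1 == w_s, k2 == w_s)  # same value as the original key
print(k1 is w_s, k1 is k2)   # but a different object on every call
```

Equality survives the round trip, identity does not, which matches the three different ids printed above.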
>
> I thought that there were also issues that would only show up with the
> JIT, because of the _immutable_ flag on W_IntObject, but it seems that
> I'm wrong.  I can only say that Psyco has such issues, but nobody
> complained about them:
>
> lst = []
> def f(x):
>    for _ in range(3): pass    # prevents inlining
>    lst.append(x)
> def g(n):
>    for i in range(n):
>        f(i); f(i)
>
> With Psyco a call to g(5000) puts in 'lst' 10000 integer objects that
> are mostly all distinct objects, although there should in theory be at
> most 5000 distinct objects in there.  (PyPy is safe so far because the
> call_assembler from g() to f() passes fully built W_IntObjects,
> instead of just cpu-level words; but that may change if in the future
> we add an optimization that knows how to generate a more efficient
> call_assembler.)
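
A Python 3 version of the example, with a helper that counts how many distinct objects end up in `lst`. `count_distinct_objects` is a hypothetical helper; it relies on `id()` being unique among objects that are alive at the same time, which holds here because `lst` keeps every element alive. On CPython it should report 5000 distinct objects for 10000 entries; according to the message above, under Psyco most of the 10000 would be distinct.

```python
lst = []


def f(x):
    for _ in range(3):
        pass  # in the original example, this loop prevents inlining
    lst.append(x)


def g(n):
    for i in range(n):
        f(i)
        f(i)  # the same object i is appended twice


def count_distinct_objects(objects):
    # ids are unique among simultaneously alive objects, and `objects`
    # itself keeps every element alive while we look at it.
    return len({id(o) for o in objects})


g(5000)
print(len(lst), count_distinct_objects(lst))
```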
>
> I suppose that it's again a mixture of rules that are too vague and
> complaints of "but it works on CPython!" that are only now being voiced
> because the already-existing difference has become a bit more
> apparent.  Sorry for the rant, I don't have an obvious solution :-)
>
>
> See you soon,
>
> Armin.
>

Hi Armin

I fully agree. It's not an issue but an implementation-specific detail,
which programmers should not assume always holds.

CPython can be compiled without the "small ints" cache (-5..256, if I
remember correctly). There's a #define (NSMALLPOSINTS/NSMALLNEGINTS) that can
be set to zero, so that EVERY int (or long) is freshly allocated; the is
operator will then return False most of the time (unless you simply copied a
reference to the same object).
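
A quick Python 3 check of the difference between `is` and `==` on integers. The values are built with `int()` at run time so that the compiler cannot merge them into a single shared constant; `identity_and_equality` is a hypothetical helper name. On a default CPython build the `is` result may come out True for values inside the small-int cache and False outside it, but that is precisely the implementation detail under discussion. Only the `==` result is guaranteed.

```python
def identity_and_equality(text):
    # Build two int objects at run time from the same text.
    a = int(text)
    b = int(text)
    return a is b, a == b


for text in ("7", "256", "257", "100000"):
    same, equal = identity_and_equality(text)
    # `same` depends on the interpreter and build; `equal` is always True.
    print(text, "is:", same, "==:", equal)
```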

The same applies to one-character strings, which CPython USUALLY caches.
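
When code genuinely needs `is` to hold for equal strings, the portable approach is to intern them explicitly rather than rely on CPython's caching. A small Python 3 sketch using `sys.intern` (Python 2 spelled it as the builtin `intern`); `make_key` is a hypothetical helper that builds the strings at run time.

```python
import sys


def make_key(parts):
    # Joining at run time produces a new string object each call.
    return "".join(parts)


a = make_key(["foo", "bar"])
b = make_key(["foo", "bar"])
print(a == b)                          # equal values
print(sys.intern(a) is sys.intern(b))  # interning returns one shared object
```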

So care is needed when using is. It's safe for some singleton objects
(None, False, True, Ellipsis) and, I think, for instances of user-defined
classes, but not for everything.
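
The cases listed as safe can be exercised directly. In this Python 3 sketch, `Point` and `find_key` are hypothetical names: `Point` is a plain class without a custom `__eq__`, so its instances compare by identity, and on CPython the dict hands back the very same instance that was stored, unlike the string keys in Armin's PyPy 1.5 session. `None` is tested with `is`, as is idiomatic for singletons.

```python
class Point:
    """Plain user-defined class: default == falls back to identity."""

    def __init__(self, x, y):
        self.x, self.y = x, y


def find_key(d, probe):
    # Return the stored key object that compares equal to probe, or None.
    for key in d:
        if key == probe:
            return key
    return None


p = Point(1, 2)
d = {p: "stored"}
print(find_key(d, p) is p)                # the stored key is the same instance
print(find_key(d, Point(1, 2)) is None)   # a new instance is not equal to p
```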

Regards,

Cesare