[pypy-dev] 1.4.1 threading

William ML Leslie william.leslie.ttg at gmail.com
Mon Dec 27 07:30:35 CET 2010


On 27 December 2010 17:00, Dima Tisnek <dimaqq at gmail.com> wrote:
> while we are on the subject, is there a plan to provide different
> levels of sparseness for different levels of name lookup?
> for example globals vs builtins: the first needs to be quite sparse so
> that builtins show through well, second hardly, because there's
> nowhere else to look if builtins don't have the name.
> then the tradeoff could be more dynamic, that is frequently accessed
> dicts could be e.g. more sparse and rarely accessed more compact?
> (obviously more sparse is not always better, e.g. in terms of cpu cache)
> of course "frequently accessed" is not as easy as frequently run code,
> e.g. here "class A: ...; while 1: A().func()", func lookup occurs in
> different objects, yet it is same logical operation.
>
> come to think of it, is there any point in polymorphic dicts, e.g.
> attribute access could be implemented as a near-perfect compact hash
> map if attribute names change rarely, while regular dict()s are
> expected to change keys often.

No, not really, but pypy already heavily optimises this case - see
mapdicts, celldicts, and sharing dicts.  Celldict is commonly used for
module dictionaries, and emulates a direct pointer to the value, which
can be cached alongside the LOAD_GLOBAL instruction.  I implemented a
similar system (pre-computed array based on known module-global names)
a few years back in pypy (pre-jit) and got a 6% speedup over regular
dicts, but celldicts get a similar speedup and are more generally
applicable.
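
A minimal sketch of the celldict idea, to make "a direct pointer to
the value" concrete.  This is not pypy's actual code: Cell, CellDict
and LoadGlobalSite are names made up for illustration, and deletion of
globals is ignored entirely.

```python
class Cell:
    """Holds one global's value; the dict maps names to cells, not values."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


class CellDict:
    """Hypothetical module dictionary that stores a Cell per name."""

    def __init__(self):
        self._cells = {}

    def setitem(self, name, value):
        cell = self._cells.get(name)
        if cell is None:
            self._cells[name] = Cell(value)
        else:
            # Rebinding updates the cell in place, so cells that were
            # cached elsewhere keep seeing the current value.
            cell.value = value

    def getcell(self, name):
        return self._cells[name]


class LoadGlobalSite:
    """Stands in for one LOAD_GLOBAL instruction with an inline cache."""

    def __init__(self, name):
        self.name = name
        self.cached_cell = None
        self.dict_lookups = 0  # counts how often the hash table is consulted

    def load(self, module_dict):
        if self.cached_cell is None:
            self.dict_lookups += 1
            self.cached_cell = module_dict.getcell(self.name)
        # Fast path: one pointer dereference, no hashing, no probing.
        return self.cached_cell.value


module_globals = CellDict()
module_globals.setitem("x", 1)
site = LoadGlobalSite("x")
first = site.load(module_globals)
module_globals.setitem("x", 2)   # rebinding the global
second = site.load(module_globals)
```

The second load sees the rebound value without touching the hash
table again, which is the property that makes the cached cell safe to
keep next to the instruction.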

As for perfect hashing: our existing mechanism for hashing beats it,
hands down.  In cpython at least (I haven't checked the pypy source on
this topic), the hash of a string is cached on the string object
itself, which means that for identifiers no hash needs to be
recomputed on global lookup.  The only thing that could really be
faster is something like a slot on the symbol itself.
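
To illustrate the effect of caching the hash on the key object, here
is a small sketch in pure Python.  CachedHashName is a made-up class
that imitates what cpython does internally for str; it is only there
so the number of hash computations can be counted.

```python
class CachedHashName:
    """Hypothetical identifier object that computes its hash only once."""
    __slots__ = ("text", "_hash", "hash_computations")

    def __init__(self, text):
        self.text = text
        self._hash = None
        self.hash_computations = 0

    def __hash__(self):
        if self._hash is None:
            # Slow path, taken on first use only.
            self.hash_computations += 1
            self._hash = hash(self.text)
        return self._hash

    def __eq__(self, other):
        return isinstance(other, CachedHashName) and self.text == other.text


name = CachedHashName("len")
table = {name: "builtin function"}

# Repeated lookups call __hash__ each time, but the cached value is reused.
for _ in range(1000):
    found = table[name]
```

After the loop the hash has been computed exactly once, however many
lookups were made with the same key object.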

Celldicts move the synchronisation point out of the hash table and
into the entry for common cases, which changes the synchronisation
question significantly.
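
One way to picture this, purely as an assumption about how such a
design could look (pypy has no such classes; LockedCell and
SyncCellDict are invented here): the table lock is only needed for
structural changes such as adding a name, while rebinding an existing
name synchronises on the entry alone.

```python
import threading


class LockedCell:
    """Hypothetical entry that carries its own lock."""

    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()


class SyncCellDict:
    """Hypothetical table: its lock guards structure, not the values."""

    def __init__(self):
        self._cells = {}
        self._table_lock = threading.Lock()

    def cell_for(self, name, default=0):
        # Structural change (possibly adding a name): take the table lock.
        with self._table_lock:
            cell = self._cells.get(name)
            if cell is None:
                cell = self._cells[name] = LockedCell(default)
            return cell


def bump(cell, times):
    # Common case: the name already exists, so only the entry is locked.
    for _ in range(times):
        with cell.lock:
            cell.value += 1


shared = SyncCellDict()
counter_cell = shared.cell_for("counter")
workers = [threading.Thread(target=bump, args=(counter_cell, 1000))
           for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
total = counter_cell.value
```

Threads that rebind different names never contend with each other in
this arrangement, and none of them holds the table lock while doing so.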

> Btw, I think that jit is more important at the moment, but time comes
> when jit juice has been mostly squeezed out ;-)

There are occasional memory-model discussions, but at the end of the
day the people who step up to do the implementation work will probably
also get to do most of the design work.

-- 
William Leslie


