[pypy-dev] 1.4.1 threading

Carl Friedrich Bolz cfbolz at gmx.de
Thu Dec 30 01:27:02 CET 2010


On 12/27/2010 06:25 PM, Paolo Giarrusso wrote:
>> All these thoughts go into the wrong direction, imo. The JIT removes
>> nearly all dictionary accesses to global dicts, instance and class
>> dicts. Even without the JIT, purely interpreting things, there are
>> caches that bypass most dict lookups.
>
> That's very interesting. However, aren't such caches also hash maps in
> the end (unless you do inline caching in your interpreter, like in
> Ruby 1.9)? I remember reading so in PyPy's docs; moreover, that's a
> standard optimization for method lookup.

Indeed, some of the caches are again hash maps; the method cache, for 
example, is a global hash map. However, since it is a fixed-size cache, 
its implementation is a lot simpler than that of a full dictionary.
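
For concreteness, a fixed-size method cache of this kind might look 
roughly like the sketch below. This is not PyPy's actual code: the 
(version, name) keying, the version token and the names Entry, 
method_cache and lookup_method are all illustrative assumptions.

CACHE_SIZE = 2048  # power of two, so masking can replace a modulo

class Entry(object):
    def __init__(self):
        self.version = None
        self.name = None
        self.value = None

method_cache = [Entry() for _ in range(CACHE_SIZE)]

def lookup_method(cls, version, name, slow_lookup):
    # 'version' is a token that gets replaced whenever cls (or a base
    # class) is mutated, so stale entries can never be returned.
    index = (hash(version) ^ hash(name)) & (CACHE_SIZE - 1)
    entry = method_cache[index]
    if entry.version is version and entry.name == name:
        return entry.value                 # hit: no dict access at all
    value = slow_lookup(cls, name)         # miss: walk the class dicts
    entry.version = version
    entry.name = name
    entry.value = value
    return value

Because the table has a fixed size, a colliding entry is simply 
overwritten, so there is no resizing, probing or deletion to deal with.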

Some of the caches (like the attribute access cache) are indeed inline 
caches and thus don't need hash maps at all. At some point the method 
cache was an inline cache as well, but that didn't seem to help much in 
practice.
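
To make the attribute-access case concrete, an inline cache keeps its 
state per bytecode site instead of in any shared table. The sketch below 
assumes a map-based ("hidden class") object layout; the names Map, Obj 
and LoadAttrSite are made up for the example and don't correspond to 
PyPy's classes.

class Map(object):
    # maps attribute names to positions in the object's storage list
    def __init__(self, attr_positions):
        self.attr_positions = attr_positions

class Obj(object):
    def __init__(self, map, storage):
        self.map = map
        self.storage = storage

class LoadAttrSite(object):
    # one instance per attribute-read site in the bytecode
    def __init__(self, name):
        self.name = name
        self.cached_map = None
        self.cached_index = -1

    def load(self, obj):
        if obj.map is self.cached_map:             # fast path: identity check
            return obj.storage[self.cached_index]
        index = obj.map.attr_positions[self.name]  # slow path: real lookup
        self.cached_map = obj.map
        self.cached_index = index
        return obj.storage[index]

point_map = Map({'x': 0, 'y': 1})
site = LoadAttrSite('x')
print(site.load(Obj(point_map, [1, 2])))   # slow path, fills the cache
print(site.load(Obj(point_map, [3, 4])))   # fast path, same map

On the fast path the only work is one identity comparison and one list 
index, which is why no hash map is involved at all.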

In general it seems that eliminating individual dict lookups is rarely 
worth it in PyPy. We also had an inline cache for global dict lookups at 
some point, but it didn't give a big enough improvement to keep it. I'm 
talking purely about the interpreter here, of course; the JIT gets rid 
of all these lookups anyway.

> Sharing such caches between different interpreter threads would be
> potentially useful, if that implied no expensive synchronization for
> readers - which is possible for instance on x86 (read barriers are for
> free), by using persistent data structures. Writes to such caches
> should (hopefully) be rare enough to make the extra synchronization
> not too expensive, however this needs benchmarking.
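
For illustration only, the kind of read-mostly shared cache described 
here could be sketched as a copy-on-write structure like the one below. 
SharedCache is a hypothetical name, and whether the lock-free read is 
actually safe depends entirely on the memory model of the runtime, which 
is exactly the open question.

import threading

class SharedCache(object):
    def __init__(self):
        self._snapshot = {}          # treated as immutable once published
        self._write_lock = threading.Lock()

    def get(self, key, default=None):
        # Readers take no lock: they load one reference and look up in a
        # snapshot that is never mutated in place.
        return self._snapshot.get(key, default)

    def put(self, key, value):
        # Writes are assumed to be rare; each one copies the snapshot and
        # publishes the new version with a single reference assignment.
        with self._write_lock:
            new_snapshot = dict(self._snapshot)
            new_snapshot[key] = value
            self._snapshot = new_snapshot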

Again, this is all idle speculation. If PyPy wants to move towards a 
GIL-less implementation, a *huge* number of problems would need to be 
solved before such optimizations become important. We would need to fix 
the GC. We would need to think about the memory model of RPython in the 
presence of multi-CPU threading, and about that of Python itself. At the 
moment I don't see that work happening, because nobody in the current 
core contributor group is qualified or interested.

Yes, in a sense that's not very forward-looking, given the move to 
multi-core. On the other hand, it's not clear to me that shared-memory 
multithreading is really the approach that makes the most sense for PyPy 
in its current state.

Cheers,

Carl Friedrich


