[pypy-dev] 1.4.1 threading
p.giarrusso at gmail.com
Mon Dec 27 20:50:08 CET 2010
On Mon, Dec 27, 2010 at 19:54, Dima Tisnek <dimaqq at gmail.com> wrote:
> what do you think of liburcu? lttng.org/urcu
I've studied RCU in the Linux kernel (from which URCU derives) and
thought for a long time of using it for this problem. In short, once
you have GC, RCU (as in the Linux kernel) becomes (almost) trivial,
because RCU is almost entirely about how to delay freeing objects.
The RCU-GC connection is also mentioned by Documentation/RCU/RTFP.txt
in the Linux sources.
So, you need to just read about persistent data structures, and
remember that persistence is not the point - data structures which are
just almost persistent can also be OK, if you get them right. If you
replace a value in a dictionary, for instance, you don't necessarily
need to keep the old version around, as long as you the value fits in
a memory word so that it can be atomically replaced.
Speaking of libraries, you might want to look at this for the atomic
primitives you'll need:
I'm more used to the primitives of the Linux kernel, I remember I
didn't fully like that library, but IIRC it was a matter of taste -
the library is from Hans Boehm, which makes it trustworthy.
> p.s. if/when I get around to profiling pypy, which is something I want
> to do, I'll be sure to share the results here.
> On 27 December 2010 10:57, Paolo Giarrusso <p.giarrusso at gmail.com> wrote:
>> On Mon, Dec 27, 2010 at 07:00, Dima Tisnek <dimaqq at gmail.com> wrote:
>>> (dan) random binary trees: O(log2(n)) is 7.2 for builtins, though the
>>> constant factor for locks, etc, might make it worthwhile
>> "7.2"? What is that? Do you mean 7.2=log2(no. of builtins)?
>>> (paolo) non blocking hash maps: memory barriers can be quite costly too
>> Well, you just _cannot_ avoid them, whatever is your fancy data structure.
>> You can only move most of the cost (or all of it) to write operations
>> - which is helpful if reads are most common, a case which you discuss
>> in your mail.
>> == Why memory barriers are required (in more detail) ==
>> If you use persistent data structures, also called copy-on-write or
>> purely functional (and that's the most obvious way to use a tree in a
>> parallel setting), the whole synchronization problem is reduced to
>> exchanging a pointer (to the root of the tree) between a writer and a
>> reader thread.
>> Let's consider this in the Java memory model (the state of the art).
>> In this model the writer thread has to use an expensive StoreLoad
>> barrier, i.e. mfence on x86 , costing more or less as much as a
>> cache miss, while the reader needs just a cheap Load* barrier (for
>> free on x86, cheap on other processor).
>> If you don't use a volatile field, there is very little guarantee
>> about what your program means, and it is almost always buggy; the
>> equivalent in the new C++ memory model (which is otherwise very
>> similar) gives completely undefined semantics (and even crashes are
>> possible in practice, for instance if the reader thread sees a
>> not-fully-initialized object because of the race and invokes a virtual
>> method on it).
>>  http://g.oswego.edu/dl/jmm/cookbook.html
>> == Other stuff ==
>>> different options will have to be implemented and tested when we get there,
>>> speaking on which, is there a test dict load?
>>> does anyone have some profiling data, what portion of runtime is spent
>>> reading and writing attributes and globals anyways?
>> I'd be interested as well in such data.
>>> Anyhow, back to parallel interpretation, Is there an issue or page
>>> that tracks what's needed for parallel interpretation?
>>> so far mentioned: gc, dict, c api
>>> Btw, I think that jit is more important at the moment, but time comes
>>> when jit juice has been mostly squeezed out ;-)
>> Paolo Giarrusso - Ph.D. Student
Paolo Giarrusso - Ph.D. Student
More information about the Pypy-dev