What do you think of liburcu? lttng.org/urcu

p.s. if/when I get around to profiling pypy, which is something I want to do, I'll be sure to share the results here.

d.

On 27 December 2010 10:57, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
On Mon, Dec 27, 2010 at 07:00, Dima Tisnek <dimaqq@gmail.com> wrote:
(dan) random binary trees: O(log2(n)) is 7.2 for builtins, though the constant factor for locks, etc., might make it worthwhile
"7.2"? What is that? Do you mean 7.2=log2(no. of builtins)?
(paolo) non-blocking hash maps: memory barriers can be quite costly too
Well, you just _cannot_ avoid them, no matter how fancy your data structure is. You can only move most of the cost (or all of it) to write operations - which is helpful if reads are far more common than writes, the case you discuss in your mail.
== Why memory barriers are required (in more detail) ==

If you use persistent data structures, also called copy-on-write or purely functional (and that's the most obvious way to use a tree in a parallel setting), the whole synchronization problem is reduced to exchanging a pointer (to the root of the tree) between a writer and a reader thread.

Let's consider this in the Java memory model (the state of the art). In this model the writer thread has to use an expensive StoreLoad barrier, i.e. mfence on x86 [1], costing more or less as much as a cache miss, while the reader needs just a cheap Load* barrier (free on x86, cheap on other processors). If you don't use a volatile field, there is very little guarantee about what your program means, and it is almost always buggy; the equivalent in the new C++ memory model (which is otherwise very similar) has completely undefined semantics, and even crashes are possible in practice, for instance if the reader thread sees a not-fully-initialized object because of the race and invokes a virtual method on it.
[1] http://g.oswego.edu/dl/jmm/cookbook.html
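To make this concrete, here is a minimal Java sketch of the publication pattern described above (the class and field names are mine, purely for illustration): writers build a new version of an immutable tree and swap it in through a single volatile root field, so the writer pays the expensive fence implied by the volatile store, while each reader pays only the cheap volatile load.

    // Illustrative sketch, not from the original thread: publishing an
    // immutable (persistent) tree to reader threads via one volatile field.
    final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    final class SharedTree {
        // The volatile store in insert() is what costs the writer the
        // StoreLoad fence (mfence / lock-prefixed instruction on x86);
        // the volatile load in contains() is essentially free on x86.
        private volatile Node root = null;

        // Writer: build a new version and publish it by swapping the root.
        // (Only the path from the root to the new leaf is copied.)
        synchronized void insert(int key) {
            root = insertInto(root, key);   // volatile store publishes the new version
        }

        private static Node insertInto(Node n, int key) {
            if (n == null) return new Node(key, null, null);
            if (key < n.key) return new Node(n.key, insertInto(n.left, key), n.right);
            if (key > n.key) return new Node(n.key, n.left, insertInto(n.right, key));
            return n;
        }

        // Reader: one volatile load of the root, then plain reads of final fields.
        boolean contains(int key) {
            Node n = root;                  // volatile load: cheap Load* barrier
            while (n != null) {
                if (key == n.key) return true;
                n = key < n.key ? n.left : n.right;
            }
            return false;
        }
    }

Without the volatile qualifier this program has a data race: Java's final-field semantics still keep a reader from seeing a half-constructed Node, but readers may never observe an update at all, and the equivalent race in the C++ model is outright undefined behavior.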
== Other stuff ==
Different options will have to be implemented and tested when we get there. Speaking of which, is there a standard dict workload to test against? Does anyone have profiling data on what portion of runtime is spent reading and writing attributes and globals anyway?
I'd be interested as well in such data.
Anyhow, back to parallel interpretation: is there an issue or page that tracks what's needed for it? So far mentioned: gc, dict, the C API.
Btw, I think the JIT is more important at the moment, but a time will come when the JIT juice has been mostly squeezed out ;-)
-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/