What do you think of liburcu? lttng.org/urcu

p.s. if/when I get around to profiling pypy, which is something I want to do, I'll be sure to share the results here.

d.

On 27 December 2010 10:57, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
On Mon, Dec 27, 2010 at 07:00, Dima Tisnek <dimaqq@gmail.com> wrote:
(dan) random binary trees: O(log2(n)) is 7.2 for builtins, though the constant factor for locks, etc., might make it worthwhile
"7.2"? What is that? Do you mean 7.2=log2(no. of builtins)?
(paolo) non-blocking hash maps: memory barriers can be quite costly too
Well, you just _cannot_ avoid them, no matter how fancy your data structure is. You can only move most of the cost (or all of it) to write operations - which is helpful if reads are far more common than writes, the case you discuss in your mail.
== Why memory barriers are required (in more detail) ==

If you use persistent data structures, also called copy-on-write or purely functional (and that's the most obvious way to use a tree in a parallel setting), the whole synchronization problem is reduced to exchanging a pointer (to the root of the tree) between a writer and a reader thread.

Let's consider this in the Java memory model (the state of the art). In this model the writer thread has to use an expensive StoreLoad barrier, i.e. mfence on x86 [1], costing more or less as much as a cache miss, while the reader needs just a cheap Load* barrier (free on x86, cheap on other processors). If you don't use a volatile field, there is very little guarantee about what your program means, and it is almost always buggy; the equivalent in the new C++ memory model (which is otherwise very similar) has completely undefined semantics, and even crashes are possible in practice, for instance if the reader thread sees a not-fully-initialized object because of the race and invokes a virtual method on it.
[1] http://g.oswego.edu/dl/jmm/cookbook.html
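To make this concrete, here is a minimal Java sketch of the publication pattern described above (the class and field names are mine, purely for illustration): writers build a new version of an immutable tree and swap it in through a single volatile root field, so the writer pays the expensive fence implied by the volatile store, while each reader pays only the cheap volatile load.

    // Illustrative sketch, not from the original thread: publishing an
    // immutable (persistent) tree to reader threads via one volatile field.
    final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    final class SharedTree {
        // The volatile store in insert() is what costs the writer the
        // StoreLoad fence (mfence / lock-prefixed instruction on x86);
        // the volatile load in contains() is essentially free on x86.
        private volatile Node root = null;

        // Writer: build a new version and publish it by swapping the root.
        // (Only the path from the root to the new leaf is copied.)
        synchronized void insert(int key) {
            root = insertInto(root, key);   // volatile store publishes the new version
        }

        private static Node insertInto(Node n, int key) {
            if (n == null) return new Node(key, null, null);
            if (key < n.key) return new Node(n.key, insertInto(n.left, key), n.right);
            if (key > n.key) return new Node(n.key, n.left, insertInto(n.right, key));
            return n;
        }

        // Reader: one volatile load of the root, then plain reads of final fields.
        boolean contains(int key) {
            Node n = root;                  // volatile load: cheap Load* barrier
            while (n != null) {
                if (key == n.key) return true;
                n = key < n.key ? n.left : n.right;
            }
            return false;
        }
    }

Without the volatile qualifier this program has a data race: Java's final-field semantics still keep a reader from seeing a half-constructed Node, but readers may never observe an update at all, and the equivalent race in the C++ model is outright undefined behavior.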
== Other stuff ==
Different options will have to be implemented and tested when we get there. Speaking of which, is there a standard dict workload to test against? Does anyone have profiling data on what portion of runtime is spent reading and writing attributes and globals anyway?
I'd be interested as well in such data.
Anyhow, back to parallel interpretation: is there an issue or page that tracks what's needed for it? So far mentioned: gc, dict, the C API.
Btw, I think the JIT is more important at the moment, but a time will come when the JIT juice has been mostly squeezed out ;-)
-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/