[pypy-dev] STM

Armin Rigo arigo at tunes.org
Thu Jan 5 00:30:34 CET 2012


Hi all,

(Andrew: I'll answer your mail too, but this is independent)

Sorry if this mail is more general than just pypy-dev, but I'll post
it here at least at first.  When thinking more about STM (and also
after reading a paper that cfbolz pointed me to), I would like to
argue that the currently accepted way to add STM to languages is
slightly bogus.

So far, the approach is to take an existing language, like Python or
C, and add a keyword like "transaction" that delimits a nested scope.
You end up with syntax like this (example in Python):

def foo():
    before()
    with transaction:
        during1()
        during2()
    after()

In this example, "during1(); during2()" is the content of the
transaction.  But the issue with this approach is that there is no way
to structure the transactions differently.  What if I want a
transaction that starts somewhere, and ends at some unrelated place?

This is of course related to the way "we" (I, but others before me) described
how transactions could be applied to the CPython interpreter.  When
you think about the C source of CPython, there is a precise point
at which the previous transaction should stop and the next one
immediately start (namely, between bytecodes).
But there is no way to express this using the syntax above.  What
would be needed is something like (example in C):

void mainloop() {
   while (1) {
      dispatch_next_bytecode();
      call_stuff_between_bytecodes();
   }
}
void call_stuff_between_bytecodes() {
   ...;
   /* here, we would like the previous transaction to stop and the
next one to start */
   ...;
}

The above issue occurs when trying to apply STM to CPython, but I
believe the issue to be much more general.  The solution proposed with the
"transaction" keyword assumes a generally-non-STM program into which
we carefully insert STM here and there.  This is reasonable from the
point of view of performance, because running an STM transaction is
costly.  But if we ignore this point of performance (for example, by
assuming we have a hybrid HTM-STM-based approach that was carefully
tweaked for years, with the GC and the JIT that are aware of it), then
this argument no longer applies.  The solution that we really need for
CPython requires a different point of view: a generally-STM program,
in which we carefully add here and there a "yield", i.e. a point at
which it's ok to end the current transaction and start the next one.
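To make the inversion concrete, here is a minimal Python sketch.  The
name transaction_break() is made up for illustration, and the
"transactions" are simply recorded rather than executed atomically:

```python
# Hypothetical sketch of the inverted model.  transaction_break()
# is a made-up name; instead of committing anything, it simply
# closes the current "transaction" and opens the next one, so we
# can inspect how the code gets grouped.

transactions = [[]]        # list of transactions, each a list of steps

def record(step):
    transactions[-1].append(step)

def transaction_break():
    # In a real system: commit the current transaction and
    # immediately start the next one.
    transactions.append([])

# A thread's code runs inside a transaction *by default*; it only
# calls transaction_break() where an interruption is acceptable.
record("during1")
record("during2")
transaction_break()        # previous transaction ends, next one starts
record("after")

print(transactions)        # -> [['during1', 'during2'], ['after']]
```

Compare this with the first example above: there, the transaction is a
nested block, whereas here the whole code is transactional and only the
boundaries are marked.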

While this is obvious in the case of the CPython interpreter, I'd
argue that it is a useful point of view in general.  The paper
mentioned above says that the current approach to multithreading is
completely non-deterministic, with a large collection of tools to help
us "patch" it here and there to make it more deterministic --- i.e.
multithreaded programs behave randomly, and you need to carefully use
locks, semaphores, and so on, in order to get back something that
behaves as you want, hopefully in all cases.  That approach is,
according to this paper, the reason for which programming with
multiple threads is so hard (and I tend to agree).  The paper argues
in favour of a would-be system that would be the other way around:
deterministic with explicit non-determinism added.

So, taking this point of view and making it more concrete, I'd like to
argue that what we really need is not a nested
"transaction" keyword, in C or in Python or in any language.  That
would be just another way to add some amount of determinism to an
inherently non-deterministic approach.  Instead, we need by default
all threads to run always in a coarse-grained transaction; and using a
"yield" keyword or function call, we would as needed make the
transactions more fine-grained.  (I ignore here the issue of I/O.)  In
other words, this would be a step back to the old world of cooperative
multithreading: each thread needs to call "yield" regularly, otherwise
it never allows other threads to run.  The difference with the old
world of cooperative multithreading is that nowadays we have
STM --- "never allows other threads to run" has become a figure of
speech, a way to reason about programs *as if* they were really
switching between threads only at the points where we call "yield".
What *really* occurs is that each thread runs a transaction between
two yield points, so each thread can actually run concurrently with
the others, and the only issue is that if the threads actually do
conflict, then the performance degrades as some threads need to repeat
their action.  This gives a model where we can write a multithreaded
application by starting with almost no "yield" points, and then, *if
performance requires it*, we can carefully add extra yield
points to reduce conflicts.  In other words, it is a model in which we
start writing multithreaded applications using just plain old (rather
deterministic) cooperative multithreading, and then, as needs dictate,
we insert some extra points of non-determinism in order to boost
performance.

Following that reasoning, I no longer want to go for a PyPy version exposing a
"with transaction" context manager.  Instead, it would rather be
something like a function to call at any point to "yield" control ---
or, on the idea that it's better to avoid having yields at random
unexpected points, maybe an alternative solution that looks more
promising: to start a new thread, the programmer calls a function
"start_transactional_thread(g)" where g is a generator; when g yields,
it marks the end of a transaction and the start of the next one.  (The
exact yielded value should be None.)  The difference with the previous
solution is that you cannot yield from a random function; your
function has to have some structure, namely be a generator, in order
to use this yield.

Also, it turns out that it is easy to write a non-STM implementation
of this model, for debugging and testing:

import random

_running = []
def start_transactional_thread(g):
    _running.append(g)

def run():
    "Call this in the main thread to run and wait for all started sub-threads"
    while _running:
        i = random.randrange(len(_running))
        g = _running.pop(i)
        try:
            next(g)
            _running.append(g)
        except StopIteration:
            pass

(Interestingly, this run() function is similar to stackless.run().
The exact relationship between stackless and this model remains to be
investigated, but it seems that one point of view on what I'm proposing
might be essentially "use stackless, remove some determinism from the
already-unclear tasklet switching order, and add STM under the hood".)


A bientôt,

Armin.

