[pypy-dev] STM

William ML Leslie william.leslie.ttg at gmail.com
Fri Jan 6 06:52:32 CET 2012


On 5 January 2012 10:30, Armin Rigo <arigo at tunes.org> wrote:
> Hi all,
>
> (Andrew: I'll answer your mail too, but this is independent)
>
> Sorry if this mail is more general than just pypy-dev, but I'll post
> it here at least at first.  When thinking more about STM (and also
> after reading a paper that cfbolz pointed me to), I would like to
> argue that the currently accepted way to add STM to languages is
> slightly bogus.
>
> So far, the approach is to take an existing language, like Python or
> C, and add a keyword like "transaction" that delimits a nested scope.
> You end up with syntax like this (example in Python):
>
> def foo():
>    before()
>    with transaction:
>        during1()
>        during2()
>    after()
>
> In this example, "during1(); during2()" is the content of the
> transaction.  But the issue with this approach is that there is no way
> to structure the transactions differently.  What if I want a
> transaction that starts somewhere, and ends at some unrelated place?

This is the way it has been described, and how most common usages will
probably look.  But I don't think there has ever been any suggestion
that dynamic extent is the scope at which transactions *should* be
implemented, any more than context managers are the
be-all-and-end-all solution for resource management.  It's a
convenience thing.  In the case of open files, for example, "with" has
lower syntactic overhead than the equivalent try/finally; but
file.close() still needs to exist for more advanced usage patterns.
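
(Concretely, the two spellings below do the same thing; "with" is just
the low-overhead form of the explicit try/finally, and close() is the
primitive that the fancier patterns fall back on.  process() here is
only a placeholder.)

with open("data.txt") as f:
    process(f)

# ... which is roughly equivalent to:
f = open("data.txt")
try:
    process(f)
finally:
    f.close()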

> Following that reasoning, I no longer want to go for a PyPy version exposing a
> "with transaction" context manager.  Instead, it would rather be
> something like a function to call at any point to "yield" control ---
> or, in the idea that it's better to avoid having yields at random
> unexpected points, maybe an alternative solution that looks more
> promising: to start a new thread, the programmer calls a function
> "start_transactional_thread(g)" where g is a generator; when g yields,
> it marks the end of a transaction and the start of the next one.  (The
> exact yielded value should be None.)  The difference with the previous
> solution is that you cannot yield from a random function; your
> function has to have some structure, namely be a generator, in order
> to use this yield.

Here you propose two solutions. I'll consider the generator one first:

The requirement to be a generator is clever, because it enables you to
know that some random function you call won't attempt to commit your
current transaction and start a new one. Yet, this is also slightly
ugly, because it has similar composition-related issues to context
managers - the user is still limited to a single dynamic extent.  It's
not clear to me what this means in the presence of e.g. stackless,
either, as you say.  Nevertheless, you can implement this generator
model using a context manager, if you don't care about protecting the
loop header and creation of the context manager (and I don't see why
you would):

def start_transactional_thread(g):
    # 'stm' stands for whatever module exposes the transactional
    # primitives; start_thread runs the helper in a new thread.
    stm.start_thread(_transactional_thread, g)

def _transactional_thread(g):
    iterator = g()
    while True:
        # Each pass runs one step of the generator inside its own
        # transaction, committed when the "with" block exits; the
        # generator finishing (StopIteration) ends the thread.
        with stm.transaction:
            try:
                iterator.next()
            except StopIteration:
                return
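
A caller would then look something like this (worklist and handle()
are made-up names, just to show where the yields fall):

def worker():
    for item in worklist:
        handle(item)
        yield   # commit here; the next transaction starts after the yield

start_transactional_thread(worker)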

The other case was a function to call to commit the transaction (and
start a new one?).  I would like to think that you shouldn't be able
to commit a transaction that you don't know about (following
capability discipline), and that concept is easier to represent as a
method of some transaction object rather than a function call.  This
approach is strictly more general than the generator concept, and it
is the one that makes the most sense to me.  It also extends more
easily to distributed transaction management &c.
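
In code, the shape I have in mind is something like the following
(Transaction and commit_and_begin are invented names, purely for
illustration; before() and after() are borrowed from your example
above):

class Transaction(object):
    def commit_and_begin(self):
        # Commit this transaction and start a fresh one in its place.
        # Making this a method means that only code which was
        # explicitly handed the object can end the transaction --
        # that is the capability-discipline point.
        raise NotImplementedError("provided by the runtime")

def foo(txn):
    before()
    txn.commit_and_begin()   # possible only because foo was given txn
    after()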

I suspect that *even if you don't allow nesting of transactions*, this
model will suit you better.  Consider what happens when a JITted loop
(which has its own transaction) makes a residual call.  If you have
the ability to pass around the transaction object you want to talk
about, you can commit the existing transaction and create a new one.
When you return into the loop, you can create a new transaction and
store it somewhere; that transaction then becomes the current
transaction for the remainder of the loop.
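
Roughly, in invented pseudo-Python (loop_body, residual_call,
txn.commit and stm.begin are all placeholders for the real JIT
machinery):

def jitted_loop(txn):
    while more_iterations():   # placeholder loop condition
        loop_body()
        txn.commit()           # close the loop's own transaction
        residual_call()        # the call runs outside that transaction
        txn = stm.begin()      # fresh transaction for the rest of the loop
    return txn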

The reason I bring this up is that even though you implement
transaction handling with its own special llop, you'd never sensibly
model this with a generator.  If you were limited to generators, you
wouldn't be able to implement this in the llinterp, or in the
blackhole interpreter either, without sufficient magic.

-- 
William Leslie

