[pypy-dev] Syntax for the 'transaction' module

Tue May 1 11:27:56 CEST 2012

Hi all,

Following the blog post about STM, I would like to solicit your help
in coming up with some better syntax for the 'transaction' module.

* First question: 'transaction'.  The name is kind of bogus, because
it implies that it must be based on transactional memory.  Such a name
doesn't make sense if, say, you are running the single-core emulator
version.  What the module is about is to give a way to schedule tasks
and run them in some unspecified order.

* How about replacing the global functions 'transaction.add()' and
'transaction.run()' with a class, like 'transaction.Runner', that you
need to instantiate and on which you call the methods add() and run().
If moreover the class has an '__exit__()' that redirects to run(),
then you can use it like this:

    with transaction.Runner() as t:
        for block in blocks:
            t.add(do_stuff, block)

And maybe, like concurrent.futures, a map() method.  (Although it
seems to me that CPython was happily relegating the built-in map() to
the corner of "advanced users only", so adding a map() again seems a
bit counter-productive.)  It would look like this:

    with transaction.Runner() as t:
        t.map(do_stuff, blocks)

* Note that the added transactions are only run on __exit__(), not
when you call add() or map().  This might be another argument against
map().
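
To make the proposal concrete, here is a minimal single-threaded sketch
of what such a Runner could look like.  It is not the actual
implementation: the class body, the random scheduling order, and the
choice to skip run() when the 'with' block raises are all assumptions
made for illustration.

```python
import random


class Runner:
    """Hypothetical stand-in for the proposed transaction.Runner.

    Tasks are only queued by add() and map(); nothing runs until run(),
    which __exit__() calls when the 'with' block ends normally.
    """

    def __init__(self):
        self._pending = []

    def add(self, func, *args, **kwargs):
        # Schedule a task; it is NOT executed here.
        self._pending.append((func, args, kwargs))

    def map(self, func, iterable):
        # Schedule one task per item; results are discarded in this sketch.
        for item in iterable:
            self.add(func, item)

    def run(self):
        # Run queued tasks in an unspecified (here: random) order.
        # Tasks may call add() again, so loop until the queue is empty.
        while self._pending:
            index = random.randrange(len(self._pending))
            func, args, kwargs = self._pending.pop(index)
            func(*args, **kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumption: only run the tasks if the block did not raise.
        if exc_type is None:
            self.run()
        return False


results = []
with Runner() as t:
    t.map(results.append, range(5))
    ran_early = bool(results)  # still empty: nothing runs before __exit__()
```

After the 'with' block, 'results' holds the five items in some
arbitrary order, and 'ran_early' is False.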

* The point of the examples above is that "t" can also be passed
around in the transactions, and t.add() called on it again from there.
Also, such a syntax nicely removes the need for any global state, and
so it opens the door to nesting: you can write one of the above
examples inside code that happens to run itself as a transaction from
some unrelated outer Runner().  Right now, you cannot call
transaction.run() from within a transaction --- and it doesn't make
sense because we cannot tell if the transaction.add() that you just
did was meant to schedule transactions for the outer or the future
inner run().  That's what is better with this proposed new API.
(Supporting it requires more work in the current implementation,
though.)
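
The two points above (passing "t" into the transactions, and nesting)
can be sketched as follows.  The small Runner here is a hypothetical
FIFO stand-in written only so that the example runs by itself; the
function names countdown() and nested_work() are made up as well.

```python
class Runner:
    """Hypothetical minimal Runner: queue on add(), execute on __exit__()."""

    def __init__(self):
        self._pending = []

    def add(self, func, *args):
        self._pending.append((func, args))

    def run(self):
        # FIFO order is an arbitrary choice; the real order is unspecified.
        while self._pending:
            func, args = self._pending.pop(0)
            func(*args)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.run()
        return False


log = []


def countdown(t, n):
    # Receives the runner explicitly and schedules follow-up work on it.
    log.append(("outer", n))
    if n > 0:
        t.add(countdown, t, n - 1)


def nested_work(label):
    # Runs as a task of the outer runner, yet owns an unrelated inner one.
    # Each add() names its runner, so there is no ambiguity about where
    # the task is scheduled.
    with Runner() as inner:
        for i in range(2):
            inner.add(log.append, (label, i))


with Runner() as outer:
    outer.add(countdown, outer, 2)
    outer.add(nested_work, "inner")
```

Because every add() is a method call on a specific object, the inner
and outer schedules never mix, which is exactly what the global
transaction.add() cannot express.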

* Another issue.  I think by now that we need special support to mean:
"I want to end a transaction, then non-transactionally call this C
function that will likely block for some time, and when it returns, I
want to start the next transaction".  This seems general enough to
support various kinds of things, like calling select() or
epoll_wait().  The question is what kind of support we want.

I played with various ideas and I'll present the combination that
satisfies me the most, but I'm open to any other suggestion.

We could in theory support calling the function in-line, i.e. just
call select() and it will break the current transaction in two.  This
is similar to the fact that select() in CPython releases and
re-acquires the GIL.  But it breaks the abstraction that every add()
gives *one* transaction.  It kind of goes against the general design
of the API so far, which is that you add() things to do, but don't do
them right now --- they will be done later.  To voice it differently,
I dislike this solution because you can break a working program just
by adding a debugging "print" in the middle (assuming that "print"
would also break the current transaction in two, like it releases the
GIL in CPython).  It would break the program because what used to be
in the same transaction, no longer is: random things (done by other
unrelated transactions) can suddenly have happened because you added a
"print".

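The hazard described above can be simulated without any STM.  In this
sketch each task is a generator and a bare 'yield' stands for a call
(such as the debugging "print") that would split the transaction in
two; the round-robin scheduler, the transfer() task, and the account
dictionary are all invented for the illustration.

```python
def transfer(account, amount, debug=False):
    # The check and the update are meant to form ONE transaction.
    if account["balance"] >= amount:
        if debug:
            # Stand-in for a print() that breaks the transaction in two:
            # other tasks may run between the check and the update.
            yield
        account["balance"] -= amount


def run_interleaved(tasks):
    """Run each task up to its next break point, round-robin."""
    tasks = list(tasks)
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)


def final_balance(debug):
    account = {"balance": 100}
    run_interleaved([transfer(account, 100, debug),
                     transfer(account, 100, debug)])
    return account["balance"]


balance_without_print = final_balance(debug=False)  # 0: second transfer refused
balance_with_print = final_balance(debug=True)      # -100: both checks passed
```

Without the break point the second transfer sees the updated balance
and does nothing.  With it, both tasks pass the check before either
one subtracts, so the account is overdrawn.
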
The idea I'm playing with is two running modes: "breakable" vs
"non-breakable".  Say you have a "@breakable" decorator that you have
to add explicitly on some of your Python functions.  The transaction
is breakable only when all functions in the call stack are @breakable.
As soon as one non-breakable function is in the call stack, then the
transaction is not breakable (to err on the side of safety).  No clue
if this would make any sense to the user, though.  In the end a call
to select() would either break the transaction in two (if the current
mode is "breakable"), or, like now, in non-breakable mode it would
turn the transaction inevitable (which is bad if the C call is
blocking, because it blocks all other transactions too, but which is
at least correct).
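
The rule "breakable only when all functions in the call stack are
@breakable" could be prototyped in pure Python roughly as below.  This
is only a sketch under several assumptions: it walks the stack with
sys._getframe() (a CPython/PyPy implementation detail),
run_transaction() is a made-up entry point that marks where a
transaction starts, and mode_at_blocking_call() merely reports which
of the two behaviours a select()-like call would pick.

```python
import sys

_breakable_codes = set()


def breakable(func):
    """Mark func as a function in which a transaction may be split."""
    _breakable_codes.add(func.__code__)
    return func


def run_transaction(func, *args):
    """Hypothetical transaction entry point; the stack walk stops here."""
    return func(*args)


def mode_at_blocking_call():
    """Report what a blocking C call made by our caller would do."""
    frame = sys._getframe(1)  # start at the function making the call
    while frame is not None:
        if frame.f_code is run_transaction.__code__:
            return "break"       # every frame up to the root was @breakable
        if frame.f_code not in _breakable_codes:
            return "inevitable"  # one unmarked frame is enough: stay safe
        frame = frame.f_back
    return "inevitable"          # not inside a transaction at all


@breakable
def poll_sockets():
    # Imagine a select() call here.
    return mode_at_blocking_call()


@breakable
def event_loop():
    return poll_sockets()


def legacy_handler():
    # Not decorated, so anything it calls runs in non-breakable mode.
    return poll_sockets()


mode_all_marked = run_transaction(event_loop)       # "break"
mode_one_unmarked = run_transaction(legacy_handler)  # "inevitable"
```

The same poll_sockets() gives a different answer depending on who
called it, which is the part that may or may not make sense to users.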

Thanks for reading all my ranting.  Ideas welcome...

A bientôt,

Armin.