[pypy-dev] Syntax for the 'transaction' module

Bengt Richter bokr at oz.net
Wed May 2 05:51:48 CEST 2012


On 05/01/2012 08:35 AM Armin Rigo wrote:
> Hi Holger,
>
> On Tue, May 1, 2012 at 16:48, holger krekel <holger at merlinux.eu> wrote:
>> Maybe "atomic" could become a __pypy__ builtin and there could be a "ame" or
>> so package which atomic-using apps could depend on? In any case,
>> I really like the twist of "To remedy the GIL use AME" :)
>
> Yes, indeed, a private name in the __pypy__ module looks fine.  The
> applications are supposed to use the "ame" module or package (or
> whatever name makes sense, but I'm getting convinced that
> "transaction" is not a good one).  The "ame" module re-exports
> __pypy__._atomic as ame.atomic for general use, but also offers more
> stuff like the Runner class with add()/run() methods.
>
> Also, again, it doesn't necessarily make sense to force a lexically
> nested usage of ame.atomic, so we could replace __pypy__._atomic with
> two primitives __pypy__._atomic_start and _atomic_stop, re-exported in
> the "ame" module, and write the "ame.atomic" context manager in pure
> Python.
>
>> I am wondering how this all applies to the execnet-execution model, btw.
>> (http://codespeak.net/execnet for those who wonder what i mean)
>> remote_exec()s on the same gateway run currently in different threads
>> and thus only send/receive needs to use "with atomic", right?
>
> In my proposal, existing applications run fine, using multiple cores
> if they are based on multiple threads.  You use "with atomic" to have
> an additional degree of synchronization when you don't want to worry
> about locks & friends (which should be *always*, but is still an
> optional benefit in this model).  Maybe you're just talking about
> simplifying the implementation of execnet's channels to use "with
> atomic" instead of acquiring and releasing locks.  Then yes, could be,
> as long as you remember that "with atomic" gives neither more nor less
> than its naive implementation: "don't release the GIL there".
>

I am looking at
_____________________________________________________________________

def add(f, *args, **kwds):
     """Register the call 'f(*args, **kwds)' as running a new
     transaction.  If we are currently running in a transaction too, the
     new transaction will only start after the end of the current
     transaction.  Note that if the same or another transaction raises an
     exception in the meantime, all pending transactions are cancelled.
     """
     r = random.random()
     assert r not in _pending    # very bad luck if it is
     _pending[r] = (f, args, kwds)
_____________________________________________________________________

from https://bitbucket.org/pypy/pypy/raw/stm-gc/lib_pypy/transaction.py

and wondering about atomicity guarantees in the evaluation of
*args and **kwds, and maybe even of f itself -- i.e., there seems to be
an opportunity to pass arguments effectively by value (once compiled),
by reference, or by name, and the arguments may be complex composites
or simple constants. To prepare them, their expressions must be evaluated
somewhere: at call time, or perhaps partly at def time, construction
time, or method-binding time, or some combination.
(When/how might multiple processors cooperate to evaluate arguments for
passing into the transaction context? Never?)

So how does one think about the state of arguments/values being
accessed by f when it runs in its transaction context?

I.e., if some arguments need to be version-synced, are there new ways
to program that? How would you move the evaluation of function arguments
inside a transaction, e.g. when the arguments derive from stateful
data that is updated as a side effect? Wrap it in an outer transaction?
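The "outer transaction" idea could look like the following sketch (again
using a toy serial add()/run() in place of the real module): the outer
transaction both updates the stateful data and registers the inner
transaction, so the read and the add() form one step:

```python
# Toy serial stand-in for the transaction module's add()/run().
_pending = []

def add(f, *args, **kwds):
    _pending.append((f, args, kwds))

def run():
    while _pending:
        f, args, kwds = _pending.pop(0)
        f(*args, **kwds)   # a transaction may itself add() new transactions

counter = {'n': 0}
results = []

def inner(value):
    results.append(value * 10)

def outer():
    # Argument evaluation happens inside this (outer) transaction, so
    # the update of counter['n'] and the registration of inner() would
    # be atomic together under the real module.
    counter['n'] += 1
    add(inner, counter['n'])

add(outer)
add(outer)
run()
print(results)   # [10, 20]
```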

From the definition of transaction.run it appears that f must achieve
its effect as a global side effect, identified either through its
arguments or built into its code (and presumably one could pass a bound
method in place of f for an atomic update of instance attributes?).
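The bound-method idea seems to work with no special support, since a bound
method is just a callable carrying its instance along; a sketch with the
same toy serial add()/run() (hypothetical Account class for illustration):

```python
# Toy serial stand-in for the transaction module's add()/run().
_pending = []

def add(f, *args, **kwds):
    _pending.append((f, args, kwds))

def run():
    while _pending:
        f, args, kwds = _pending.pop(0)
        f(*args, **kwds)

class Account:
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):
        # Under the real module, this mutation would be the atomic
        # transaction body.
        self.balance += amount

acct = Account(100)
add(acct.deposit, 25)   # the bound method carries 'self' implicitly
add(acct.deposit, 75)
run()
print(acct.balance)     # 200
```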

It seems it might be nice to pass a function and get back a list of its
results. With run as it stands, what should the common convention for
accumulating results be? Should one pass f an additional queue argument to
append to? Or an index selecting a slot in a global list, if the ordering
is predetermined?
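Both conventions can be sketched side by side with the toy serial
add()/run(); note that under real concurrent execution only the slot-based
convention would give a predetermined ordering, while append order would
depend on transaction completion order:

```python
# Toy serial stand-in for the transaction module's add()/run().
_pending = []

def add(f, *args, **kwds):
    _pending.append((f, args, kwds))

def run():
    while _pending:
        f, args, kwds = _pending.pop(0)
        f(*args, **kwds)

data = [3, 1, 4, 1, 5]

# Convention 1: pass a shared container to append to
# (result order = completion order, deterministic only in this serial toy).
collected = []
for x in data:
    add(lambda v, out: out.append(v * v), x, collected)

# Convention 2: pass an index into a preallocated slot list
# (result order predetermined regardless of completion order).
slots = [None] * len(data)
for i, x in enumerate(data):
    add(lambda i, v, out: out.__setitem__(i, v * v), i, x, slots)

run()
print(collected)   # [9, 1, 16, 1, 25]
print(slots)       # [9, 1, 16, 1, 25]
```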

BTW, is side-by-side parallelism the only concern in the current attempt to run
programs on many cores? What about pipelining-type parallelism, like nested
generators with inner loops feeding outer ones running on different processors?

I've played with the idea of generators in the form of classes that can be glued
together with '|' so that they logically become one generator (I've got a toy that will do this):
     for v in G(foo, seq)|G(bar)|G(baz): print v # pipe sequence-items through foo then bar then baz
having the effect of
     for v in (baz(z) for z in (bar(r) for r in (foo(o) for o in seq))): print v
or, I suppose,
     for o in seq: print baz(bar(foo(o)))
but that's factored differently.
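A minimal sketch of such a '|'-gluable wrapper (the G name and shape are my
guess at what the toy looks like, not the actual implementation):

```python
class G:
    """Wrap a per-item function as a pipeline stage; '|' chains stages."""

    def __init__(self, func, source=None):
        self.func = func
        self.source = source   # the upstream iterable, set by '|' if absent

    def __or__(self, other):
        # Feed this stage's output into the next stage and return it,
        # so chains like G(f, seq) | G(g) | G(h) build left to right.
        other.source = self
        return other

    def __iter__(self):
        for item in self.source:
            yield self.func(item)

def foo(x): return x + 1
def bar(x): return x * 2
def baz(x): return x - 3

seq = [1, 2, 3]
print(list(G(foo, seq) | G(bar) | G(baz)))
# same result as [baz(bar(foo(o))) for o in seq]
```

Each stage here runs lazily in a single thread; the pipelining question above
is whether such stages could instead run on different processors with
transactional hand-off between them.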

I'm wondering how the transaction machinery might provide guarantees for
generator states and for serializing the feed from one stage to the next.
Does that even make sense, or does pipelining need a different kind of support?

Re names: what about a mnemonic like pasifras == [p]arallel [as] [if] [ra]ndomly [s]erial
Thence pasifras.add ?

Regards,
Bengt Richter



More information about the pypy-dev mailing list