
Armin,

Sarah and I are restarting our work on CSP, and extending it to actors and a dataflow library. It would be good to make this work on Jython, IronPython and PyPy as well as CPython. However, we want to get away from a reliance on multiprocessing, since it is rather heavyweight for the sort of parallelism we are after. STM as an infrastructure layer in PyPy and CPython would get us away from the GIL and allow Python threads bound to kernel threads, so that a single Python process could host lightweight processes (no application-level shared memory).

Is the STM variant of PyPy and/or CPython in any form of usable state? If so, we can investigate building it from source ourselves, so as to use it as a foundation for building higher-abstraction parallelism for applications programmers.

Thanks.

-- Russel.
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

Hi Russel,

On Sun, Sep 30, 2012 at 11:10 AM, Russel Winder <russel@winder.org.uk> wrote:
However, we want to get away from a reliance on multiprocessing, since it is rather heavyweight for the sort of parallelism we are after. STM as an infrastructure layer in PyPy and CPython would get us away from the GIL and allow Python threads bound to kernel threads, so that a single Python process could host lightweight processes (no application-level shared memory).
I'm not really sure I follow you exactly. You want to have 'multiprocessing' using OS threads and no shared memory, rather than using processes? That looks like it will have the same amount of overhead to me.

But anyway, if that's what you want, then I don't understand where STM comes into the picture. STM is a trade-off solution: you get *shared* memory concurrency, possibly with easier models than threads to program with, against some high but fixed overhead. If your starting point is no shared memory, then STM makes little sense.

If you just want several independent Python interpreters in the same OS process, then it is probably possible with a small amount of hacking in either CPython or PyPy.

A bientôt,

Armin.

Hi,

the following is a collection of unfinished thoughts.

After my thesis I'll be experimenting with a relaxed CSP-ish model based on Python's native generator-based continuations, as well as on the new continulet-jit-3 based greenlets. My basic assumption is that a limited amount of shared memory is acceptable.

The CSP-ish "processes" are to be modelled as generators with yield expressions, or as greenlets, and will assume strong data locality. All communication will be started by suspending the execution and "switching away" with some payload. That kind of starting/stopping seems to lend itself well to STM transactions: basically, one iteration of a generator will be one transaction, and internal communication will also run as separate transactions. A rough sketch of this model follows the quoted message below.

My current hypothesis is that such a model will lend itself to easy parallel execution, since iteration steps of different continuations will be mostly independent and can just run in parallel. Communication itself will happen in transactions with higher conflict potential, but I assume that Armin will find ways to evade the conflict issues with in-process communication channels, so I'm not going to think about it more until it turns out to be an actual problem in experimentation.

My main focus with these experiments will be concurrent systems.

-- Ronny

On 09/30/2012 11:10 AM, Russel Winder wrote:
Armin,
Sarah and I are restarting our work on CSP, and extending it to actors and a dataflow library. It would be good to make this work on Jython, IronPython and PyPy as well as CPython. However, we want to get away from a reliance on multiprocessing, since it is rather heavyweight for the sort of parallelism we are after. STM as an infrastructure layer in PyPy and CPython would get us away from the GIL and allow Python threads bound to kernel threads, so that a single Python process could host lightweight processes (no application-level shared memory).
Is the STM variant of PyPy and/or CPython in any form of usable state? If so, we can investigate building it from source ourselves, so as to use it as a foundation for building higher-abstraction parallelism for applications programmers.
Thanks.
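Roughly, in code (a toy sketch only: the round-robin scheduler below just stands in for the future STM runtime, and all the names are illustrative):

    from collections import deque

    def producer(chan):
        for i in range(3):
            yield ('send', chan, i)        # suspend with a payload

    def consumer(chan, n):
        for _ in range(n):
            value = yield ('recv', chan)   # suspend until a value arrives
            print('got %s' % (value,))

    def run(processes):
        # Naive round-robin interpreter of the yielded requests; in the
        # envisioned runtime, each resumption would be one transaction.
        buffers = {}   # channel -> queued values
        waiting = {}   # channel -> processes blocked on recv
        queue = deque((p, None) for p in processes)
        while queue:
            proc, value = queue.popleft()
            try:
                request = proc.send(value)
            except StopIteration:
                continue
            op, chan = request[0], request[1]
            if op == 'send':
                if waiting.get(chan):
                    # hand the payload directly to a blocked receiver
                    queue.append((waiting[chan].popleft(), request[2]))
                else:
                    buffers.setdefault(chan, deque()).append(request[2])
                queue.append((proc, None))
            else:  # 'recv'
                if buffers.get(chan):
                    queue.append((proc, buffers[chan].popleft()))
                else:
                    waiting.setdefault(chan, deque()).append(proc)

    run([producer('c'), consumer('c', 3)])  # prints: got 0, got 1, got 2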

Hi Ronny,

On Sun, Sep 30, 2012 at 3:50 PM, Ronny Pfannschmidt <Ronny.Pfannschmidt@gmx.de> wrote:
After my thesis I'll be experimenting with a relaxed CSP-ish model based on Python's native generator-based continuations, as well as on the new continulet-jit-3 based greenlets.
My basic assumption is that a limited amount of shared memory is acceptable.
What you are thinking about is to start from the naturally multicore model of separate address spaces, and add some amount of shared memory; you would use STM to handle the result. It is the opposite of what I'm thinking about, which is to start with a non-multithreaded, non-tasklet-based program and add multicore capability to it. I would be using STM to "create" multicore capability, whereas you would be using it to "create" shared memory. I am more interested in the first approach than the second, because I think it is closer to what untrained programmers start with, but both approaches are potentially valid.

Russel: STM is a powerful tool that makes sense of shared memory in multicore situations. I fail to understand why you are looking at it in the absence of shared memory...

A bientôt,

Armin.

On 09/30/2012 04:22 PM, Armin Rigo wrote:
Hi Ronny,
On Sun, Sep 30, 2012 at 3:50 PM, Ronny Pfannschmidt <Ronny.Pfannschmidt@gmx.de> wrote:
After my thesis I'll be experimenting with a relaxed CSP-ish model based on Python's native generator-based continuations, as well as on the new continulet-jit-3 based greenlets.
My basic assumption is that a limited amount of shared memory is acceptable.
What you are thinking about is to start from the naturally multicore model of separate address spaces, and add some amount of shared memory; you would use STM to handle the result. It is the opposite of what I'm thinking about, which is to start with a non-multithreaded, non-tasklet-based program and add multicore capability to it. I would be using STM to "create" multicore capability, whereas you would be using it to "create" shared memory. I am more interested in the first approach than the second, because I think it is closer to what untrained programmers start with, but both approaches are potentially valid.
I think both approaches are valid; they lend themselves to helping reason about different kinds of problems, and to making different kinds of serial programs concurrent and parallel. I think for most purposes simple sequential communicating programs are far easier to reason about than anything else.

The transaction module approach already seems to require chunking programs up into semi-small transactions, which may cause other transactions to be scheduled. That seems more and more like Twisted's deferreds, and to my eyes Twisted-style code is a kind of spaghetti that is very hard to reason about. This is why I want to experiment with executing multiple longer sequential programs in chunks that may be interleaved and/or parallel.

The reason why I start with generators instead of greenlets is simply that they cannot ever be nested; this will allow simpler reasoning.
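For reference, the chunked style I mean looks roughly like this (a sketch: the add/run interface follows Armin's public descriptions of the transaction module, and handle_item, process and work are made-up names):

    import transaction  # pypy-stm's pure-Python scheduling module

    def handle_item(item):
        process(item)                      # hypothetical per-item work
        for follow_up in item.follow_ups:  # hypothetical attribute
            # a transaction may schedule further transactions: the
            # deferred-like chaining objected to above
            transaction.add(handle_item, follow_up)

    for item in work:                      # 'work': hypothetical iterable
        transaction.add(handle_item, item)
    transaction.run()  # run all pending transactions, possibly in parallel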
Russel: STM is a powerful tool that makes sense of shared memory in multicore situations. I fail to understand why you are looking at it in the absence of shared memory...
I'm under the impression the intent is to have non-shared application state while the interpreter state is still shared (I might be using the wrong words here).
A bientôt,
Armin.
-- Ronny

Hi Ronny:

On Sunday, September 30, 2012 10:43 AM, Ronny Pfannschmidt <Ronny.Pfannschmidt@gmx.de> wrote:
The reason why I start with generators instead of greenlets is simply that they cannot ever be nested.
I would hold the reverse view, since to the best of my knowledge generators don't lend themselves to composability. I suspect greenlets (or continulets?) are easier to work with.
I think for most purposes simple sequential communicating programs are far easier to reason about than anything else.
That is why I use Stackless Python... and I would love to try to incorporate STM into Stackless.

I recently read a book by John Reppy on Concurrent ML, in which Reppy gives the theoretical reasons for the virtues of synchronous channels with buffers. You may want to look at John Reppy's paper "Parallel Concurrent ML" (http://people.cs.uchicago.edu/~jhr/papers/2009/icfp-parallel-cml.pdf). Under the hood, Reppy and company use optimistic locking to implement efficient message passing on a multicore system. Another paper you should check out is "Scalable Join Patterns" by Russo and Turon (I exchanged e-mails with these guys).

I'll admit I don't understand all the ins and outs of the papers. However, my takeaway is that STM and STM-ish mechanisms are used in a very small place: the message-passing system's implementation. Given that message-passing systems typically share information through messages :-), this ought to create a very small footprint for transaction conflicts.
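To make that concrete, here is a toy buffered channel in which only send/recv touch shared state; a lock/condition pair stands in for whatever STM or optimistic mechanism a real runtime would provide, and all names are illustrative:

    import threading
    from collections import deque

    class Channel(object):
        """Buffered channel: only send/recv touch shared state, so only
        they would need to run as transactions."""

        def __init__(self, capacity):
            self._buf = deque()
            self._capacity = capacity
            lock = threading.Lock()               # stand-in for STM
            self._not_full = threading.Condition(lock)
            self._not_empty = threading.Condition(lock)

        def send(self, value):
            with self._not_full:
                while len(self._buf) >= self._capacity:
                    self._not_full.wait()         # buffer full: block
                self._buf.append(value)
                self._not_empty.notify()

        def recv(self):
            with self._not_empty:
                while not self._buf:
                    self._not_empty.wait()        # buffer empty: block
                value = self._buf.popleft()
                self._not_full.notify()
                return value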
The transaction module approach already seems to require chunking programs up into semi-small transactions, which may cause other transactions to be scheduled.
That seems more and more like Twisted's deferreds, and to my eyes Twisted-style code is a kind of spaghetti that is very hard to reason about.
This is why I want to experiment with executing multiple longer sequential programs in chunks that may be interleaved and/or parallel.
As I understand it, a properly functioning STM implementation is going to make transactions appear as if they were serialised. As for interleaving: that happens under the hood and is an efficiency and implementation issue. One may use the scheduler to directly enforce serialisation. (I have to review my notes on how STM-aware the Haskell runtime scheduler is.)

Cheers,
Andrew

On 10/01/2012 08:32 PM, Andrew Francis wrote:
Hi Ronny:
On Sunday, September 30, 2012 10:43 AM, Ronny Pfannschmidt <Ronny.Pfannschmidt@gmx.de> wrote:
The reason why I start with generators instead of greenlets is simply that they cannot ever be nested.
I would hold the reverse view, since to the best of my knowledge generators don't lend themselves to composability. I suspect greenlets (or continulets?) are easier to work with.
Practically, there is no semantic difference between yielding a request for the completion of "another process" and calling into a function; the two are equivalent, the "calls" just look different.
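A toy illustration of that equivalence (fetch_value, the request tuple and the driver are all made up):

    def fetch_value():
        return 41  # hypothetical stand-in for "another process"

    # Ordinary call: the work happens on the caller's stack.
    def step_call():
        value = fetch_value()
        print(value + 1)

    # Generator style: the "call" becomes a yielded request that a
    # scheduler fulfils before resuming us; same semantics, different
    # surface syntax.
    def step_yield():
        value = yield ('call', fetch_value)
        print(value + 1)

    def drive(gen):
        request = next(gen)            # request == ('call', fetch_value)
        try:
            gen.send(request[1]())     # run the function, resume with result
        except StopIteration:
            pass

    step_call()             # prints 42
    drive(step_yield())     # also prints 42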
I think for most purposes simple sequential communicating programs are far easier to reason about than anything else.
That is why I use Stackless Python... and I would love to try to incorporate STM into Stackless.
There is no point in pushing that into old Stackless. Once continulet-jit-3 works as we hope, "continuations" will have distinct stacks. (Armin, please correct me if I missed something.)
I recently read a book by John Reppy on Concurrent ML, in which Reppy gives the theoretical reasons for the virtues of synchronous channels with buffers.
Channels with buffers will be one of my primitives.
You may want to look at John Reppy's paper "Parallel Concurrent ML" (http://people.cs.uchicago.edu/~jhr/papers/2009/icfp-parallel-cml.pdf). Under the hood, Reppy and company use optimistic locking to implement efficient message passing on a multicore system. Another paper you should check out is "Scalable Join Patterns" by Russo and Turon (I exchanged e-mails with these guys).
I'll admit I don't understand all the ins and outs of the papers. However, my takeaway is that STM and STM-ish mechanisms are used in a very small place: the message-passing system's implementation. Given that message-passing systems typically share information through messages :-), this ought to create a very small footprint for transaction conflicts.
STM will be everywhere in PyPy, because we need it everywhere to make it appear as if we still had the GIL.
The transaction module approach already seems to require chunking programs up into semi-small transactions, which may cause other transactions to be scheduled.
That seems more and more like Twisted's deferreds, and to my eyes Twisted-style code is a kind of spaghetti that is very hard to reason about.
This is why I want to experiment with executing multiple longer sequential programs in chunks that may be interleaved and/or parallel.
As I understand it, a properly functioning STM implementation is going to make transactions appear as if they were serialised. As for interleaving: that happens under the hood and is an efficiency and implementation issue. One may use the scheduler to directly enforce serialisation. (I have to review my notes on how STM-aware the Haskell runtime scheduler is.)
See Armin's latest blog entry on STM: it's no longer completely under the hood. The transaction module will be a convenient interface to it for some kinds of program; the base primitives will be thread.atomic/thread.exclusive_atomic. My experiment will touch it from a different angle.
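For illustration, usage of that primitive might look roughly like this (inferred from the blog post; the eventual spelling of the primitive and the account objects are assumptions):

    import thread  # pypy-stm is expected to expose 'atomic' here

    def transfer(account_a, account_b, amount):
        # Everything inside the block behaves as one transaction: other
        # threads observe either none or all of these writes.
        with thread.atomic:
            account_a.balance -= amount
            account_b.balance += amount

-- Ronny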

Hi Ronny:

On Monday, October 1, 2012 2:51 PM, Ronny Pfannschmidt <Ronny.Pfannschmidt@gmx.de> wrote:

On 10/01/2012 08:32 PM, Andrew Francis wrote:

AF> That is why I use Stackless Python... and I would love to try to
AF> incorporate STM into Stackless.
There is no point in pushing that into old Stackless. Once continulet-jit-3 works as we hope, "continuations" will have distinct stacks. (Armin, please correct me if I missed something.)
I view Stackless Python and its API as functioning at a higher level than continulets or some other low-level concurrency mechanism. As an example, look at the old stackless.py module that supported both greenlets and coroutines. Maybe Stackless, a few years from now, will be Stackless in API only rather than in under-the-hood mechanisms?

AF> I'll admit I don't understand all the ins and outs of the papers.
AF> However, my takeaway is that STM and STM-ish mechanisms are used in a
AF> very small place: the message-passing system's implementation. Given
AF> that message-passing systems typically share information through
AF> messages :-), this ought to create a very small footprint for
AF> transaction conflicts.
STM will be everywhere in PyPy, because we need it everywhere to make it appear as if we still had the GIL.
Yes, the STM mechanism will be everywhere. However, I suspect that one would still want to write programmes that result in fewer transaction conflicts and less redo/undo work. It would be nice to have programming constructs for this (I guess atomic is one of them). Message passing is another such mechanism, and a fairly easy model to understand. Again, my point is that the literature points to using STM and STM-like features to build more efficient message-passing systems; it is this line of thinking that I feel is worth pursuing. I don't know how the PyCSP folks feel about this.

If you want to see an example of the headaches of using fine-grained concurrency control to implement channels, look at the Go source code (e.g., you will see stuff like sorting locks on channel data structures).

Cheers,
Andrew

On 10/04/2012 06:14 PM, Andrew Francis wrote:
Hi Ronny:
On Monday, October 1, 2012 2:51 PM, Ronny Pfannschmidt <Ronny.Pfannschmidt@gmx.de> wrote:
On 10/01/2012 08:32 PM, Andrew Francis wrote:
AF> That is why I use Stackless Python... and I would love to try to
AF> incorporate STM into Stackless.
There is no point in pushing that into old Stackless. Once continulet-jit-3 works as we hope, "continuations" will have distinct stacks. (Armin, please correct me if I missed something.)
I view Stackless Python and its API as functioning at a higher level than continulets or some other low-level concurrency mechanism. As an example, look at the old stackless.py module that supported both greenlets and coroutines.
Maybe Stackless, a few years from now, will be Stackless in API only rather than in under-the-hood mechanisms?
As far as I understood, that will be the state once the continulet-jit-3 branch is done: it just gives each continuation its own stack; the rest is API.
AF> I'll admit I don't understand all the ins and outs of the papers.
AF> However, my takeaway is that STM and STM-ish mechanisms are used in a
AF> very small place: the message-passing system's implementation. Given
AF> that message-passing systems typically share information through
AF> messages :-), this ought to create a very small footprint for
AF> transaction conflicts.
STM will be everywhere in PyPy, because we need it everywhere to make it appear as if we still had the GIL.
Yes, the STM mechanism will be everywhere. However, I suspect that one would still want to write programmes that result in fewer transaction conflicts and less redo/undo work. It would be nice to have programming constructs for this (I guess atomic is one of them). Message passing is another such mechanism, and a fairly easy model to understand.
I think that some fundamentals (i.e. channels/queues) will be enough; the rest is a question of programming models/patterns, and it will be interesting to figure out good new ones.
Again, my point is that the literature points to using STM and STM-like features to build more efficient message-passing systems; it is this line of thinking that I feel is worth pursuing. I don't know how the PyCSP folks feel about this.
I'm under the impression that messages are to be used to help keep conflict rates low, but [[citation needed]] I haven't yet experimented with that enough.
If you want to see an example of the headaches of using fine-grained concurrency control to implement channels, look at the Go source code (e.g., you will see stuff like sorting locks on channel data structures).
Noted; I'll look at that in two months.

-- Ronny

On 09/30/2012 04:22 PM, Armin Rigo wrote:
[full quote of Armin's message above snipped]
Hi Armin,

Just a triggered thought: I am wondering if Conway's Game of Life (http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) might be an interesting/fun basis for experiments, and maybe a benchmark for STM use in the parallel update of the life grid. I am thinking of in-place update, since old-frame -> new-frame would not create any conflicts. A naive single-thread in-place update would obviously require local neighbor "old-frame" values to calculate correct updates, so a programmer would naturally have to provide for that somehow.

What kinds of design patterns w.r.t. STM would emerge from solving the problem with various loops that create different interference when parallelized, e.g. looping by rows, vs. looping recursively through tiles and subtiles, vs. maybe using a quadtree representation, or whatever? I am wondering how you would visualize STM helping to bring either manual or automated parallelism into this update problem, say with a goal of maximizing frame rate. And what do you expect the "untrained programmers" to think of, and to be able to avoid thinking about, because of STM, in their designs?
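For comparison, a toy conflict-free old-frame -> new-frame step, split into row bands that could each run as an independent transaction (all names are illustrative):

    def neighbors(frame, x, y):
        # Count live neighbors on a toroidal grid.
        h, w = len(frame), len(frame[0])
        return sum(frame[(y + dy) % h][(x + dx) % w]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dx, dy) != (0, 0))

    def step_band(old, new, y0, y1):
        # Each band reads only 'old' and writes only its own rows of
        # 'new', so bands could run in parallel with no conflicts,
        # unlike the in-place variant discussed above.
        for y in range(y0, y1):
            for x in range(len(old[0])):
                n = neighbors(old, x, y)
                new[y][x] = 1 if n == 3 or (n == 2 and old[y][x]) else 0

    def step(old, bands=4):
        new = [[0] * len(old[0]) for _ in old]
        h = len(old)
        for i in range(bands):
            step_band(old, new, i * h // bands, (i + 1) * h // bands)
        return new

Regards,

Bengt Richter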

Hi Bengt,

On Tue, Oct 2, 2012 at 1:40 PM, Bengt Richter <bokr@oz.net> wrote:
Just a triggered thought: I am wondering if Conway's Game of Life (http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) might be an interesting/fun basis for experiments, and maybe a benchmark for STM use in the parallel update of the life grid.
I am thinking of in-place update, since old-frame -> new-frame would not create any conflicts.
I suppose the boring answer is that old-frame -> new-frame looks more natural. Even if not, I can think of different answers, but it doesn't seem that STM, as I think about it, is really related to them. STM in PyPy is merely an "implementation detail" to speed up a GIL-like user experience.

Armin

On Thu, Oct 4, 2012 at 5:53 AM, Armin Rigo <arigo@tunes.org> wrote:
Hi Bengt,
On Tue, Oct 2, 2012 at 1:40 PM, Bengt Richter <bokr@oz.net> wrote:
Just a triggered thought: I am wondering if Conway's Game of Life (http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) might be an interesting/fun basis for experiments, and maybe a benchmark for STM use in the parallel update of the life grid.
I am thinking of in-place update, since old-frame -> new-frame would not create any conflicts.
I suppose the boring answer is that old-frame -> new-frame looks more natural. Even if not, I can think of different answers, but it doesn't seem that STM, as I think about it, is really related to them. STM in PyPy is merely an "implementation detail" to speed up a GIL-like user experience.
In other words, the GIL is a point of contention even when the application code is not sharing data between threads. Correct?

If we're looking for benchmark problems which are not trivially parallelized, STAMP[0] is probably a great place to start.

[0] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.472

Hi,

On Thu, Oct 4, 2012 at 7:58 PM, Randall Leeds <randall.leeds@gmail.com> wrote:
If we're looking for benchmark problems which are not trivially parallelized, STAMP[0] is probably a great place to start.
[0] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.472
You're welcome to contribute by rewriting them in Python. As long as they are in C, of course, they are not immediately useful to us.

A bientôt,

Armin.
participants (6)

- Andrew Francis
- Armin Rigo
- Bengt Richter
- Randall Leeds
- Ronny Pfannschmidt
- Russel Winder