Question about stm_descriptor_init(), tasklets and OS threads

Hi Armin: I am looking at stm_descriptor_init(). Right now it makes a call to pthread_self(). In a potential Stackless prototype, I would want it to get the current tasklet instead. Shouldn't this be enough to get a trivial implementation of Stackless interacting with the transaction module (by trivial, I mean one thread; hopefully, by sticking to a single thread, we don't have to alter any low-level locking code)? Enough to analyse programmes? Cheers, Andrew

Hi Andrew, On Tue, Apr 17, 2012 at 00:44, Andrew Francis <andrewfr_ice@yahoo.com> wrote:
I don't understand why at all, sorry. I will stick to my position that the Stackless module should be modified to use the transaction module internally, and that no editing of the low-level RPython and C code is necessary. It is possible that using the transaction module in pure Python from lib_pypy/stackless.py does not really work, in which case you may need to edit pypy/module/_continuation instead and call pypy.rlib.rstm directly in RPython. But you definitely don't need to edit anything at a lower level. A bientôt, Armin.

Hi Armin:

From: Armin Rigo <arigo@tunes.org>
To: Andrew Francis <andrewfr_ice@yahoo.com>
Cc: Py Py Developer Mailing List <pypy-dev@python.org>
Sent: Tuesday, April 17, 2012 4:19 AM
Subject: Re: Question about stm_descriptor_init(), tasklets and OS threads
I don't understand why at all, sorry.
Please bear with me :-). I am in the same position now as I was in 2007, when I was trying to make Stackless interoperate with Twisted: a lot of silly questions, a lot of misconceptions, a lot of looking at code to see how things worked, and some dusting off of the old operating-systems textbooks.
Noted. Again, when you write a position paper, this would be listed as a fundamental design principle.
I am trying to understand enough to get into a position to attempt an integration. I will start with a Stackless bank account programme (I have written a version in RPython); a very simple programme to write. To make things interesting, my plan is to make one tasklet call schedule(), hence causing contention. It is important to note that there is only one OS thread in action. However, what is not clear to me, and you are in a better position to answer, is whether the underlying low-level rstm machinery cares that it is user-space tasklets, not OS threads, that are the units of execution causing the contention. Cheers, Andrew

Hi Andrew, On Thu, Apr 19, 2012 at 19:43, Andrew Francis <andrewfr_ice@yahoo.com> wrote:
I am trying to understand enough to get into a position to attempt an integration.
I believe you are trying to approach the problem from the bottom-most level up --- which is a fine way to approach problems; but in this case, you are missing that there are still a few levels between where you got so far and the destination, which is the stackless.py module in pure Python. Let us try to approach it top-down instead, because there are far fewer levels to dig through.

The plan is to make any existing stackless-using program work on multiple cores. Take a random existing stackless example, and start to work by editing the pure Python lib_pypy/stackless.py (or, at first, writing a new version from scratch, copying parts of the original). The idea is to use the "transaction" module.

The goal would be to not use the _squeue, which is a deque of pending tasklets, but instead to add pending tasklets with transaction.add(callback). Note how the notion of tasklets in the _squeue (which offers some specific order) is replaced by the notion of which callbacks have been added. See below for what is in each callback.

The transaction.run() would occur in the main program, directly called by stackless.schedule(). So the callbacks are all invoked in the main program too; in fact all the "transaction" dispatching is done in the main program, and only scheduling new tasklets can occur anywhere.

The callbacks would just contain a switch to the tasklet. When the tasklet comes back, if it is not finished, re-add() it. This is all. You have to make sure that all tasklet.switch()es internally go back to the main program, and not directly to another tasklet. This should ensure that the duration of every transaction is exactly the time in a tasklet between two calls to switch().

Of course, this can all be written and tested against "transaction.py", the pure Python emulator. Once it is nicely working, you are done. You just have to wait for a continuation-capable version of pypy-stm; running the same program on it, you'll get multi-core usage. A bientôt, Armin.
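[To make the scheme described above concrete, here is a minimal sketch of such a rewritten core. It assumes only the transaction.add(callback) and transaction.run() calls mentioned in this thread; everything else is invented for illustration: plain generators stand in for the continulet-based tasklets of the real lib_pypy/stackless.py, and the names Tasklet and _run_one_slice are this sketch's own.]

    import transaction  # PyPy's transaction module, or its pure-Python emulator

    class Tasklet(object):
        def __init__(self, func, *args):
            # A generator stands in for the switchable unit of execution;
            # each yield models a switch back to the main program.
            self._gen = func(*args)

        def _run_one_slice(self):
            # The transaction callback: "switch" into the tasklet, and if
            # it has not finished when it comes back, re-add() it.
            try:
                next(self._gen)
            except StopIteration:
                return                        # finished: add nothing
            transaction.add(self._run_one_slice)

    def tasklet(func, *args):
        # Replaces appending to the _squeue.
        t = Tasklet(func, *args)
        transaction.add(t._run_one_slice)
        return t

    def schedule():
        # Called from the main program: dispatch all pending callbacks.
        transaction.run()

[On the pure-Python transaction.py the callbacks simply run one after another in the main program; on a continuation-capable pypy-stm the very same add()/run() calls would be spread over multiple cores.]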

Re-Hi, On Thu, Apr 19, 2012 at 21:37, Armin Rigo <arigo@tunes.org> wrote:
You have to make sure that all tasklet.switch()es internally go back to the main program, and not directly to another tasklet.
Ah, sorry, I confused the stackless interface. You don't switch() to tasklets, but instead call send() and receive() on channels. It's basically the same: whenever we call either send() or receive() on a channel, we internally switch back to the main program, i.e. back into the transaction's callback. This one needs to figure out, depending on the channel state and the operation we do, if the same tasklet can continue to run now or not --- which is done with transaction.add(callback-continuing-the-same-tasklet) --- and also if another blocked tasklet can now proceed --- which is done with transaction.add(callback-continuing-the-other-tasklet). So depending on the cases it will add() zero, one or two more transactions, and then finish. A bientôt, Armin.

Hi Armin: Thanks for the explanation. Sorry for taking so long to respond. This is a challenging post. Comments below:

From: Armin Rigo <arigo@tunes.org>
To: Andrew Francis <andrewfr_ice@yahoo.com>
Cc: Py Py Developer Mailing List <pypy-dev@python.org>
Sent: Thursday, April 19, 2012 3:37 PM
Subject: Re: Question about stm_descriptor_init(), tasklets and OS threads

Hi Andrew, On Thu, Apr 19, 2012 at 19:43, Andrew Francis <andrewfr_ice@yahoo.com> wrote:
AF> I am trying to understand enough to get into a position to attempt an integration.
Yes Armin, you are right: I am not treating the transaction module as a black box. I'll try to think more abstractly.
I see the transaction module and the scheduler's operation as orthogonal. It has been my contention that the Stackless scheduler (at least in cooperative mode) is a rather dumb mechanism: there really isn't much scheduling-priority smarts in it. Rather, scheduling/priority decisions reside in the tasklets and the channels. (Some side notes: 1) in the original Plan 9 literature, the scheduler is opaque; 2) view the scheduler as being re-entrant rather than a distinct process.)

This division actually makes Stackless quite flexible. For instance, when I implemented select(), a new mechanism was introduced: a chanop (this comes straight out of the Plan 9/Go implementation). Later, chanops were generalized into guards. Guards serve as the building blocks for complex mechanisms such as channels and join patterns. The scheduler was not touched. Also, join patterns implement a trivial form of all-or-nothing semantics (but there is no retry).

Based on what you have said, I strongly suspect the transaction module could be integrated into Stackless (via stackless.py) in a similar fashion. In short, one does not touch the scheduler; however, the transaction module would use the scheduler.
The callbacks would just contain a switch to the tasklet. When the tasklet comes back, if it is not finished, re-add() it. This is all.
Side note: in a way, this is how the non-preemptive scheduler works. However, that is C-based Stackless Python.
This should ensure that the duration of every transaction is exactly the time in a tasklet between two calls to switch().
I am having problems understanding this passage. Perhaps it is because I view transactions and context switching as orthogonal. Yes, the transaction module's control of context switching could be part of a strategy for ensuring atomicity. Yes, context switching creates opportunities for conflict/contention (this is what causes race conditions). However, STM schemes can handle OS context switches.
You have to make sure that all tasklet.switch()es internally go back to the main program, and not directly to another tasklet.
What is the main programme doing that makes it so special? Is it some sort of synchronizer (a la Nancy Lynch's I/O automata model)? As a side note, I recall reading in previous posts that the underlying ability to jump directly to a tasklet makes pickling difficult. Also, doesn't a main programme sort of suggest a generator/trampoline approach to scheduling that wasn't necessary with tasklets, and tasklets running over greenlets? Maybe it is the way yield() is used in AME that is causing me a bit of mental confusion?
This should ensure that the duration of every transaction is exactly the time in a tasklet between two calls to switch().
This implies that context switches demarcate transaction boundaries. Context switches in Stackless are very common. For instance, channel operations often trigger a context switch, and programmes voluntarily call schedule() to give other tasklets a chance to execute. So in the case of:

    def transfer(toAccount, fromAccount, amount):
        # start of transaction
        fromAccount.withdraw(amount)
        stackless.schedule()
        toAccount.deposit(amount)
        # finish transaction

what would happen? In a normal Stackless Python programme, all other things being equal, the stackless.schedule() opens up the door for race conditions and contention. Remove the stackless.schedule() and that tasklet would operate very much like a transaction. I would suspect that if there was a transaction module present, in case of contention, it would abort and re-try transfer(). Am I right in assuming this? If we could imagine transfer() under the hood doing something like:

    def transfer(toAccount, fromAccount, amount):
        transaction.start()  # NOTE: THIS IS INVISIBLE TO THE PROGRAMMER!!!
        ...

it would not be difficult for the transaction module, if it sees another transaction in progress that would cause contention, to block transfer() by calling, say, schedule_remove() and re-adding transfer() when appropriate.
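[Something close to that behaviour already falls out of the interface Armin described, without any explicit transaction.start(): make the whole transfer one transaction callback with no schedule() inside, and aborting and re-running it on contention becomes the STM machinery's job. A minimal sketch, with a hypothetical Account class:]

    import transaction

    class Account(object):
        # hypothetical account object, just for this sketch
        def __init__(self, balance):
            self.balance = balance
        def withdraw(self, amount):
            self.balance -= amount
        def deposit(self, amount):
            self.balance += amount

    def transfer(toAccount, fromAccount, amount):
        # The whole body is one transaction: no schedule() inside, so on
        # contention the machinery is free to abort and re-run it.
        fromAccount.withdraw(amount)
        toAccount.deposit(amount)

    checking, savings = Account(1000), Account(0)
    transaction.add(lambda: transfer(savings, checking, 100))
    transaction.run()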
Once again, thank you for your response. I may still be unsure of some aspects; however, I believe the integration ought to be much cleaner - no hacking of or replacing the scheduler. Cheers, Andrew

Hi Andrew, Please look at the latest documentation: https://bitbucket.org/pypy/pypy/raw/stm-thread/pypy/doc/stm.rst

You should be able to use such a "thread.atomic" in stackless.py. You need to create N threads and run the tasklets in these threads. As long as each tasklet's user code is protected by a "thread.atomic", they will *appear* to run serially. You probably need to call "thread.atomic.__enter__" and "__exit__" explicitly for your use case; if you do, then I could also expose the functionality as a normal built-in method. You also have to handle issues like tasklets not always being allowed to switch threads.

As a first approximation, on CPython you can implement a dummy "thread.atomic" by acquiring and releasing a single lock. It is only an approximate equivalent, because other non-atomic threads will be allowed to run concurrently; but for this kind of experiment, where *all* threads should be "atomic", it should not make a difference. A bientôt, Armin.
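[For reference, the first-approximation stand-in Armin mentions can be this small; a sketch only, with run_one_tasklet_slice() standing for whatever runs a slice of user code in each thread:]

    import thread  # Python 2, matching the era of this thread

    class _Atomic(object):
        # Dummy "thread.atomic" for CPython: one global lock. It is only
        # approximate because threads that do not use it are not excluded;
        # when *all* threads wrap their tasklet code in it, that is fine.
        def __init__(self):
            self._lock = thread.allocate_lock()
        def __enter__(self):
            self._lock.acquire()
        def __exit__(self, exc_type, exc_value, traceback):
            self._lock.release()

    atomic = _Atomic()

    # in each of the N tasklet-running threads:
    #     with atomic:
    #         run_one_tasklet_slice()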

participants (2):
- Andrew Francis
- Armin Rigo