Question about stm_descriptor_init(), tasklets and OS threads

Hi Armin: I am looking at stm_descriptor_init(). Right now it makes a call to pthread_self(). In a potential Stackless prototype, I would want it to get the current tasklet instead. Shouldn't this be enough to get a trivial implementation of Stackless interacting with the transaction module (by trivial, I mean one thread; hopefully, by sticking to a single thread, we don't have to alter any low-level locking code)? Enough to analyse programmes? Cheers, Andrew

Hi Andrew, On Tue, Apr 17, 2012 at 00:44, Andrew Francis <andrewfr_ice@yahoo.com> wrote:
I don't understand why at all, sorry. I will stick to my position that the Stackless module should be modified to use the transaction module internally, and that no editing of the low-level RPython and C code is necessary. It is possible that using the transaction module in pure Python from lib_pypy/stackless.py does not really work, in which case you may need to edit pypy/module/_continuation instead and call pypy.rlib.rstm directly in RPython. But you definitely don't need to edit anything at a lower level. A bientôt, Armin.

Hi Armin:

From: Armin Rigo <arigo@tunes.org>
To: Andrew Francis <andrewfr_ice@yahoo.com>
Cc: Py Py Developer Mailing List <pypy-dev@python.org>
Sent: Tuesday, April 17, 2012 4:19 AM
Subject: Re: Question about stm_descriptor_init(), tasklets and OS threads
I don't understand why at all, sorry.
Please bear with me :-). I am in the same position now as I was in 2007, when I was trying to make Stackless interoperate with Twisted: a lot of silly questions, a lot of misconceptions, a lot of looking at code to see how things worked, and some dusting off of the old operating-systems textbooks.
Noted. Again, when you write a position paper, this would be listed as a fundamental design principle.
I am trying to understand enough to get into a position to attempt an integration. I will start with a Stackless bank account programme (I have written a version in RPython); a very simple programme to write. To make things interesting, my plan is to make one tasklet call schedule(), hence causing contention. It is important to note that there is only one OS thread in action. However, what is not clear to me, and you are in a better position to answer, is whether the underlying low-level rstm machinery cares that it is user-space tasklets, not OS threads, that are the units of execution causing the contention. Cheers, Andrew

Hi Andrew, On Thu, Apr 19, 2012 at 19:43, Andrew Francis <andrewfr_ice@yahoo.com> wrote:
I am trying to understand enough to get into a position to attempt an integration.
I believe you are trying to approach the problem from the bottom-most level up --- which is a fine way to approach problems; but in this case, you are missing that there are still a few levels between where you got so far and the destination, which is the stackless.py module in pure Python. Let us try to approach it top-down instead, because there are far fewer levels to dig through.

The plan is to make any existing stackless-using program work on multiple cores. Take a random existing stackless example, and start to work by editing the pure Python lib_pypy/stackless.py (or, at first, writing a new version from scratch, copying parts of the original). The idea is to use the "transaction" module.

The goal would be to not use the _squeue, which is a deque of pending tasklets, but instead to add pending tasklets with transaction.add(callback). Note how the notion of tasklets in the _squeue (which offers some specific order) is replaced by the notion of which callbacks have been added. See below for what is in each callback.

The transaction.run() would occur in the main program, directly called by stackless.schedule(). So the callbacks are all invoked in the main program too; in fact all the "transaction" dispatching is done in the main program, and only scheduling new tasklets can occur anywhere.

The callbacks would just contain a switch to the tasklet. When the tasklet comes back, if it is not finished, re-add() it. This is all. You have to make sure that all tasklet.switch()es internally go back to the main program, and not directly to another tasklet. This should ensure that the duration of every transaction is exactly the time in a tasklet between two calls to switch().

Of course, this can all be written and tested against "transaction.py", the pure Python emulator. Once it is nicely working, you are done. You just have to wait for a continuation-capable version of pypy-stm; running the same program on it, you'll get multi-core usage. A bientôt, Armin.
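[To make the scheme described above concrete, here is a minimal sketch of such a rewritten core. It assumes only the transaction.add(callback) and transaction.run() calls mentioned in this thread; everything else is invented for illustration: plain generators stand in for the continulet-based tasklets of the real lib_pypy/stackless.py, and the names Tasklet and _run_one_slice are this sketch's own.]

    import transaction  # PyPy's transaction module, or its pure-Python emulator

    class Tasklet(object):
        def __init__(self, func, *args):
            # A generator stands in for the switchable unit of execution;
            # each yield models a switch back to the main program.
            self._gen = func(*args)

        def _run_one_slice(self):
            # The transaction callback: "switch" into the tasklet, and if
            # it has not finished when it comes back, re-add() it.
            try:
                next(self._gen)
            except StopIteration:
                return                        # finished: add nothing
            transaction.add(self._run_one_slice)

    def tasklet(func, *args):
        # Replaces appending to the _squeue.
        t = Tasklet(func, *args)
        transaction.add(t._run_one_slice)
        return t

    def schedule():
        # Called from the main program: dispatch all pending callbacks.
        transaction.run()

[On the pure-Python transaction.py the callbacks simply run one after another in the main program; on a continuation-capable pypy-stm the very same add()/run() calls would be spread over multiple cores.]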

Re-Hi, On Thu, Apr 19, 2012 at 21:37, Armin Rigo <arigo@tunes.org> wrote:
You have to make sure that all tasklet.switch()es internally go back to the main program, and not directly to another tasklet.
Ah, sorry, I confused the stackless interface. You don't switch() to tasklets, but instead call send() and receive() on channels. It's basically the same: whenever we call either send() or receive() on a channel, we internally switch back to the main program, i.e. back into the transaction's callback. This one needs to figure out, depending on the channel state and the operation we do, if the same tasklet can continue to run now or not --- which is done with transaction.add(callback-continuing-the-same-tasklet) --- and also if another blocked tasklet can now proceed --- which is done with transaction.add(callback-continuing-the-other-tasklet). So depending on the cases it will add() zero, one or two more transactions, and then finish. A bientôt, Armin.

Hi Armin: Thanks for the explanation. Sorry for taking so long to respond. This is a challenging post. Comments below:

From: Armin Rigo <arigo@tunes.org>
To: Andrew Francis <andrewfr_ice@yahoo.com>
Cc: Py Py Developer Mailing List <pypy-dev@python.org>
Sent: Thursday, April 19, 2012 3:37 PM
Subject: Re: Question about stm_descriptor_init(), tasklets and OS threads

Hi Andrew, On Thu, Apr 19, 2012 at 19:43, Andrew Francis <andrewfr_ice@yahoo.com> wrote:
AF> I am trying to understand enough to get into a position to attempt an integration.
Yes Armin, you are right: I am not treating the transaction module as a black box. I'll try to think more abstractly.
I see the transaction module and the scheduler's operation as orthogonal. It has been my contention that the Stackless scheduler (at least in cooperative mode) is a rather dumb mechanism: there really isn't much scheduling-priority smarts in it. Rather, scheduling/priority decisions reside in the tasklets and the channels. (Some side notes: 1) in the original Plan 9 literature, the scheduler is opaque; 2) view the scheduler as being re-entrant rather than a distinct process.)

This division actually makes Stackless quite flexible. For instance, when I implemented select(), a new mechanism was introduced: a chanop (this comes straight out of the Plan 9/Go implementation). Later, chanops were generalized into guards. Guards serve as the building blocks for complex mechanisms such as channels and join patterns. The scheduler was not touched. Also, join patterns implement a trivial form of all-or-nothing semantics (but there is no retry).

Based on what you have said, I strongly suspect the transaction module could be integrated into Stackless (via stackless.py) in a similar fashion. In short, one does not touch the scheduler; however, the transaction module would use the scheduler.
The callbacks would just contain a switch to the tasklet. When the tasklet comes back, if it is not finished, re-add() it. This is all.
Side note: in a way, this is how the non-preemptive scheduler works. However, that is C-based Stackless Python.
This should ensure that the duration of every transaction is exactly the time in a tasklet between two calls to switch().
I am having problems understanding this passage. Perhaps it is because I view transactions and context switching as orthogonal. Yes, the transaction module's control of context switching could be part of a strategy for ensuring atomicity. Yes, context switching creates opportunities for conflict/contention (this is what causes race conditions). However, STM schemes can handle OS context switches.
You have to make sure that all tasklet.switch()es internally go back to the main program, and not directly to another tasklet.
What is the main programme doing that makes it so special? Is it some sort of synchronizer (a la Nancy Lynch's I/O automata model)? As a side note, I recall reading in previous posts that the underlying ability to jump directly to a tasklet makes pickling difficult. Also, doesn't a main programme sort of suggest a generator/trampoline approach to scheduling that wasn't necessary with tasklets, and tasklets running over greenlets? Maybe it is the way yield() is used in AME that is causing me a bit of mental confusion?
This should ensure that the duration of every transaction is exactly the time in a tasklet between two calls to switch().
This implies that context switches demarcate transaction boundaries. Context switches in Stackless are very common. For instance, channel operations often trigger a context switch, and programmes voluntarily call schedule() to give other tasklets a chance to execute. So in the case of:

    def transfer(toAccount, fromAccount, amount):
        # start of transaction
        fromAccount.withdraw(amount)
        stackless.schedule()
        toAccount.deposit(amount)
        # finish transaction

what would happen? In a normal Stackless Python programme, all other things being equal, the stackless.schedule() opens up the door for race conditions and contention. Remove the stackless.schedule() and that tasklet would operate very much like a transaction. I would suspect that if there was a transaction module present, in case of contention, it would abort and re-try transfer(). Am I right in assuming this? If we could imagine transfer() under the hood doing something like:

    def transfer(toAccount, fromAccount, amount):
        transaction.start()  # NOTE: THIS IS INVISIBLE TO THE PROGRAMMER!!!
        ...

it would not be difficult for the transaction module, if it sees another transaction in progress that would cause contention, to block transfer() by calling, say, schedule_remove() and re-adding transfer() when appropriate.
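[Something close to that behaviour already falls out of the interface Armin described, without any explicit transaction.start(): make the whole transfer one transaction callback with no schedule() inside, and aborting and re-running it on contention becomes the STM machinery's job. A minimal sketch, with a hypothetical Account class:]

    import transaction

    class Account(object):
        # hypothetical account object, just for this sketch
        def __init__(self, balance):
            self.balance = balance
        def withdraw(self, amount):
            self.balance -= amount
        def deposit(self, amount):
            self.balance += amount

    def transfer(toAccount, fromAccount, amount):
        # The whole body is one transaction: no schedule() inside, so on
        # contention the machinery is free to abort and re-run it.
        fromAccount.withdraw(amount)
        toAccount.deposit(amount)

    checking, savings = Account(1000), Account(0)
    transaction.add(lambda: transfer(savings, checking, 100))
    transaction.run()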
Once again, thank you for your response. I may still be unsure of some aspects; however, I believe the integration ought to be much cleaner - no hacking of or replacing the scheduler. Cheers, Andrew

Hi Andrew, Please look at the latest documentation: https://bitbucket.org/pypy/pypy/raw/stm-thread/pypy/doc/stm.rst

You should be able to use such a "thread.atomic" in stackless.py. You need to create N threads and run the tasklets in these threads. As long as each tasklet's user code is protected by a "thread.atomic", they will *appear* to run serially. You probably need to call "thread.atomic.__enter__" and "__exit__" explicitly for your use case; if you do, then I could also expose the functionality as a normal built-in method. You also have to handle issues like tasklets not always being allowed to switch threads.

As a first approximation, on CPython you can implement a dummy "thread.atomic" by acquiring and releasing a single lock. It is only an approximate equivalent, because other non-atomic threads will be allowed to run concurrently; but for this kind of experiment, where *all* threads should be "atomic", it should not make a difference. A bientôt, Armin.
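[For reference, the first-approximation stand-in Armin mentions can be this small; a sketch only, with run_one_tasklet_slice() standing for whatever runs a slice of user code in each thread:]

    import thread  # Python 2, matching the era of this thread

    class _Atomic(object):
        # Dummy "thread.atomic" for CPython: one global lock. It is only
        # approximate because threads that do not use it are not excluded;
        # when *all* threads wrap their tasklet code in it, that is fine.
        def __init__(self):
            self._lock = thread.allocate_lock()
        def __enter__(self):
            self._lock.acquire()
        def __exit__(self, exc_type, exc_value, traceback):
            self._lock.release()

    atomic = _Atomic()

    # in each of the N tasklet-running threads:
    #     with atomic:
    #         run_one_tasklet_slice()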

participants (2):
- Andrew Francis
- Armin Rigo