Continuations and sandboxing
Hi folks,

I've got a challenging set of requirements that I think PyPy may be able to meet, but I'd like to get some feedback before I jump in.

I'm developing a collaboratively edited game. It's not a MUD, but similar in concept, and also text-based in nature. HogwartsLive.com and lotgd.net are examples of the genre I'm going for. I believe that if a game of that genre makes user modification simple and rewarding enough, it could be self-sustaining and grow quickly enough to keep users interested perpetually, like Wikipedia. It's also a fascinating computer science challenge, because it requires a LOT from a computer language.

1) Sandboxing. To allow users to make changes to the game without being able to "cheat" at it or escalate privileges.

2) Serializable continuations. With gameplay being based on good plot and story flow, continuations are critical to allow straightforward implementation of 'workflows' that depend on user choice at every turn.

3) Tail-call elimination. By nature, players will accumulate a very large call stack. While this isn't terribly bad at first glance, the following issue combines with it to cause a very big problem: when code changes underneath a continuation, we need to determine how to resume flow. One option is annotating a checkpoint method in each code 'file'. However, if a user's call stack includes every file in the system, each change will cause them to restart. Tail-call elimination would help eliminate unneeded stack frames and minimize re-spawning.

3) Dynamic, strong typing and metaprogramming are key for keeping the API simple.

4) Dynamic code loading. Users will be able to 'branch' their own version of the world and share it with others. There may be thousands of versions of a class, and they need to be able to execute in separate sandboxes at the same time. Source code will be pulled from a Git repository or some kind of versioning database.
I'm interested in knowing which of these PyPy already does, and which of them I can make it do. I appreciate your help!

Nathanael Jones
http://nathanaeljones.com/
First of all, congratulations on getting THE idea. I think user content is essential too, but I see why commercial games shy away from it. An open source game could do really well!

1: yes, but not in the way you want (you want many boxes that can't touch each other, right?)
2: yes; "stackless" is the name/misnomer.
3a: I doubt this is relevant; you would have to show that your workload has a substantial number of tail calls. Stack depth is not that important with stackless.
3b: I imagine this is impossible in the general case - how do you see this done in any language?
3 again: yes, of course, and I bet you knew this already.
4: indeed. Unloading old code is a bit harder, as you have to make sure all references to the module, function or class are lost in a timely fashion.

p.s. I'm not an expert on pypy yet

_______________________________________________
pypy-dev@codespeak.net
http://codespeak.net/mailman/listinfo/pypy-dev
On 10 January 2011 15:24, Nathanael D. Jones <nathanael.jones@gmail.com> wrote:
Hi folks,
Hi!
2) Serializeable continuations. With gameplay being based on good plot and story flow, continuations are critical to allow straightforward implementation of 'workflows' that depend on user choice at every turn. 3) Tail-call elimination. By nature, players will accumulate a very large call stack. While this isn't terribly bad a first glance, the following issue combines with it to cause a very big problem: When code changes underneath a continuation, we need to determine how to resume flow. One option is annotating a checkpoint method in each code 'file'. However, if a user's call stack includes every file in the system, each change will cause them to restart. Tail-call elimination would help eliminate unneeded stack frames and minimize re-spawning.
I wonder if you don't instead want to manage the (user) dynamic context yourself - if players already need to remember to write all actions in a tail-recursive manner, it is probably worth making that API explicit; and if you do that, the underlying implementation doesn't need to do TCE at all. But I guess the existing stackless support should be good enough if you just want one-shot continuations.
4) Dynamic code loading. Users will be able to 'branch' their own version of the world and share it with others. There may be thousands of versions of a class, and they need to be able to execute in separate sandboxes at the same time. Source code will be pulled from a Git repository or some kind of versioning database.
Quite like this idea. You do have to deal with a bunch of (fairly well known) problems, which any specific implementation of dynamic code loading is going to need to solve (or not). PyPy doesn't currently implement any hot-schema-change magic, and reloading has always been error-prone in the presence of state. First-class mutable types make it particularly difficult (there is no sensible answer to what it means to reload a Python class).

The one issue that interests me is where you implement the persistence boundary - do you go with orthogonal persistence and act as if everything is preserved, or assume all user code is run within some sort of (fast and loose) transaction that can be re-entered at will, providing an API for persistent data access? The second case makes the reloading question a bit more reasonable, because you can always throw away the current delta and replay the external effects, assuming the interface for the external events hasn't changed significantly.

--
William Leslie
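[Editorial aside: the second case - throw away the delta, replay the external effects - might look like this rough sketch. All names here are hypothetical, not an existing PyPy API; the point is just that if state is derived purely from an event log, a code change can rebuild it by replaying the log against the new handlers.]

```python
# Hypothetical sketch of the "replay the external effects" approach.
# World state is derived purely from a log of external events, so a
# code change can discard the derived state and replay the log.

class ReplayableWorld:
    def __init__(self, handlers):
        self.handlers = handlers   # event name -> function(state, payload)
        self.state = {}            # derived, disposable
        self.log = []              # external effects: the only persistent part

    def apply(self, name, payload):
        self.log.append((name, payload))
        self.handlers[name](self.state, payload)

    def reload(self, new_handlers):
        # On a code change: discard derived state and replay external
        # events against the new code (assuming event interfaces match).
        self.handlers = new_handlers
        self.state = {}
        for name, payload in self.log:
            self.handlers[name](self.state, payload)

def pickup_v1(state, item):
    state.setdefault('inventory', []).append(item)

def pickup_v2(state, item):
    # "reloaded" code: same event interface, new representation
    state.setdefault('inventory', set()).add(item)

world = ReplayableWorld({'pickup': pickup_v1})
world.apply('pickup', 'lamp')
world.apply('pickup', 'key')
world.reload({'pickup': pickup_v2})
```

Note this only works while the interface of the events stays fixed, which is exactly the caveat above.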
Hi all, On Mon, Jan 10, 2011 at 09:22, William ML Leslie <william.leslie.ttg@gmail.com> wrote:
On 10 January 2011 15:24, Nathanael D. Jones <nathanael.jones@gmail.com> wrote:
Hi folks,
2) Serializeable continuations. With gameplay being based on good plot and story flow, continuations are critical to allow straightforward implementation of 'workflows' that depend on user choice at every turn. 3) Tail-call elimination. By nature, players will accumulate a very large call stack. While this isn't terribly bad a first glance, the following issue combines with it to cause a very big problem: When code changes underneath a continuation, we need to determine how to resume flow. One option is annotating a checkpoint method in each code 'file'. [I didn't get this part. Reading what's below, I assume that for each call frame, you remember somehow the file defining the called function.] However, if a user's call stack includes every file in the system, each change will cause them to restart. Tail-call elimination would help eliminate unneeded stack frames and minimize re-spawning.
That optimization looks possibly invalid. Suppose file A contains a function f() which tail-calls g(1, 2, 3), and then file A is modified so that f() does a different tail-call. Now it's not clear why you restart: if you restart when already-executed code is modified, then this optimization is invalid. If you reload only when code yet to execute is modified, then the optimization is valid, but you could also perform more advanced optimizations to avoid a restart (you could compare the generated bytecode up to the end of the outermost loop containing the point where the call frame was saved and another procedure was invoked).

It is also not clear which semantic guarantee you want to achieve by this restart. Would you use transactions to avoid performing side effects again?
4) Dynamic code loading. Users will be able to 'branch' their own version of the world and share it with others. There may be thousands of versions of a class, and they need to be able to execute in separate sandboxes at the same time. Source code will be pulled from a Git repository or some kind of versioning database.
Quite like this idea.
You do have to deal with a bunch of (fairly well known) problems, which any specific implementation of dynamic code loading is going to need to solve (or not). Pypy doesn't currently implement any hot-schema-change magic, and reloading has always been error prone in the presence of state. First-class mutable types make it particularly difficult (there is no sensible answer to what it means to reload a python class).
You might want to reuse the solutions to those issues used in the Java (and maybe .NET) world. Java allows reloading a class in a different classloader, and that has been used inside OSGi (see http://www.osgi.org/About/Technology). I'm not sure about the solution in OSGi, but Java serialization allows you to serialize an instance of version 1 of a class and deserialize it with version 2 of that class, if v2 takes extra care to support that; one could use this to convert existing instances.
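[Editorial aside: a rough Python analogue of the classloader trick - the sources are made up, but the mechanism (a fresh module namespace per 'branch', written here in modern Python) is real:]

```python
# Sketch of classloader-style isolation in Python: each branch of the
# world is compiled into its own module object, so two versions of the
# "same" class coexist as unrelated types.
import types

SOURCE_V1 = """
class Door:
    def open(self):
        return 'creak'
"""

SOURCE_V2 = """
class Door:
    def open(self):
        return 'whoosh'
"""

def load_branch(name, source):
    mod = types.ModuleType(name)
    exec(compile(source, name, 'exec'), mod.__dict__)
    return mod

v1 = load_branch('world_v1', SOURCE_V1)
v2 = load_branch('world_v2', SOURCE_V2)
```

Like separate classloaders, this gives isolation but no migration: an instance of v1.Door is simply not an instance of v2.Door, and converting it is up to you.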
The one issue that interests me is where you implement the persistence boundary - do you go with orthogonal persistence and act as if everything is preserved, or assume all user code is run within some sort of (fast and loose) transaction that can be re-entered at will, providing an API for persistent data access? The second case makes the reloading question a bit more reasonable, because you can always throw away the current delta and replay the external effects, assuming the interface for the external events hasn't changed significantly.
The key question is: when would you start and commit such transactions?

Apart from that, your idea looks very similar to Software Transactional Memory (STM). STM restarts explicitly-marked transactions in a thread when some other thread modifies the affected data (which would be a data race) and commits its transaction. In your case, a transaction is restarted when some other thread modifies the involved code.

Cheers,
--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
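[Editorial aside: a toy, single-threaded illustration of the restart behaviour Paolo describes - every name here is invented, and a real STM would need an atomic validate-and-commit step rather than this simulation:]

```python
# Toy illustration of STM-style optimistic retry: a transaction reads a
# snapshot, and commits only if nothing changed underneath; otherwise it
# restarts. A simulated "other thread" interferes exactly once.

class Cell:
    def __init__(self, value):
        self.value = value
        self.version = 0

def atomically(cell, fn, interfere=None):
    attempts = 0
    while True:
        attempts += 1
        seen_version, seen_value = cell.version, cell.value
        result = fn(seen_value)
        if interfere is not None:
            interfere()            # another thread commits mid-transaction
            interfere = None
        if cell.version == seen_version:       # validate: no conflict seen
            cell.value = result
            cell.version += 1
            return result, attempts
        # conflict detected: restart the transaction with fresh data

cell = Cell(10)

def other_thread_commit():
    cell.value = 100
    cell.version += 1

result, attempts = atomically(cell, lambda v: v + 1, other_thread_commit)
```

The first attempt sees 10, but the conflicting commit bumps the version, so the transaction restarts and re-runs against 100.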
On 11 January 2011 07:18, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
Hi all,
On Mon, Jan 10, 2011 at 09:22, William ML Leslie <william.leslie.ttg@gmail.com> wrote:
On 10 January 2011 15:24, Nathanael D. Jones <nathanael.jones@gmail.com> wrote:
4) Dynamic code loading. Users will be able to 'branch' their own version of the world and share it with others. There may be thousands of versions of a class, and they need to be able to execute in separate sandboxes at the same time. Source code will be pulled from a Git repository or some kind of versioning database.
Quite like this idea.
You do have to deal with a bunch of (fairly well known) problems, which any specific implementation of dynamic code loading is going to need to solve (or not). Pypy doesn't currently implement any hot-schema-change magic, and reloading has always been error prone in the presence of state. First-class mutable types make it particularly difficult (there is no sensible answer to what it means to reload a python class).
You might want to reuse the solutions to those issues used in the Java (and maybe .NET) world. Java allows reloading a class in a different classloader, and that has been used inside OSGi (see http://www.osgi.org/About/Technology). Not sure about the solution in OSGi, but Java serialization allows to serialize an instance of version 1 of a class, and to de-serialize it with version 2 of that class, if v2 takes extra care for that; one could use this to convert existing instances.
Sure, and there is also Dynamic Class Evolution for HotSpot, and many other VMs support some form of reloading and redefinition (goops is an exceptional and accessible example; see http://wingolog.org/archives/2009/11/09/class-redefinition-in-guile for a post with lots of diagrams).

What I am trying to say is that the *ability* to do reloading does not mean your code will work in the presence of interface changes. You have to decide whether updating a closure in a way that captures more variables or changes the signature updates existing instances of that closure: if it does, there will be entries missing in the closure slot; if it doesn't, older closures won't be callable in the same way as the new closures. No matter what, here you need to write code to be explicitly reloadable, or provide some way to instruct the runtime what to do for each schema change.
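[Editorial aside: the "no sensible answer" problem shows up in plain Python - redefining a class leaves old instances bound to the old class object:]

```python
# Why "reloading a Python class" is ill-defined: instances created
# before the redefinition still point at the old class object, so they
# never see new or changed methods.

class Player:
    def greet(self):
        return 'hello'

veteran = Player()
OldPlayer = Player

class Player:                      # the "reloaded" definition
    def greet(self):
        return 'hi there'
    def wave(self):                # schema change: a new method
        return '*waves*'

newcomer = Player()

# The veteran is stranded on the old schema...
stranded = not hasattr(veteran, 'wave')

# ...unless we migrate it by hand, which only works when the instance
# layouts happen to be compatible:
veteran.__class__ = Player
```

Deciding what that manual migration step means for every live instance (and closure) is exactly the schema-change policy question above.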
The one issue that interests me is where you implement the persistence boundary - do you go with orthogonal persistence and act as if everything is preserved, or assume all user code is run within some sort of (fast and loose) transaction that can be re-entered at will, providing an API for persistent data access? The second case makes the reloading question a bit more reasonable, because you can always throw away the current delta and replay the external effects, assuming the interface for the external events hasn't changed significantly.
The key question is: when would you start and commit such transactions?
Apart from that, your idea looks very similar to Software Transactional Memory (STM). STM restarts explicitly-marked transactions in a thread when some other thread modifies the affected data (which would be a data race) and commits its transaction. In your case, a transaction is restarted when some other thread modifies the involved code.
There is a bit of ambiguity in the "some other thread modifies" here. I don't know what synchronisation and communication is going on in your game, but I suspect that it only rarely interacts with reloading code in an interesting way. I'll reply to this properly in another email; I'd better get back to work :)

--
William Leslie
Regarding #1: Sandboxing is a major concern for me. Different code will need different sandboxing levels depending upon who created/approved the code. I can't have everything in one sandbox - I need isolated boxes on a per-request level.

I think I see a way to sidestep the need for #3 and also help with hot-swapping. If I try to empty the stack as often as possible, it should provide frequent opportunities for 'free' reloading. I.e. (pseudocode):

def garden():
    if btnHouse.clicked:
        return house

def house():
    if btnGarden.clicked:
        return garden

def loop():
    func = house
    while True:
        # TODO: Update func to the latest version
        # TODO: Build an execution sandbox appropriate to the security clearance func has
        func = sandbox.call(func)
        print "You are now in the " + func.__name__

The idea is that if functions return the next function to call instead of calling them, we have explicit tail-call elimination, and we have an explicit point at which we can rebuild the sandbox and upgrade the code.

Regarding the persistence boundary, I've seen some very good points come up. A certain amount of orthogonal persistence is needed in the form of a continuation, but that would only be function-scoped data. I don't intend to allow users to use global variables or suchlike to persist data; I want a tailored API. Much data will be user-scoped, and therefore lockable. However, some data will be shared across all users. I'm not sure what the best way to handle this is. With sandboxing and full control over the persistence API I could theoretically implement STM.

I want to make sure the architecture is very scalable, so I've been considering something like BigTable or SimpleDB as the persistence store. Here transaction and locking options are more limited. Thoughts?

Nathanael
On 11 January 2011 15:25, Nathanael D. Jones <nathanael.jones@gmail.com> wrote:
Regarding #1: Sandboxing is a major concern for me. Different code will need different sandboxing levels depending upon who created/approved the code. I can't have everything in one sandbox - I need isolated boxes on a per-request level.
You will probably want to implement a capability architecture, where the authority to perform some action is bundled together with the method that invokes it. The language E ( http://erights.org ) was designed for this sort of thing (I think it was written for a MUD originally), and would be worth looking at even if you didn't use the language itself. Capability theory is sadly not well known, considering just how well it solves ever more common problems.
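[Editorial aside: the object-capability idea can be sketched in ordinary Python with made-up game objects - though plain Python can't actually enforce it against introspection; that is what a capability-secure language like E, or PyPy's sandbox, would be for:]

```python
# Object-capability sketch: authority comes from holding a reference,
# not from checking the caller's identity. A "facet" hands out an
# attenuated capability: look(), but not place().

class Room:
    def __init__(self):
        self._items = []
    def look(self):
        return list(self._items)
    def place(self, item):
        self._items.append(item)

def read_only_facet(room):
    class ReadOnlyRoom:
        def look(self):
            return room.look()
    return ReadOnlyRoom()

room = Room()                        # full capability: can modify
room.place('sword')
guest_view = read_only_facet(room)   # attenuated capability for guests
```

Whoever is handed `guest_view` can observe the room but has simply never been given a reference that can mutate it - no permission check ever runs.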
The idea is that if functions return the next function to call instead of calling them, we have explicit tail-call elimination, and we have an explicit point at which we can rebuild the sandbox and upgrade the code.
Right, this is the "driver loop" implementation of TCE. There are several implementations of TCE kicking around, including one I did for cpython some time ago: a decorator that would patch load ... call* ... return bytecodes with a call to a class that collects the function and arguments, and wraps the function in a driver loop that is bypassed when the target is marked as recursive, too.

Unfortunately, using only implicit continuations, once you enter somebody's function, you have no way to get out of it - that person can except: and trap you there if they like. E has a way to protect against this, but you can't protect against the function busy-looping without something more (a way to kill threads, specifically, which probably means taking down the process).
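[Editorial aside: without the bytecode patching, the driver-loop idea reduces to a classic trampoline - a minimal sketch, not William's actual decorator:]

```python
# Minimal trampoline sketch of driver-loop TCE: a tail call returns a
# thunk instead of growing the stack, and the driver loop unwinds
# thunks until a real value appears.

class TailCall:
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

def trampoline(fn, *args):
    result = TailCall(fn, *args)
    while isinstance(result, TailCall):
        result = result.fn(*result.args)
    return result

# Mutual recursion far deeper than CPython's default recursion limit:
def even(n):
    return True if n == 0 else TailCall(odd, n - 1)

def odd(n):
    return False if n == 0 else TailCall(even, n - 1)
```

Written as plain recursion, even(100001) would blow the stack; through the trampoline the stack depth stays constant, which is exactly the property Nathanael's loop() driver relies on.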
Much data will be user scoped, and therefore lockable.
What does lockable mean in this case? Under what conditions does reloading happen while a user continuation is in play? How do you want continuations to interact with the locking?
The key question is: when would you start and commit such transactions?
Apart from that, your idea looks very similar to Software Transactional Memory (STM). STM restarts explicitly-marked transactions in a thread when some other thread modifies the affected data (which would be a data race) and commits its transaction. In your case, a transaction is restarted when some other thread modifies the involved code.
I don't think it would be useful at all to have regular STM transactions delimited at the room boundary - there are surely going to be other people interacting with the room at the same time as you, and sending everyone else back to the start of the room every time someone modifies the room is going to get old very quickly. The boundary introduced by modifying code is probably significantly more coarse.

If you are storing (as the transaction log) only the external effects (user input, random.* output), however, you can build effect commutators automatically. For example, given a code change to object X, you can initialise it with the previous persistent state and replay the effects within the current transaction.

That said, I think having a transaction per room is too coarse - per input or set of inputs makes even more sense. The idea there is that there is much less code to replay when two effects interfere.

--
William Leslie
Hello.

This discussion has drifted considerably. In general, this is the pypy development mailing list, so personally I would like to keep discussions only to things that are either very interesting for people actively working on PyPy or have a remote possibility that someone will start working on them. Discussing STM without someone being interested in actually looking at it is off topic.

PS. Feel free to correct me if anyone working actively on pypy finds this discussion useful for PyPy development.

Cheers,
fijal
Hi Maciej, it's not clear to me what posts your are refering to. Certainly not the original post from Nathan? If you think it's a general issue then do a top-level posting and please suggest how "non-active" (defined how?) pypy developers are supposed to know if some topic is interesting without posting about it. And include some concrete rules about when side discussions are ok and when not. Probably all harder to do than improving your ignoring techniques? cheers, holger On Tue, Jan 11, 2011 at 09:48 +0200, Maciej Fijalkowski wrote:
Hello.
This discussion have drifted considerably. In general this is the pypy development mailing list, so personally I would like to keep discussions only to things that are either very interesting for people actively working on PyPy or have remote possibility that someone will start working on them. Discussing STM without someone being interested in actually looking is out of topic.
PS. Feel free to correct me if anyone working actively on pypy is finding this discussion useful for PyPy development.
Cheers, fijal
On Tue, Jan 11, 2011 at 9:43 AM, William ML Leslie <william.leslie.ttg@gmail.com> wrote:
On 11 January 2011 15:25, Nathanael D. Jones <nathanael.jones@gmail.com> wrote:
Regardng #1: Sandboxing is a major concern for me. Different code will need different sandboxing levels depending upon who created/approved the code. I can't have everything in one sandbox - I need isolated boxes on a per-request level.
You will probably want to implement a capability architecture, where the authority to perform some action is bundled together with the method that invokes it. The language E ( http://erights.org ) was designed for this sort of thing (I think it was written for a MUD originally), and would be worth looking at even if you didn't use the language itself. Capability theory is sadly not well known, considering just how well it solves ever more common problems.
The idea is that if functions return the next function to call instead of calling them, we have explicit tail-call elimination, and we have an explicit point at which we can rebuild the sandbox and upgrade the code.
Right, this is the "driver loop" implementation of TCE. There are several implementations of TCE kicking around (including one I did for CPython some time ago: a decorator that would patch LOAD ... CALL* ... RETURN bytecodes with a call to a class that collects the function and arguments, and wraps the function in a driver loop that is bypassed when the target is marked as recursive, too).
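A minimal sketch of the driver-loop (trampoline) style of TCE described above: a tail call returns a thunk instead of calling, and a loop invokes thunks until a non-callable value comes back, so the stack never grows. Names are illustrative, not PyPy APIs.

```python
def trampoline(fn, *args):
    """Driver loop: repeatedly invoke thunks until a plain value appears."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def countdown(n):
    if n == 0:
        return "done"
    # Return a thunk for the tail call rather than recursing directly,
    # so each step runs at constant stack depth.
    return lambda: countdown(n - 1)

print(trampoline(countdown, 100000))  # completes with no RecursionError
```

The loop is also the natural point William mentions for rebuilding the sandbox or swapping in upgraded code between steps.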
Unfortunately, using only implicit continuations, once you enter somebody's function, you have no way to get out of it - that person can except: and trap you there if they like. E has a way to protect against this, but you can't protect against the function busy-looping without something more (a way to kill threads, specifically, which probably means taking down the process).
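The control-flow capture problem described above can be shown in a few lines of plain Python (hypothetical names; not a real sandbox API): once the host calls into guest code, a bare `except:` in the guest swallows whatever exception the host raises to regain control.

```python
class RevokeControl(BaseException):
    """Raised by the host to try to unwind out of guest code."""

def guest_traps(host_callback, iterations=3):
    """A hostile guest that simply traps every attempt to eject it."""
    trapped = 0
    for _ in range(iterations):
        try:
            host_callback()          # host raises RevokeControl here...
        except BaseException:
            trapped += 1             # ...and the guest swallows it
    return trapped

def host_callback():
    raise RevokeControl

# The host's exception never escapes the guest's loop; a real hostile
# guest would loop forever, which is why killing the thread (or process)
# is the only remaining lever.
```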
> Much data will be user-scoped, and therefore lockable.
What does lockable mean in this case?
Under what conditions does reloading happen while a user continuation is in play? How do you want continuations to interact with the locking?
On Mon, Jan 10, 2011 at 5:05 PM, William ML Leslie <william.leslie.ttg@gmail.com> wrote:
On 11 January 2011 07:18, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
The one issue that interests me is where you implement the persistence boundary - do you go with orthogonal persistence and act as if everything is preserved, or assume all user code is run within some sort of (fast and loose) transaction that can be re-entered at will, providing an API for persistent data access? The second case makes the reloading question a bit more reasonable, because you can always throw away the current delta and replay the external effects, assuming the interface for the external events hasn't changed significantly.
The key question is: when would you start and commit such transactions?
Apart from that, your idea looks very similar to Software Transactional Memory (STM). STM restarts explicitly-marked transactions in a thread when some other thread modifies the affected data (which would be a data race) and commits its transaction. In your case, a transaction is restarted when some other thread modifies the involved code.
I don't think it would be useful at all to have regular STM transactions delimited at the room boundary - there are surely going to be other people interacting with the room at the same time as you, and sending everyone else back to the start of the room every time someone modifies the room is going to get old very quickly. The boundary introduced by modifying code is probably significantly more coarse.
If you are storing (as the transaction log) only the external effects (user input, random.* output), however, you can build effect commutators automatically. For example, given a code change to object X, you can initialise it with the previous persistent state, and replay the effects within the current transaction.
That said, I think having a transaction per room is too coarse - per input or set of inputs makes even more sense. The idea there is that there is much less code to replay when two effects interfere.
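The replay idea sketched in the last few paragraphs can be illustrated roughly as follows (all names are made up for illustration): persist only the external effects (user input, the random seed) as a log, and rebuild an object's state under possibly-updated code by replaying them.

```python
import random

class Room:
    """Stand-in for some user-modifiable game object."""
    def __init__(self):
        self.visitors = []

    def enter(self, name):
        self.visitors.append(name)

def record_effects(seed, inputs):
    """The 'transaction log': only the seed and the raw external inputs."""
    return {"seed": seed, "inputs": list(inputs)}

def replay(log, room_class):
    """Reconstruct state by replaying the log against (new) code."""
    random.seed(log["seed"])     # makes any random.* output deterministic
    room = room_class()
    for name in log["inputs"]:
        room.enter(name)
    return room

log = record_effects(seed=42, inputs=["alice", "bob"])
room = replay(log, Room)
print(room.visitors)  # ['alice', 'bob']
```

If `Room` is edited, the same log can be replayed against the new class, which is the "throw away the delta and replay the external effects" move, assuming the input interface has not changed.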
-- William Leslie _______________________________________________ pypy-dev@codespeak.net http://codespeak.net/mailman/listinfo/pypy-dev
Well, I can easily ignore things that are not interesting to me. My point was that discussions about topics (like capabilities) only make sense if there is someone at least remotely interested in implementing them in PyPy. Otherwise discussions tend to drift into "but that's also a cool idea", which is off-topic for PyPy, since no one actually wants to implement any of those ideas. For example, I would say all GIL-removal discussions are off-topic unless someone is willing to experiment with implementing it.

That's, as I said above, my personal opinion, backed by the fact that this is the list for discussing PyPy development (so some people, like me, feel obliged to read whole discussions here). Feel free to propose some other guidelines for what is on-topic and off-topic for the list.

By "active" I mean anyone doing any work on PyPy, be it a typo fix in the documentation :)

Cheers, fijal
On Tue, Jan 11, 2011 at 11:10 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
On Tue, Jan 11, 2011 at 2:58 PM, holger krekel <holger@merlinux.eu> wrote:
Hi Maciej,
it's not clear to me what posts your are refering to. Certainly not the original post from Nathan?
If you think it's a general issue then do a top-level posting and please suggest how "non-active" (defined how?) pypy developers are supposed to know if some topic is interesting without posting about it. And include some concrete rules about when side discussions are ok and when not. Probably all harder to do than improving your ignoring techniques?
> Well, I can easily ignore things that are not interesting to me.
> My point was that discussions about topics (like capabilities) only make sense if there is someone at least remotely interested in implementing them in PyPy. Otherwise discussions tend to drift into "but that's also a cool idea", which is off-topic for PyPy, since no one actually wants to implement any of those ideas.
Well, we could tell him how to implement capabilities in PyPy and how easy or hard it seems. Without having looked much at it, I think you could implement it by creating a new object space, like these: http://codespeak.net/pypy/dist/pypy/doc/objspace-proxies.html
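As a plain-Python illustration of what such an object-space-level mechanism would enforce (this is only an application-level sketch with made-up names; a real PyPy object space would hook attribute access in the interpreter itself):

```python
class CapabilityProxy:
    """Wraps an object and exposes only the methods a caller was granted."""
    def __init__(self, target, allowed):
        self._target = target
        self._allowed = frozenset(allowed)

    def __getattr__(self, name):
        # Called only for attributes not found normally, i.e. everything
        # we did not define on the proxy itself.
        if name not in self._allowed:
            raise PermissionError("no capability for %r" % name)
        return getattr(self._target, name)

class Door:
    def open(self):
        return "opened"
    def demolish(self):
        return "demolished"

# A guest holds a reference that carries only the 'open' capability.
guest_door = CapabilityProxy(Door(), allowed={"open"})
print(guest_door.open())   # "opened"
# guest_door.demolish() raises PermissionError
```

An object space could apply this kind of wrapping transparently to every object crossing a sandbox boundary, instead of requiring explicit proxies.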
> For example, I would say all GIL-removal discussions are off-topic unless someone is willing to experiment with implementing it.
This is also something I see as a communication problem: there should be at least a FAQ entry about this, or even a good description of how PyPy does threads and why it does the same thing as CPython.
> That's, as I said above, my personal opinion, backed by the fact that this is the list for discussing PyPy development (so some people, like me, feel obliged to read whole discussions here). Feel free to propose some other guidelines for what is on-topic and off-topic for the list.
Maybe PyPy developers could steer discussions toward being on-topic.
> By "active" I mean anyone doing any work on PyPy, be it a typo fix in the documentation :)
People feel that they are helping by discussing ideas and approaches on the list that the PyPy team might not know about.
Cheers, fijal
-- Leonardo Santagada