[pypy-dev] Continuations and sandboxing

Tue Jan 11 05:25:04 CET 2011

Regarding #1: Sandboxing is a major concern for me. Different code will need
different sandboxing levels depending upon who created/approved the code.
I can't have everything in one sandbox - I need isolated boxes on a
per-request level.

I think I see a way to sidestep the need for #3 and also help with
hot-swapping.

If I try to empty the stack as often as possible, it should provide frequent
opportunities for 'free' reloading.

I.e. (pseudocode):

def garden():
    if btnHouse.clicked:
        return house
    return garden  # no click yet: stay in the garden

def house():
    if btnGarden.clicked:
        return garden
    return house  # no click yet: stay in the house

def loop():
    func = house
    while True:
        # TODO: Update func to the latest version
        # TODO: Build an execution sandbox appropriate to the security clearance func has
        func = sandbox.call(func)
        print("You are now in the " + func.__name__)

The idea is that if functions return the next function to call instead of
calling them, we have explicit tail-call elimination, and we have an
explicit point at which we can rebuild the sandbox and upgrade the code.
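The trampoline idea can be made concrete with a minimal sketch. It assumes one small change from the pseudocode: each state returns the *name* of the next state rather than the function object, so every step goes through a hypothetical `registry` holding the latest deployed version. `run_sandboxed` is a placeholder that performs no isolation at all; a real system would substitute something like PyPy's sandboxed interpreter or a separate process.

```python
from typing import Callable, Dict, List, Optional

# A state is a zero-argument function returning the *name* of the next state,
# or None to stop. (Hypothetical convention; the pseudocode returns functions.)
State = Callable[[], Optional[str]]

# Hypothetical registry of the latest deployed version of each state.
# Hot-swapping means replacing an entry; the loop sees it on its next lookup.
registry: Dict[str, State] = {}


def run_sandboxed(func: State) -> Optional[str]:
    # Stand-in for a real sandbox; this performs NO isolation.
    return func()


def trampoline(start: str, max_steps: int = 100) -> List[str]:
    """Run states until one returns None; return the names visited."""
    visited: List[str] = []
    name: Optional[str] = start
    while name is not None and len(visited) < max_steps:
        func = registry[name]       # look up the newest version on every step
        visited.append(name)
        name = run_sandboxed(func)  # callee returned: none of its frames remain
    return visited


def house() -> Optional[str]:
    return "garden"


def garden_v1() -> Optional[str]:
    return "house"


def garden_v2() -> Optional[str]:
    return None  # the new version ends the walk instead of looping back


registry.update(house=house, garden=garden_v1)
print(trampoline("house", max_steps=4))  # ['house', 'garden', 'house', 'garden']
registry["garden"] = garden_v2           # "deploy" a new version between runs
print(trampoline("house"))               # ['house', 'garden']
```

Returning names instead of functions means a stale reference to an old version can never be carried into the next step; the trade-off is that states are addressed by string rather than by object.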

Regarding the persistence boundary, I've seen some very good points come up.

A certain amount of orthogonal persistence is needed in the form of a
continuation, but that would only be function-scoped data. I don't intend to
allow users to use global variables or suchlike to persist data, I want a
tailored API.
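One way to combine a tailored persistence API with William's second option (re-enterable transactions) is sketched below. It assumes every external input to a request is logged so it can be replayed, and that handlers touch state only through the API. `Transaction`, `process`, `current_code` and the `deploy` dict are hypothetical names, and `committed` is a plain dict standing in for the real store.

```python
from typing import Any, Callable, Dict, List, Tuple


class Transaction:
    """Buffers a request's writes so they can be discarded and replayed."""

    def __init__(self, committed: Dict[str, Any]) -> None:
        self._committed = committed
        self.delta: Dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        if key in self.delta:
            return self.delta[key]
        return self._committed.get(key, default)

    def put(self, key: str, value: Any) -> None:
        self.delta[key] = value  # invisible to other requests until commit()

    def commit(self) -> None:
        self._committed.update(self.delta)
        self.delta = {}


# Hypothetical handler signature: it may only touch state through `tx`.
Handler = Callable[[Transaction, Any], None]


def process(committed: Dict[str, Any], events: List[Any],
            current: Callable[[], Tuple[int, Handler]],
            max_attempts: int = 3) -> int:
    """Apply logged events with the newest code, replaying if it is reloaded."""
    for _ in range(max_attempts):
        version, handler = current()
        tx = Transaction(committed)
        for event in events:
            handler(tx, event)
        if current()[0] == version:  # code unchanged while we ran: commit
            tx.commit()
            return version
        # Code was reloaded mid-request: drop tx.delta and replay the events.
    raise RuntimeError("code kept changing; giving up")


# Usage: a reload (v1 -> v2) lands while v1 is handling the request.
deploy: Dict[str, Any] = {"version": 1}


def handler_v2(tx: Transaction, event: Any) -> None:
    tx.put("gold", tx.get("gold", 0) + 10)


def handler_v1(tx: Transaction, event: Any) -> None:
    deploy.update(version=2, handler=handler_v2)  # simulate a deployment
    tx.put("gold", tx.get("gold", 0) + 1)


def current_code() -> Tuple[int, Handler]:
    return deploy["version"], deploy["handler"]


deploy["handler"] = handler_v1
state: Dict[str, Any] = {}
print(process(state, ["coin", "coin"], current_code))  # 2
print(state)                                            # {'gold': 20}
```

Because the delta is private to the attempt, discarding it is free; the price is that handlers must not perform external side effects directly, since those would be repeated on replay.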

Much data will be user-scoped, and therefore lockable.
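For user-scoped data, a per-user lock is enough to serialize one user's requests without blocking anyone else. The `UserStore` class below is a hypothetical in-memory sketch of such an API, not a real library: user code receives a private copy of its user's data, and changes are written back only if the request finishes without raising.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager
from typing import Any, Dict, Iterator


class UserStore:
    """Hypothetical persistence API: code only ever sees one user's data."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)  # one lock per user id
        self._guard = threading.Lock()             # protects creation of locks

    def _lock_for(self, user_id: str):
        with self._guard:
            return self._locks[user_id]

    @contextmanager
    def session(self, user_id: str) -> Iterator[Dict[str, Any]]:
        """Lock the user, hand out a private copy, write it back on success."""
        with self._lock_for(user_id):
            working = dict(self._data[user_id])
            yield working
            # Only reached if the with-block raised nothing.
            self._data[user_id] = working


store = UserStore()
with store.session("alice") as data:
    data["gold"] = data.get("gold", 0) + 10
try:
    with store.session("alice") as data:
        data["gold"] = 999
        raise RuntimeError("request failed")  # simulated failure: nothing saved
except RuntimeError:
    pass
with store.session("alice") as data:
    print(data)  # {'gold': 10}
```

The lock is not reentrant, so code must not open a second session for the same user while one is already active.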

However, some data will be shared across all users.  I'm not sure what the
best way to handle this is. With sandboxing and full control over the
persistence API I could theoretically implement STM.

I want to make sure the architecture is very scalable, so I've been
considering something like BigTable or SimpleDB as the persistence store.
Here transaction and locking options are more limited.
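For shared data on a store with limited transactions, one common pattern, assuming the store offers a conditional (compare-and-set) write on a single item, is optimistic concurrency: read a value with its version, compute, and write only if the version is unchanged, retrying otherwise. This validate-at-commit step is the core of many STM designs, but the sketch covers a single key only and is not full STM. `VersionedStore` is an in-memory stand-in with a hypothetical API, not a SimpleDB or BigTable client.

```python
import threading
from typing import Any, Callable, Dict, Tuple


class VersionedStore:
    """In-memory stand-in for a store offering conditional writes."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[Any, int]] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Tuple[Any, int]:
        with self._lock:
            return self._items.get(key, (None, 0))

    def put_if_version(self, key: str, value: Any, expected: int) -> bool:
        """Write only if nobody else wrote since we read version `expected`."""
        with self._lock:
            _, current = self._items.get(key, (None, 0))
            if current != expected:
                return False
            self._items[key] = (value, current + 1)
            return True


def atomic_update(store: VersionedStore, key: str,
                  update: Callable[[Any], Any], max_retries: int = 10) -> Any:
    """Optimistically apply `update`, re-running it if a concurrent write wins."""
    for _ in range(max_retries):
        value, version = store.get(key)
        new_value = update(value)  # must be free of external side effects
        if store.put_if_version(key, new_value, version):
            return new_value
    raise RuntimeError(f"too much contention on {key!r}")


store = VersionedStore()
print(atomic_update(store, "visits", lambda v: (v or 0) + 1))  # 1

# Simulate another server writing between our read and our conditional write.
attempts = []


def bump(value):
    if not attempts:
        store.put_if_version("visits", 100, 1)  # the competing write wins
    attempts.append(value)
    return (value or 0) + 1


print(atomic_update(store, "visits", bump))  # 101, after one retry
print(attempts)                              # [1, 100]
```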

Thoughts?

Nathanael

On Mon, Jan 10, 2011 at 5:05 PM, William ML Leslie <william.leslie.ttg at gmail.com> wrote:

> On 11 January 2011 07:18, Paolo Giarrusso <p.giarrusso at gmail.com> wrote:
> > Hi all,
> >
> > On Mon, Jan 10, 2011 at 09:22, William ML Leslie
> > <william.leslie.ttg at gmail.com> wrote:
> >> On 10 January 2011 15:24, Nathanael D. Jones <nathanael.jones at gmail.com> wrote:
> >>> 4) Dynamic code loading. Users will be able to 'branch' their own version of
> >>> the world and share it with others. There may be thousands of versions of a
> >>> class, and they need to be able to execute in separate sandboxes at the same
> >>> time. Source code will be pulled from a Git repository or some kind of
> >>> versioning database.
> >
> >> Quite like this idea.
> >
> >> You do have to deal with a bunch of (fairly well known) problems,
> >> which any specific implementation of dynamic code loading is going to
> >> need to solve (or not).  PyPy doesn't currently implement any
> >> hot-schema-change magic, and reloading has always been error prone in
> >> the presence of state.  First-class mutable types make it particularly
> >> difficult (there is no sensible answer to what it means to reload a
> >> Python class).
> >
> > You might want to reuse the solutions to those issues used in the Java
> > (and maybe .NET) world. Java allows reloading a class in a different
> > classloader, and that has been used inside OSGi (see
> > http://www.osgi.org/About/Technology).
> > Not sure about the solution in OSGi, but Java serialization lets you
> > serialize an instance of version 1 of a class and deserialize it
> > with version 2 of that class, if v2 takes extra care for that; one
> > could use this to convert existing instances.
>
> Sure, and there is also Dynamic Class Evolution for Hotspot, and many
> other VMs support some form of reloading and redefinition (goops is an
> exceptional and accessible example, see
> http://wingolog.org/archives/2009/11/09/class-redefinition-in-guile
> for a post with lots of diagrams).
>
> What I am trying to say is that the *ability* to do reloading does not
> mean your code will work in the presence of interface changes.  You
> have to decide whether updating a closure in a way that captures more
> variables or changes the signature updates existing instances of that
> closure: if it does, there will be entries missing in the closure
> slot, if it doesn't, older closures won't be callable in the same way
> as the new closures.  No matter what, you here need to write code to
> be explicitly reloadable, or provide some way to instruct the runtime
> what to do for each schema change.
>
> >
> >> The one issue that interests me is where you implement the persistence
> >> boundary - do you go with orthogonal persistence and act as if
> >> everything is preserved, or assume all user code is run within some
> >> sort of (fast and loose) transaction that can be re-entered at will,
> >> providing an API for persistent data access?  The second case makes
> >> the reloading question a bit more reasonable, because you can always
> >> throw away the current delta and replay the external effects, assuming
> >> the interface for the external events hasn't changed significantly.
> >
> > The key question is: when would you start and commit such transactions?
> >
> > Apart from that, your idea looks very similar to Software
> > Transactional Memory (STM). STM restarts explicitly-marked
> > transactions in a thread when some other thread modifies the affected
> > data (which would be a data race) and commits its transaction. In your
> > case, a transaction is restarted when some other thread modifies the
> > involved code.
>
> There is a bit of ambiguity in the "some other thread modifies" here.
> I don't know what synchronisation and communication is going on in
> your game, but I suspect that it only rarely interacts with reloading
> code in an interesting way.  I'll reply to this properly in another
> email, I'd better get back to work :)
>
> --
> William Leslie
>