[pypy-dev] Sandboxing questions

Sat Jul 16 22:37:46 CEST 2011

On Jul 16, 2011 5:13 AM, "Armin Rigo" <arigo at tunes.org> wrote:
>
> Hi,
>
> On Sat, Jul 16, 2011 at 12:32 AM, VanL <van.lindberg at gmail.com> wrote:
> > I think that a better (read: closer term, and more likely to be
> > performant) answer is to create multiple interpreters, *each with their
> > own GIL, each in their own thread,* and connect them via channels
> > (essentially a pair of queues).
>
> That's hand-waving away the real question: what can you pass over
> channels?  If the interpreters are supposed to be completely
> separated, then you can only pass strings, and the result looks
> exactly like separated processes.  You can extend it to pass tuples
> and other simple data structures, but that's the same as extending the
> cross-process communication protocol.  If on the other hand you can
> pass arbitrary random objects, then you have the issue that the
> objects are not really owned by one interpreter or the other; I don't
> really think it can be made to work in the current model of the object
> space reference.  Even if we manage, we'd end up again with the issue
> of concurrent changes to shared objects, which is the core problem to
> solve in any case --- either in your approach or with STM or with
> fine-grained locking.

My intention was to proceed in four steps:

First, allow the passing of any immutable type. This is about the same as
multiprocessing, but you could do it without incurring the
serialization/deserialization overhead.
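
A minimal sketch of this first step, using ordinary Python threads and
queue.Queue. The threads here still share one GIL, so this only models the
channel semantics, not separate interpreters. The names Channel,
is_immutable and send are made up for illustration.

```python
import queue
import threading

class Channel:
    """Hypothetical channel: a pair of queues, one per direction."""
    def __init__(self):
        self.to_worker = queue.Queue()
        self.to_owner = queue.Queue()

# Assumption: a conservative whitelist of types treated as immutable.
ATOMS = (int, float, complex, str, bytes, bool, type(None))

def is_immutable(obj):
    """Accept atoms, and tuples/frozensets built only from immutables."""
    if isinstance(obj, ATOMS):
        return True
    if isinstance(obj, (tuple, frozenset)):
        return all(is_immutable(item) for item in obj)
    return False

def send(q, obj):
    """Put a reference on the queue; nothing is pickled or copied."""
    if not is_immutable(obj):
        raise TypeError("only immutable objects may cross the channel")
    q.put(obj)

def echo_worker(chan):
    """Stand-in for a second interpreter: receive one item, send it back."""
    item = chan.to_worker.get()
    send(chan.to_owner, item)

chan = Channel()
worker = threading.Thread(target=echo_worker, args=(chan,))
worker.start()

message = ("request", 42, b"payload")
send(chan.to_worker, message)
reply = chan.to_owner.get()
worker.join()

# The very same object came back: no serialization round trip happened.
same_object = reply is message
```

Because the object cannot change, both sides can hold the same reference
safely, which is where the saving over multiprocessing would come from.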

Second, allow the passing of mutable types with copy-on-write semantics. Note
that this would all be async through queues.
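
A rough sketch of the copy-on-write idea, assuming a hypothetical
CopyOnWrite wrapper handed to the receiving side. Reads go to the sender's
object; the first write makes a private copy. It does not guard against the
sender mutating the object concurrently, which a real implementation would
have to handle.

```python
import copy

class CopyOnWrite:
    """Hypothetical wrapper: share on read, copy on first write."""
    def __init__(self, target):
        self._target = target
        self._owned = False  # True once we hold a private copy

    def _ensure_private(self):
        if not self._owned:
            self._target = copy.deepcopy(self._target)
            self._owned = True

    def __getitem__(self, key):
        return self._target[key]

    def __setitem__(self, key, value):
        self._ensure_private()
        self._target[key] = value

    def __len__(self):
        return len(self._target)

original = {"count": 1}            # lives in the sending space
received = CopyOnWrite(original)   # what the receiving space sees

before = received["count"]         # read: still the sender's dict
received["count"] = 2              # write: triggers the private copy
```

After the write, the sender's dict is untouched and the receiver works on
its own copy.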

Third, allow memory views or classes in a sending object space/thread to
expose read-only access to another object space/thread. The shared
objects would need to be explicitly declared, probably using something
similar to the POSH semantics.
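
Current CPython already has two building blocks that behave roughly like
this within a single interpreter: memoryview.toreadonly() (Python 3.8+) and
types.MappingProxyType. The sketch below only illustrates the read-only
semantics; it is not POSH's API and says nothing about how sharing would be
declared across object spaces.

```python
from types import MappingProxyType

# The owning space keeps the writable objects.
buf = bytearray(b"hello")
settings = {"mode": "fast"}

# The other space would only ever be handed these read-only views.
view = memoryview(buf).toreadonly()
settings_view = MappingProxyType(settings)

try:
    view[0] = ord("J")
    buffer_write_allowed = True
except TypeError:
    buffer_write_allowed = False

try:
    settings_view["mode"] = "slow"
    mapping_write_allowed = True
except TypeError:
    mapping_write_allowed = False

# Writes by the owner remain visible through the views.
buf[0] = ord("J")
settings["mode"] = "slow"
seen_bytes = bytes(view)
seen_mode = settings_view["mode"]
```

The views are not snapshots: the reader sees the owner's later changes, so
read-only access still needs some agreement about when the owner may write.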

Fourth, allow read-write access of items that were explicitly declared to be
shared.  One object space would be the owner of any particular object; if
another object space wanted to access and modify that object, it would need
to acquire the GIL for the owning object space to do so. Your STM work could
eventually make acquiring the GIL for the owning object space
unnecessary, but in the nearer term, I think that the semantics above would
work.

For example, assume object spaces A, B, and C, each in their own thread, each
with their own GIL. From the perspective of space A, B and C both look like
opaque extensions. When space B wants to access something in space A, it
needs to acquire GIL A. The existing GIL semantics mediate accesses to the
state of space A.
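
The A/B/C example can be modeled with one lock per space. This is a sketch
under loud assumptions: Space, declare_shared and access are hypothetical
names, and a threading.RLock stands in for each space's GIL.

```python
import threading

class Space:
    """Hypothetical stand-in for an object space with its own GIL."""
    def __init__(self, name):
        self.name = name
        self.gil = threading.RLock()  # assumption: a lock models the GIL
        self._shared = {}

    def declare_shared(self, key, obj):
        """Explicitly mark an object owned by this space as shared."""
        with self.gil:
            self._shared[key] = obj

    def access(self, key, func):
        """Run func on a shared object while holding this space's GIL."""
        with self.gil:
            return func(self._shared[key])

space_a = Space("A")
space_a.declare_shared("counter", {"value": 0})

def increment(counter):
    counter["value"] += 1

def foreign_space():
    """Plays the role of space B or C modifying an object owned by A."""
    for _ in range(1000):
        space_a.access("counter", increment)

threads = [threading.Thread(target=foreign_space) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = space_a.access("counter", lambda c: c["value"])
```

Every read-modify-write happens under A's lock, so the two foreign spaces
cannot interleave inside an increment. Objects that were never declared
shared are simply unreachable through access().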

Part of what is interesting is that the spaces are completely independent,
so you can open a socket in space A that reads and writes strings to that
socket. The socket only exists in space A, so other spaces either don't see
it (if it is not declared shared) or they have to acquire the GIL for space
A to read or write to it.

Similarly, space B can load up some modules or extensions that only exist in
space B. So perhaps space A handles I/O through the socket it owns, and then
sends requests/responses through the channels to spaces B...N for
processing. Let's say that some of these are processor intensive; it doesn't
matter. There is no shared state between the spaces/threads unless explicit
synchronization is required and asked for by the programmer. You can peg one
thread/space without affecting the others.
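
The dispatch pattern itself can be sketched with plain threads and queues.
Under today's single GIL the workers would not actually run in parallel, so
this shows only the shape of the design. worker_space and the job format
are invented for the example.

```python
import queue
import threading

def worker_space(requests, responses):
    """Stand-in for spaces B...N: CPU-bound work, no shared state."""
    while True:
        job = requests.get()
        if job is None:          # sentinel: no more work for this space
            break
        job_id, n = job
        responses.put((job_id, sum(i * i for i in range(n))))

# Space A owns the I/O side; here it just owns the channels.
responses = queue.Queue()
request_queues = [queue.Queue() for _ in range(3)]
workers = [
    threading.Thread(target=worker_space, args=(q, responses))
    for q in request_queues
]
for w in workers:
    w.start()

jobs = [(0, 10), (1, 100), (2, 1000)]
for job, q in zip(jobs, request_queues):
    q.put(job)
    q.put(None)

# Replies may arrive in any order, so key them by job id.
results = dict(responses.get() for _ in jobs)
for w in workers:
    w.join()
```

Each worker touches only its own request queue and the response queue,
which is the "no shared state unless asked for" property described above.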

Thanks,

Van