I have a couple of questions about the sandboxing feature:

- Currently this is a two-process model, but early on the assertion was made that this could be done in a single process, perhaps but not necessarily separated by two OS-level threads. Is this (still?) true? What would you need to invoke to create such a pypy?

- How granular can the control on imported/run functions be? Can you have a full interpreter that does everything, or an interpreter that allows socket access and that is it?

Thanks,
Van
On Fri, Jul 15, 2011 at 7:09 PM, VanL <van.lindberg@gmail.com> wrote:
I have a couple questions about the sandboxing feature:
- Currently this is a two-process model, but early on the assertion was made that this could be done in a single process, perhaps but not necessarily separated by two OS-level threads. Is this (still?) true? What would you need to invoke to create such a pypy?
By design, a single-process approach is slightly less secure: if you, say, find a way to corrupt random memory, you can also modify the other interpreter's state, though the difference is only very slight. The sandboxing approach itself should work quite nicely; the hard part would be getting multiple interpreters running in a single process. That's quite a bit of work, but I would not expect it to be overly hard to do. It does require quite a bit of PyPy knowledge, though.
- How granular can the control on imported/run functions be? Can you have a full interpreter that does everything, or an interpreter that allows socket access and that is it?
It's very granular. Besides memory and CPU limits, you also control every single call that would normally be a C call, such as read, write, or stat, and you can implement arbitrary custom behavior for those functions.
Thanks,
Van
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev
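The per-call control described above can be sketched in plain Python. This is not the real PyPy sandbox API: the `do_*` names, the dispatch helper, and the policy class below are invented for illustration of how an outer process could whitelist socket-style calls while denying everything else.

```python
# Hedged sketch: a toy dispatcher illustrating per-call control in a
# two-process sandbox. The real PyPy sandbox marshals low-level external
# calls over the child's stdin/stdout; the names and the call format here
# are invented for illustration only.

class SocketOnlyPolicy:
    """Allow socket-style calls, deny anything touching the filesystem."""

    def do_connect(self, address):
        # Stand-in for performing a real connect on the child's behalf.
        return "connected:%s" % (address,)

    def do_open(self, path, flags):
        raise OSError("filesystem access not allowed")

def handle_call(policy, name, args):
    """Dispatch one intercepted external call to the policy object.

    Errors are returned as values so the controller can serialize them
    back to the sandboxed child instead of crashing itself."""
    handler = getattr(policy, name, None)
    if handler is None:
        return OSError("call not whitelisted: " + name)
    try:
        return handler(*args)
    except OSError as e:
        return e

policy = SocketOnlyPolicy()
print(handle_call(policy, "do_connect", (("example.com", 80),)))
print(isinstance(handle_call(policy, "do_open", ("/etc/passwd", 0)), OSError))  # True
```

In this model "an interpreter that allows socket access and that is it" is just a policy object that implements the socket handlers and raises (or omits) everything else.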
On 7/15/2011 1:50 PM, Maciej Fijalkowski wrote:
By design, a single-process approach is slightly less secure: if you, say, find a way to corrupt random memory, you can also modify the other interpreter's state, though the difference is only very slight. The sandboxing approach itself should work quite nicely; the hard part would be getting multiple interpreters running in a single process. That's quite a bit of work, but I would not expect it to be overly hard to do. It does require quite a bit of PyPy knowledge, though.
Could you describe a little bit more about "quite a bit of work, but... [not] overly hard to do"? What would it take, and where would someone get started?

Thanks,
Van
On Fri, Jul 15, 2011 at 9:05 PM, VanL <van.lindberg@gmail.com> wrote:
On 7/15/2011 1:50 PM, Maciej Fijalkowski wrote:
By design, a single-process approach is slightly less secure: if you, say, find a way to corrupt random memory, you can also modify the other interpreter's state, though the difference is only very slight. The sandboxing approach itself should work quite nicely; the hard part would be getting multiple interpreters running in a single process. That's quite a bit of work, but I would not expect it to be overly hard to do. It does require quite a bit of PyPy knowledge, though.
Could you describe a little bit more about "quite a bit of work, but... [not] overly hard to do"? What would it take, and where would someone get started?
Heh, I was kind of hoping to avoid having to answer that :-)

You essentially need two things in order to achieve it:

* have two interpreters in one executable (provided sandboxes don't have to be separated from each other), one constructed with sandboxing options and the other without. This is something I would describe as "run around and make it work", probably starting with having either two copies of the functions or just two copies of the object spaces.

* change the sandboxing transformation to call some RPython-level API instead of reading/writing standard output, and provide the other end of this API. As of now the transformation walks all graphs and changes external calls into special calls that get rendered as standard-output writes and standard-input reads.

I know this is kind of hand-waving about what has to be done. I would probably start with having two interpreters in one executable, probably by having two object spaces.

Cheers,
fijal
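The second bullet can be illustrated with a toy, non-RPython sketch in which all names are invented: instead of the sandboxed interpreter writing intercepted calls to standard output and reading replies from standard input, it calls an in-process API whose other end lives in the trusted interpreter.

```python
# Hedged sketch of routing intercepted external calls through an
# in-process API rather than pipes. This is ordinary Python, not RPython,
# and every name here is invented for illustration.

class TrustedEnd:
    """The trusted interpreter's side of the in-process sandbox API."""
    def external_call(self, name, args):
        if name == "write" and args[0] in (1, 2):   # allow stdout/stderr only
            return len(args[1])
        raise OSError("denied: " + name)

class SandboxedInterp:
    """Stand-in for the interpreter built with sandboxing options:
    every would-be C call goes to the trusted end instead of libc."""
    def __init__(self, trusted):
        self.trusted = trusted
    def os_write(self, fd, data):
        return self.trusted.external_call("write", (fd, data))
    def os_open(self, path):
        return self.trusted.external_call("open", (path,))

# Both "interpreters" coexist in one process; no stdin/stdout protocol.
sandboxed = SandboxedInterp(TrustedEnd())
print(sandboxed.os_write(1, b"hello"))   # 5
try:
    sandboxed.os_open("/etc/passwd")
except OSError as e:
    print(e)                             # denied: open
```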
On Fri, Jul 15, 2011 at 9:31 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
On Fri, Jul 15, 2011 at 9:05 PM, VanL <van.lindberg@gmail.com> wrote:
On 7/15/2011 1:50 PM, Maciej Fijalkowski wrote:
By design, a single-process approach is slightly less secure: if you, say, find a way to corrupt random memory, you can also modify the other interpreter's state, though the difference is only very slight. The sandboxing approach itself should work quite nicely; the hard part would be getting multiple interpreters running in a single process. That's quite a bit of work, but I would not expect it to be overly hard to do. It does require quite a bit of PyPy knowledge, though.
Could you describe a little bit more about "quite a bit of work, but... [not] overly hard to do"? What would it take, and where would someone get started?
Heh, I was kind of hoping to avoid having to answer that :-)
You essentially need two things in order to achieve it:
* have two interpreters in one executable (provided sandboxes don't have to be separated from each other), one constructed with sandboxing options and the other without. This is something I would describe as "run around and make it work", probably starting with having either two copies of the functions or just two copies of the object spaces.
* change the sandboxing transformation to call some RPython-level API instead of reading/writing standard output, and provide the other end of this API. As of now the transformation walks all graphs and changes external calls into special calls that get rendered as standard-output writes and standard-input reads.
I know this is kind of hand-waving about what has to be done. I would probably start with having two interpreters in one executable, probably by having two object spaces.
Cheers, fijal
And if I may ask, what are you trying to achieve?

Cheers,
fijal
On 7/15/2011 2:31 PM, Maciej Fijalkowski wrote:
I know, this is kind of hand-waving what has to be done, I would probably start with having two interpreters in one executable, probably by having two object spaces.
Cheers, fijal
And if I may ask, what are you trying to achieve?
Two (or more) interpreters in one executable. :)

I was pondering Armin's recent announcement that he thinks STM is the way to kill the GIL. I don't think the problem is the GIL; I think the problem is that we have only one. I think that a better (read: nearer-term, and more likely to be performant) answer is to create multiple interpreters, *each with their own GIL, each in their own thread,* and connect them via channels (essentially a pair of queues).

I already knew about multiple object spaces and PyPy's sandboxing; I thought this would be the easiest way to play with that idea.

Note that this is not Erlang-style processes; it is closer to appdomains (from .NET), although the communication is inspired by Erlang and Go.
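The "channels (essentially a pair of queues)" idea can be sketched with ordinary Python threads standing in for separate interpreters. Real per-interpreter GILs would need VM support; this only illustrates the communication pattern, and the names are invented.

```python
# Hedged sketch: two threads stand in for two object spaces, connected
# by a channel made of one queue per direction.

import queue
import threading

class Channel:
    """Bidirectional channel between two "spaces": one queue each way."""
    def __init__(self):
        self.a_to_b = queue.Queue()
        self.b_to_a = queue.Queue()

def space_b(chan):
    # "Space B" processes requests sent by "space A".
    while True:
        item = chan.a_to_b.get()
        if item is None:                  # shutdown sentinel
            break
        chan.b_to_a.put(item.upper())     # immutable str in, new str out

chan = Channel()
t = threading.Thread(target=space_b, args=(chan,))
t.start()
chan.a_to_b.put("request")
print(chan.b_to_a.get())   # REQUEST
chan.a_to_b.put(None)
t.join()
```

Because only immutable strings cross the channel, neither side ever mutates an object the other can see, which is the property the proposal relies on.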
Hi,

On Sat, Jul 16, 2011 at 12:32 AM, VanL <van.lindberg@gmail.com> wrote:
I think that a better (read: closer term, and more likely to be performant) answer is to create multiple interpreters, *each with their own GIL, each in their own thread,* and connect them via channels (essentially a pair of queues).
That's hand-waving away the real question: what can you pass over channels?

If the interpreters are supposed to be completely separated, then you can only pass strings, and the result looks exactly like separate processes. You can extend it to pass tuples and other simple data structures, but that's the same as extending the cross-process communication protocol.

If on the other hand you can pass arbitrary objects, then you have the issue that the objects are not really owned by one interpreter or the other; I don't really think it can be made to work in the current model of the object space reference. Even if we managed it, we'd end up again with the issue of concurrent changes to shared objects, which is the core problem to solve in any case, whether in your approach, with STM, or with fine-grained locking.

A bientôt,

Armin.
On Jul 16, 2011 5:13 AM, "Armin Rigo" <arigo@tunes.org> wrote:
Hi,
On Sat, Jul 16, 2011 at 12:32 AM, VanL <van.lindberg@gmail.com> wrote:
I think that a better (read: nearer-term, and more likely to be performant) answer is to create multiple interpreters, *each with their own GIL, each in their own thread,* and connect them via channels (essentially a pair of queues).
That's hand-waving away the real question: what can you pass over channels?

If the interpreters are supposed to be completely separated, then you can only pass strings, and the result looks exactly like separate processes. You can extend it to pass tuples and other simple data structures, but that's the same as extending the cross-process communication protocol.

If on the other hand you can pass arbitrary objects, then you have the issue that the objects are not really owned by one interpreter or the other; I don't really think it can be made to work in the current model of the object space reference. Even if we managed it, we'd end up again with the issue of concurrent changes to shared objects, which is the core problem to solve in any case, whether in your approach, with STM, or with fine-grained locking.
My intention was to proceed in four steps:

First, allow the passing of any immutable type. This is about the same as multiprocessing, but you could do it without incurring the serialization/deserialization overhead.

Second, allow the passing of mutable types with copy-on-write semantics. Note that this would all be asynchronous, through queues.

Third, allow memory views or classes in a sending object space/thread to expose read-only access to another object space/thread. The shared objects would need to be explicitly declared, probably using something similar to the POSH semantics.

Fourth, allow read-write access to items that were explicitly declared to be shared. One object space would be the owner of any particular object; if another object space wanted to access and modify that object, it would need to acquire the GIL for the owning object space to do so. Your STM work could eventually make acquiring the owning space's GIL unnecessary, but in the nearer term, I think the semantics above would work.

For example, assume object spaces A, B, and C, each in their own thread, each with their own GIL. From the perspective of space A, B and C both look like opaque extensions. When space B wants to access something in space A, it needs to acquire GIL A. The existing GIL semantics mediate access to the state of space A.

Part of what is interesting is that the spaces are completely independent, so you can open a socket in space A that reads and writes strings. The socket only exists in space A, so other spaces either don't see it (if it is not declared shared) or have to acquire the GIL for space A to read or write to it. Similarly, space B can load up modules or extensions that only exist in space B. So perhaps space A handles I/O through the socket it owns, and then sends requests/responses through the channels to spaces B...N for processing. Let's say some of these are processor-intensive; it doesn't matter.
There is no shared state between the spaces/threads unless explicit synchronization is required and asked for by the programmer. You can peg one thread/space without affecting the others.

Thanks,
Van
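The ownership rule in the fourth step can be sketched with an ordinary lock standing in for each space's GIL. All names below are invented for illustration; the point is only that explicitly declared shared objects are mutated under the owning space's lock.

```python
# Hedged sketch: each "object space" owns a lock (its per-space "GIL"),
# and any other space must acquire the owner's lock to read or mutate an
# object that was explicitly declared shared.

import threading

class ObjectSpace:
    def __init__(self, name):
        self.name = name
        self.gil = threading.Lock()       # this space's own "GIL"
        self.shared = {}                  # explicitly declared shared objects

    def declare_shared(self, key, obj):
        with self.gil:
            self.shared[key] = obj

    def access(self, key, mutate=None):
        # Another space touching this object must hold GIL A, so all
        # access to space A's state is serialized exactly as with a GIL.
        with self.gil:
            obj = self.shared[key]
            if mutate is not None:
                mutate(obj)
            return obj

space_a = ObjectSpace("A")
space_a.declare_shared("counter", {"value": 0})

# "Space B" mutating A's object: implicitly serialized on GIL A.
def bump(obj):
    obj["value"] += 1

space_a.access("counter", mutate=bump)
print(space_a.access("counter")["value"])   # 1
```

Objects never declared shared stay entirely invisible to other spaces, which matches the "no shared state unless asked for" property above.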
Hi VanL,

On Sat, Jul 16, 2011 at 10:37 PM, VanL <van.lindberg@gmail.com> wrote:
(...) There is no shared state between the spaces/threads unless explicit synchronization is required and asked for by the programmer. You can peg one thread/space without affecting the others.
Sure, feel free to try this out. It requires careful language-level design, like a new API with which new Python programs must be written. That's why I personally prefer my approach, because it is implementation-only, without needing any changes to existing programs.

A bientôt,

Armin.
participants (3)
- Armin Rigo
- Maciej Fijalkowski
- VanL