
I want to have a function along these lines:

    safeEval(code, ns={'name': obj}, operations=['import', 'def', 'for', 'while'])

It'll execute 'code', with access only to the names in the 'ns' dict and only to the operations listed in the 'operations' list. This function would be called from a regular CPython interpreter, and safeEval would be an interface to PyPy.

My end goal is to have potentially hundreds of *totally* untrusted scripts running concurrently; they shouldn't be able to stomp on each other or on the "main" application code. CPU and memory restriction are a big issue, but I'm only concerned with namespace and operation restriction at the moment. The core of any restricted system needs to be able to start code in an absolutely empty environment, not allowing anything at all. Then I can add operations and functions necessary for my application after I do the Hard Work of auditing each one ;)

This is the same reason I was recently working (well, somewhat.. ;) on PLT-Spy[1], an implementation of Python on PLT-Scheme. It compiled the Python to Scheme code which has calls to a Python runtime implemented for PLT. The Python runtime does 99% of its work by calling out to libpython, i.e., CPython. So, this let me have my cake and eat it too: I could take advantage of PLT's awesome restricted-execution support, *and* I could use CPython without reimplementing the whole runtime. But it wasn't _that_ simple: we had to modify some fundamental parts of CPython to interoperate with PLT, the garbage collection and namespace functions in particular. I'm hoping I can entirely avoid C code by using PyPy and still be able to use the stock Python runtime.

PyPy is confusing to me so far, though. I've worked with several implementations of small languages, including a tiny lisp meant for restricted execution, and even an implementation of Python (PLT-Spy). I can't seem to get my brain integrated with the PyPy codebase yet, and I'm not sure where to start.
So, given that explanation of my goal, where should I start? What might I have to change in PyPy? How can I get an analog of pypy.interpreter.main.run_string that lets me restrict what's available to the code? Thanks for any help :-) 1: http://plt-spy.sourceforge.net/ -- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://radix.twistedmatrix.com
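To make the requested interface concrete, here is a minimal sketch in plain CPython 3 of the "start from an empty environment" idea. The name safe_eval and its behaviour are assumptions for illustration; it is not PyPy's API, it leaves out the 'operations' argument, and it is not a security boundary (object introspection can still reach other objects even with the builtins hidden).

```python
def safe_eval(code, ns=None):
    """Run `code` with only the names in `ns` visible (illustration only)."""
    # Assumption: replacing __builtins__ with an empty dict hides open(),
    # __import__() and friends from name lookup. It does NOT stop tricks
    # that walk the object graph, so this is not a real sandbox.
    env = {"__builtins__": {}}
    env.update(ns or {})
    exec(compile(code, "<untrusted>", "exec"), env)
    env.pop("__builtins__", None)
    return env


result = safe_eval("y = double(x)", ns={"x": 21, "double": lambda v: v * 2})
print(result["y"])  # prints 42

try:
    safe_eval("open('/etc/shadow')")
except NameError as exc:
    print("blocked:", exc)
```

The second call fails only because the name 'open' cannot be resolved, which is the "nothing is allowed until it is added" starting point described above.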

Hi Christopher, [Christopher Armstrong Wed, Aug 18, 2004 at 06:37:58PM -0400]
Note that your 'operations' seem to be syntactical constructs and not operations in a bytecode or objectspace sense. If you really want to forbid certain syntactical constructs you probably need to look at the abstract syntax tree of the code string. Are you sure you want to restrict something on the syntactical level? Or do you rather need to e.g. restrict what can be imported? The latter might be done by providing your own bytecode and/or builtin-implementations within PyPy. And when you expose the above 'obj' what do you expect with respect to attribute-accesses? You can go a long way just having access to a plain python object.
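As a rough illustration of checking syntactic constructs on the abstract syntax tree, the sketch below uses the `ast` module from today's CPython standard library rather than anything in PyPy. The function name forbidden_constructs and the mapping from the proposed operation names to node classes are assumptions made up for this example.

```python
import ast

# Assumed mapping from the 'operations' names in the proposal to ast node classes.
OPERATION_NODES = {
    "import": (ast.Import, ast.ImportFrom),
    "def": (ast.FunctionDef,),
    "for": (ast.For,),
    "while": (ast.While,),
}


def forbidden_constructs(code, operations):
    """Return (operation, line number) pairs for constructs not listed in `operations`."""
    allowed = set(operations)
    found = []
    for node in ast.walk(ast.parse(code)):
        for name, classes in OPERATION_NODES.items():
            if name not in allowed and isinstance(node, classes):
                found.append((name, node.lineno))
    # Sort by line number so the report follows the source order.
    return sorted(found, key=lambda item: (item[1], item[0]))


source = "import os\nfor i in range(3):\n    pass\n"
print(forbidden_constructs(source, operations=["for"]))  # prints [('import', 1)]
```

A check like this only covers syntax. It says nothing about what the permitted constructs can reach at run time, which is the attribute-access question raised above.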
This would be an interesting one-week sprint topic at some point :-) It seems the basic approaches are controlling name visibility and/or controlling object accesses. Both require careful thinking about all the objects reachable through attribute accesses. Maybe others can chime in here so that we can think together about what the best approach for a restricted Python within PyPy might be. So far it hasn't been anybody's "special interest".
PyPy should generally allow you to avoid C. But I am not sure whether you have a misconception here. While our intermediate goal is to generate code runnable on top of the CPython runtime, PyPy aims to be a self-contained implementation. Do you need the stock CPython runtime for anything in particular or would you be happy with a self-contained Python implementation allowing restricted execution?
Possible starting points include the builtin module (modules/__builtin__*.py), the parser (see my first para, though), or the bytecode implementations (interpreter/pyopcode.py) or ... coming to a sprint which might take place between 9th and 19th of October in Dublin :-) cheers, holger

Thanks for the reply, Holger! On Thu, 19 Aug 2004 09:46:32 +0200, holger krekel <hpk@trillke.net> wrote:
Oops. In PLT-Spy, even those syntactical structures got compiled into simple scheme function calls, so I'm used to thinking of them in that way ;) I doubt it's *absolutely* necessary for me to totally restrict those syntactic operations, but I think it would be cool to be able to do that (it fits with the "absolutely inert environment as a starting-point" principle).
Well, yes, I'm going to have to painstakingly audit every single function, object, or whatever I make accessible to the code to make sure it can't give access to any *other* objects I don't want the user to have. But are you saying there are inherent insecurities in the python object system that give access to things I won't want the user to access? Hopefully PyPy will be flexible enough for me to create simpler versions of the objects, or maybe I can write something that wraps objects and only has accessors for certain attributes (e.g., __call__) to be accessed by the restricted code. Hooray capabilities :-)
I'm self-centered ;) I think it would be a more expedient route to my own goals to do it the way I suggested, as it looks like there's a long way to go to have PyPy be a total CPython replacement. If, otoh, you guys make it usable in that regard by the time I want to actually apply this, I'd be absolutely thrilled to use PyPy for all of my code :-)
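The wrapper idea could look roughly like the sketch below. The names make_capability and Sword are hypothetical and exist only for this illustration. In stock CPython the wrapper does not actually hold: the bound method it hands out still carries a reference to the wrapped object in `__self__`, which is the kind of object-system leak being asked about.

```python
def make_capability(obj, allowed):
    """Return a proxy that exposes only the attribute names in `allowed`."""
    allowed = frozenset(allowed)

    class Capability:
        __slots__ = ()  # the proxy itself stores nothing

        def __getattr__(self, name):
            # Only called when normal lookup fails, i.e. for every wrapped attribute.
            if name in allowed:
                return getattr(obj, name)
            raise AttributeError(f"{name!r} is not exposed")

    return Capability()


class Sword:  # hypothetical game object
    damage = 10
    owner_password = "secret"

    def swing(self):
        return self.damage


cap = make_capability(Sword(), ["swing"])
print(cap.swing())  # prints 10

try:
    cap.owner_password
except AttributeError as exc:
    print("denied:", exc)

# The leak: the bound method still points at the real object.
print(cap.swing.__self__.owner_password)  # prints secret
```

Closing that last hole means wrapping return values as well, or having the interpreter enforce the restriction, which is where a modified object space would come in.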
Thanks! I'll look into these when I'm off work.
or ... coming to a sprint which might take place between 9th and 19th of October in Dublin :-)
:-( I'm in the middle of a move and a new full-time job (that has no interest in me researching this subject ;) so I doubt I'll have the time to go to any sprints within the next year, but who knows! Maybe I can organize one in Australia ;-) -- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://radix.twistedmatrix.com

Hi Christopher, [Christopher Armstrong Thu, Aug 19, 2004 at 04:15:14AM -0400]
I think the challenge is to find an easy scheme for doing restricted execution in Python. So far it seems that there is a lot of complexity involved. The specific goal of disallowing File/IO operations and restricting memory/cpu usage seems interesting enough to look for a specific solution and not for some general scheme of dis/allowing object accesses. One basic idea is to be able to allow/disallow the usage of code objects according to the module in which they live. We then wouldn't disallow "import socket" or any other import which leads to importing socket but we would say "when i execute this function f() here i don't want to execute code that comes from the module 'socket'". (Of course you can also have a positive list of allowed modules etc.pp.). So we would have to keep exact references to where code originates/was built (mostly 'co_filename') and we would need the interpreter to check for these references and some policies at certain points. This seems easy enough to do in PyPy.
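A rough approximation of this origin-based policy can be tried in stock CPython with `sys.settrace`, as sketched below. This is not how PyPy would do it: the function name run_with_origin_policy and the file names are invented for the example, the check only sees Python-level frames (C-implemented functions have no code object), and untrusted code could simply remove the trace function again.

```python
import sys


def run_with_origin_policy(func, denied_origins):
    """Call func(), refusing to enter any frame whose code has a denied co_filename."""
    denied = frozenset(denied_origins)

    def tracer(frame, event, arg):
        if event == "call" and frame.f_code.co_filename in denied:
            raise PermissionError(
                f"code from {frame.f_code.co_filename!r} is not allowed here"
            )
        return None  # no per-line tracing needed

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func()
    finally:
        sys.settrace(previous)


# Build two functions whose code objects record different origins.
ns = {}
exec(compile("def connect():\n    return 'connected'\n", "socket_like.py", "exec"), ns)
exec(compile("def greet():\n    return 'hello'\n", "game_api.py", "exec"), ns)

print(run_with_origin_policy(ns["greet"], {"socket_like.py"}))  # prints hello

try:
    # The indirect call is caught as well, since the check runs on every frame entry.
    run_with_origin_policy(lambda: ns["connect"](), {"socket_like.py"})
except PermissionError as exc:
    print("refused:", exc)
```

Note that nothing here blocks the import itself. Only the execution of code from the denied origin is refused, which matches the policy described above.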
We'll see. In some ways PyPy is not that far away from being self-contained. FYI, we really have only been hacking for ca. 7 sprint weeks so far plus some extra weeks spent by a few developers.
Uh oh, i see. Well then good luck at your new job, anyway. It's not even related to Twisted? cheers, Holger

Yo Homey, On Thu, 19 Aug 2004 10:49:09 +0200, holger krekel <hpk@trillke.net> wrote:
Yes, doing it in an easy way (without the painstaking auditing) is indeed a challenge, and not really one I'm interested in. Of course, a lot of the auditing could be shared; i.e., I could come up with a whitelist that's generally useful and secure in pretty much any context and so other people wouldn't have to re-audit these functions. However, the only truly "easy" way is blacklisting (at least this is how it seems most people think when they think of restricted execution), but I find blacklisting totally inadequate.
*shrug*. Really, what applications is that useful for? Pretty much all blacklisting schemes I've seen (e.g. disallowing File/IO operations) have either been boring or still allow access to things I don't want users to access.
Hmm, strange. Well, I still don't think it's relevant to my case; I doubt I will allow access to *any* stdlib modules in my particular use-case (which I will explain further down) for restricted-execution. I will only put very application-specific things in my whitelist.
I hope that whatever hacking I do for restricted execution would also improve PyPy as a general purpose CPython replacement.
Oh, it involves Python, Twisted, and lots of other things I'm pretty involved with, but unfortunately, not games or restricted execution ;-( I plan on putting almost all of my free time into those two goals over the coming years, though.

I guess I might as well explain what I really want to do with this restricted execution: as many do, I want to deploy it in the context of a multiplayer distributed virtual world where untrusted users (i.e., free accounts) can add custom behavior to their avatar or the objects they create. SecondLife[1] has done this in an amazing way (in that it's the most technologically advanced I've seen today), except the language they use is crap, it's not open source, and there are still many flaws[2]. Other older text-based systems (LambdaMOO, IIRC?) offer scripting to untrusted users as well.

In a slightly different application, the users themselves wouldn't be able to script their objects, but server administrators (in a distributed VW) will. If you want a more focused *game* virtual world, it's pretty impractical to let users script their own stuff (well, maybe in more limited degrees...). However, it could still be very useful for distributed worlds (in a network and social sense). For example, you have server A and server B, run by Bob and Jane... They want a portal between their servers, and they want to allow users to bring objects developed by Bob and Jane to each other's server. But Bob doesn't want users on Jane's machine bringing an Ultimate Sword of Immense Kitten Slaying onto his server (or an Ultimate Bomb of L33t /etc/shadow reading, for that matter). So he'll run her code in his restricted execution environment so they can't read his /etc/shadow or do more than 5000 points of damage.

1: http://secondlife.com/
2: mostly server-crashing bugs, but I'm sure there are some that can be used more maliciously.
-- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://radix.twistedmatrix.com

[Christopher Armstrong Thu, Aug 19, 2004 at 05:22:22AM -0400]
Maybe you could put those application specific things into a module and whitelist it using the above scheme.
Ah, now it gets interesting! :-)
I begin to see. Wouldn't you need quite a lot of access to the gaming application code if you want to give users freedom to script their objects? Btw, have you looked at zope3's security model? Kapil Thangavelu has written a nice readme file (with an example of distributed agents moving between sandboxes where they have different access capabilities): http://cvs.zope.org/Zope3/src/zope/security/readme.txt?rev=1.5.12.3 I have to admit that if i would go for the "scripting within games" goal i probably wouldn't go for either of the above. I'd probably think more about kernel-level sandboxes and using unix permissions or posix-acls for file accesses (if any) and a pipe to a VW server to communicate actions and events (with its own security/sanity checks). Hum, maybe there would be a nice application for a (user-mode) linux-kernel and a lean PyPy on top of it with no c-libraries involved whatsoever as this could reduce memory/resource usage to a degree where granting "free accounts" is cheap enough. cheers, holger

'Sup dude, On Thu, 19 Aug 2004 11:57:33 +0200, holger krekel <hpk@trillke.net> wrote:
Yeah, there would be a sizeable API. But the access would be very carefully moderated and hand-picked. This is really the hard project; I think implementing the r-exec system at the core is much easier than doing the actual API design and auditing.
Hmm, no I haven't. *just skimmed* Did he actually write a sandbox that allows untrusted python code to run securely? I was under the impression that running Python code securely with CPython's runtime is an absolutely futile idea. I'll try to grab Kapil on IRC.
That would be cool, but it seems pretty hard (well, maybe in a few more years of computing power advancement ;), and I think it's acceptable to only use a limited pool of UMLs that run multiple users' code. Here's why. An important distinction to make is that between host and simulation security; r-exec is still absolutely necessary even if you have a UML for each script. r-exec *can* protect both the host and the simulation. Of course, it's probably a good idea to keep a UML or whatever as an extra layer of security, for the same reason that you run various system daemons under chroots. For simulation-security, though, a UML won't help; you still need to carefully restrict every capability you give to the user (so that he can't use a python object system trick to gain a reference to someone three rooms over and hit them with his sword). -- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://radix.twistedmatrix.com

[Christopher Armstrong Thu, Aug 19, 2004 at 06:30:59AM -0400]
indeed.
As PyPy will be faster than C i don't see a speed problem :-) Seriously, though, if the code is to interact with a gaming api and not drive e.g. some graphics hardware i don't see a big computing power problem with PyPy on top of UserModeLinux even if PyPy would be five times slower than CPython.
Well, a UML configuration (reused for every script with Copy-On-Write or read only FS devices) should provide enough host security, possibly running on some secure linux capabilities based flavour. And, indeed, simulation security has to be taken care of separately. Wouldn't it make sense to define a "command protocol" with integrated simulation security restrictions and provide a client side python library for speaking this protocol? This way the user is free to program whatever she pleases but is restricted through host security (including CPU/RAM/FS restrictions) and can only produce commands which pass simulation security at the server side. That being said i wouldn't mind if we can come up with some sensible easy enough r-exec mechanism within PyPy that would suit your Massively Scripted Multiplayer (tm) scenario. Any more ideas anyone? cheers, Holger
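One possible shape for such a command protocol is sketched below, with JSON as the wire format. Everything in it is an assumption made for illustration: the single "hit" action, the field names, the reachability table, and the helper names hit and validate_command. Only the 5000-point damage cap comes from the example earlier in the thread.

```python
import json

MAX_DAMAGE = 5000  # cap taken from the sword example earlier in the thread


def hit(target, damage):
    """Client-side library helper: serialise a command for the pipe to the server."""
    return json.dumps({"action": "hit", "target": target, "damage": damage})


def validate_command(raw, actor, reachable):
    """Server-side simulation security: return the parsed command or raise ValueError."""
    cmd = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(cmd, dict) or cmd.get("action") != "hit":
        raise ValueError("unknown command")
    if cmd.get("target") not in reachable.get(actor, ()):
        raise ValueError("target not reachable")
    damage = cmd.get("damage")
    if isinstance(damage, bool) or not isinstance(damage, int):
        raise ValueError("damage must be an integer")
    if not 0 <= damage <= MAX_DAMAGE:
        raise ValueError("damage out of range")
    return cmd


# Hypothetical world state: which targets each actor can currently reach.
reachable = {"alice": {"goblin"}}

print(validate_command(hit("goblin", 120), "alice", reachable)["damage"])  # prints 120

for bad in (hit("goblin", 999999), hit("bob", 10)):
    try:
        validate_command(bad, "alice", reachable)
    except ValueError as exc:
        print("rejected:", exc)
```

With this split the untrusted script never holds a reference to a server-side object. It can only emit strings, and the server decides what each one is allowed to do.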

On Thu, 19 Aug 2004 13:32:28 +0200, holger krekel <hpk@trillke.net> wrote:
Oh, I'm not worried about PyPy's performance -- the performance I was referring to was that of UML. Running hundreds of UMLs on a machine right now is totally impractical, if I'm not mistaken.
mmmh... I'll think about this more :) -- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://radix.twistedmatrix.com

[Christopher Armstrong Thu, Aug 19, 2004 at 04:30:58PM -0400]
I think you are right that running hundreds of UMLs is impractical (though i believe this is mostly because you have to assign static RAM amounts at boot time due to the linux kernel's VM architecture). Maybe running thousands of user-processes on some form of se-linux or OpenBSD with tight security settings (no opening of ports etc.pp.) would do for the host security part. anyway, "restricted execution" within PyPy surely remains something we should think about especially if we have concrete use cases. have fun, holger P.S.: let me know what you got out of Kapil and if the zope3 security architecture suits you. i am curious in case i want to write my own scriptable MMG :-)

participants (2): Christopher Armstrong, holger krekel