Hello! This is my first posting to the python-dev list, so please forgive me if I violate any unspoken etiquette here. :)

I was looking at Python 2.x's f_restricted frame flag (or, rather, the numerous ways around it) and noticed that most (all?) of the attacks used to escape restricted execution involved the attacker grabbing something he wasn't supposed to have. IMO, Python's extensive introspection features make that a losing battle: it's simply too easy to forget to blacklist something and for an attacker to find it. Not only that, even with a perfectly vacuum-sealed jail, an attacker can still bring down the interpreter by exhausting memory or consuming excess CPU.

I think I might have a way of securely sealing in untrusted code. It's a fairly nascent idea, though, and I haven't worked out all of the details yet, so I'm posting what I have so far for feedback and for others to try to poke holes in.

Absolutely nothing here is final; I'm just framing out what I generally had in mind. Obviously, it will need to be adjusted to be consistent with "the Python way" - my hope is that this can become a PEP. :)
# It all starts with the introduction of a new type, called a jail. (I haven't yet worked out whether it should be a builtin type,
# or a module.) Unjailed code can create jails, which will run the untrusted code and keep strict limits on it.
>>> j = jail()
>>> dir(j)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__',
 '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
 '__subclasshook__', 'acquire', 'getcpulimit', 'getcpuusage', 'getmemorylimit', 'getmemoryusage',
 'gettimelimit', 'gettimeusage', 'release', 'setcpulimit', 'setmemorylimit', 'settimelimit']
# The jail monitors three things: memory (in bytes), real time (in seconds), and CPU time (also in seconds),
# and it also allows you to impose limits on them. If any limit is non-zero, code in that jail may not exceed its limit.
# Exceeding a memory limit will result in a MemoryError. I haven't decided what CPU/real time limits should raise.
# The other two calls are "acquire" and "release," which allow you to seal (any) objects inside the jail, or bust them
# out. Objects inside the jail (i.e. created by code in that jail) contribute their __sizeof__() to j.getmemoryusage().
>>> def stealPasswd():
...     return open('/etc/passwd', 'r').read()
...
>>> j.acquire(stealPasswd)
>>> j.getmemoryusage()  # The stealPasswd function, its code, etc. are now locked away within the jail.
375
>>> stealPasswd()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
JailError: tried to access an object outside of the jail
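(As an aside before I explain that JailError: here's how I'd imagine a host program driving the limits. Purely hypothetical - none of these names exist today, and j.execute() is one of the open questions listed further down.)

# Hypothetical sketch of the proposed API -- nothing here exists in CPython today.
def run_untrusted(code_obj, mem_bytes=1 * 1024 * 1024, wall_seconds=2.0):
    j = jail()                      # the proposed type from the transcript above
    j.setmemorylimit(mem_bytes)     # 0 would mean "unlimited"
    j.settimelimit(wall_seconds)    # real (wall-clock) time budget
    try:
        # j.execute() is one of the open questions below: run a code object
        # inside the jail with the given globals/locals.
        return j.execute(code_obj, {}, {})
    except MemoryError:
        print("untrusted code exceeded its memory budget")
    except JailError as e:
        print("untrusted code touched something outside its jail:", e)
    # (what a CPU/real-time limit violation raises is still undecided)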
The object in question is, of course, 'open'. Unlike the f_restricted model, the jail was freely able to grab the open() function, but was absolutely unable to touch it: it can't call it, set/get/delete attributes/items on it, or pass it as an argument to any function.

There are three criteria that determine whether an object can be accessed:
a. The code accessing the object is not within a jail; or
b. The object belongs to the same jail as the code accessing the object; or
c. The object has an __access__ function, and theObject.__access__(theJail) returns True.

For the jail to be able to access 'open', it needs to be given access explicitly. I haven't quite decided how this should work, but I had in mind the creation of a "guard" (essentially a proxy) that allows the jail to access the object. It belongs to the same jail as the guarded object (and is therefore impossible to create within a jail unless the guarded object belongs to that same jail), has a list of jails (or None for 'any') that the guard will allow to __access__ it (the guard is immutable, so jails can't mess with it even though they can access it), and a specification of what the guard will allow through it (read-write, read-only, call-within-jail, call-outside-jail).

I have a couple of remaining issues that I haven't quite sussed out:

* How exactly do guards work? I had in mind a system of proxies (memory usage is a concern, especially in memory-limited jails - maybe allow __access__ to return specific modes of access rather than all-or-nothing?) that recursively return more guards after operations. (e.g., if I have a guard allowing read+call on sys, sys.stdout would return another guard allowing read+call on sys.stdout, and likewise for sys.stdout.write.) There's a rough mock-up of this after the list.

* How are objects sealed in the jail? j.acquire can lead to serious problems with lots of references getting recursively sealed in. Maybe disallow sealing in anything but code objects, or allow explicitly running code within a jail like j.execute(code, globals(), locals()), which works fine since any objects created by jailed code are also jailed.

* How do imports work? Should __import__ be modified so that when a jail invokes it, the import runs normally (unjailed), and then returns the module with a special guard that allows read-only+call-within, but not on builtins? This has a nice advantage: jailed code can import e.g. socket, and maybe even create a socket, but won't be able to do sock.connect(...), since socket.connect (which is running with jailed permissions) can't touch the builtin _socket module.

* Is obj.__access__(j) the best way to decide access? It doesn't allow programmers much freedom to customize the jail policy, since they can't modify __access__ for builtins. Maybe the jail should get the first chance (such as j.__access__(obj)), which allows programmers to subclass the jail, and the jail can fall back to obj.__access__(j).

* How does Python keep track of what jail each frame is in? Maybe each frame can have a frame.f_jail, which references the jail object restricting that frame (or None for unjailed code) - frames' jails default to the jail holding the code object, or can be explicitly overridden (as in j.execute(code, globals(), locals())).

* When are jails switched? Obviously, jailed code called from unjailed code (or even from other jailed code) should be executed in the callee's jail... But if a jailed caller is calling unjailed code, does the jail follow, or does the unjailed code run in an unjailed frame? How do programmers specify that?
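Here's the rough mock-up of the guard idea I promised above. It's purely illustrative: the real access check would have to live in the interpreter, and JailError, Guard, and the mode flags are just my proposed names, nothing that exists today.

# Purely illustrative mock-up of the access rule and a recursive guard.
class JailError(Exception):
    pass

READ, WRITE, CALL = 1, 2, 4

def can_access(obj, jail):
    """The three criteria above, checked in order (jail=None means unjailed caller)."""
    if jail is None:                                  # (a) the calling code is not jailed
        return True
    if getattr(obj, '__jail__', None) is jail:        # (b) same jail as the calling code
        return True
    access = getattr(type(obj), '__access__', None)   # (c) obj.__access__(jail) returns True
    return bool(access and access(obj, jail))

class Guard(object):
    """Immutable proxy granting a jail limited access to one object."""

    def __init__(self, obj, jails=None, mode=READ | CALL):
        object.__setattr__(self, '_obj', obj)
        object.__setattr__(self, '_jails', jails)     # None means "any jail"
        object.__setattr__(self, '_mode', mode)

    def __setattr__(self, name, value):
        raise JailError("guards are immutable")

    def __access__(self, jail):
        return self._jails is None or jail in self._jails

    def __getattr__(self, name):
        if not self._mode & READ:
            raise JailError("read access denied")
        # Attribute access hands back another guard, so a guard on sys yields
        # a guard on sys.stdout, then on sys.stdout.write, and so on.
        return Guard(getattr(self._obj, name), self._jails, self._mode)

    def __call__(self, *args, **kwargs):
        if not self._mode & CALL:
            raise JailError("call access denied")
        return self._obj(*args, **kwargs)

# e.g. hand jailed code Guard(sys, jails=[j]) instead of the real sys module.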
...that's pretty much my two (erm, twenty) cents on the matter. Again, any feedback/adversarial reasoning you guys can offer is much appreciated.
On Fri, 10 Jun 2011 18:23:47 -0600, Sam Edwards <sam.edwards@Colorado.EDU> wrote:
Hello! This is my first posting to the python-dev list, so please forgive me if I violate any unspoken etiquette here. :)
Well, hopefully we won't bite, though of course I can't promise anything for anyone else :)

I haven't read through your post, but if you don't know about it I suspect that you will be interested in the following:

http://code.activestate.com/pypm/pysandbox/

I'm pretty sure Victor will be happy to have someone else interested in this topic.

--
R. David Murray
http://www.bitdance.com
On 11/06/2011 02:41, R. David Murray wrote:
I haven't read through your post, but if you don't know about it I suspect that you will be interested in the following:
http://code.activestate.com/pypm/pysandbox/
I'm pretty sure Victor will be happy to have someone else interested in this topic.
Yes, I am happy :-)

The project URL is https://github.com/haypo/pysandbox/ - the ActiveState page is wrong: pysandbox does support Python 3 (Python 2.5 - 3.3).

pysandbox uses a different policy depending on the problem: for example, a whitelist for builtins and a blacklist for object attributes. pysandbox is based on Tav's ideas. The main idea of pysandbox is to execute untrusted code in a new, empty namespace, the untrusted namespace. Objects imported into this namespace are imported as proxies, to give a read-only view of the Python namespace. Importing modules is protected by a whitelist (of module and symbol names). To protect the namespace, some introspection attributes are hidden, like __subclasses__ or __self__. Performance is supposed to be close to a classic Python interpreter (I didn't run a benchmark, I don't really care).

An empty namespace is not enough to protect Python: pysandbox also denies executing arbitrary bytecode, writing files, writing to stdout/stderr, exiting Python, etc. Tav's sandbox is good at denying everything, whereas you can configure pysandbox to enable some features (e.g. exiting Python, useful for an interpreter).

About restricted mode: you can also configure pysandbox to use it, but the restricted mode is too restrictive: you cannot open files at all, whereas pysandbox allows reading files from a whitelist (e.g. useful for displaying a traceback).

If you would like to implement your own sandbox: great! You should try the pysandbox test suite, I'm proud of it :-)

I am still not sure that the pysandbox approach is the right one: if you find a vulnerability that escapes the pysandbox "jail" (see the pysandbox Changelog, it would not be the first time), you can do anything. The PyPy sandbox and "Seccomp nurse" (specific to Linux?) use two processes: the Python process cannot do anything on its own, it relies completely on a trusted process which controls all operations. I don't understand exactly how that is different - a vulnerability in the trusted process also gives full control - but it's maybe a safer approach. Or maybe the difference is that the implementation is simpler (less code?) and therefore safer (less code usually means fewer bugs).

"Seccomp nurse": http://chdir.org/~nico/seccomp-nurse/

I recently tested the AppEngine sandbox (testable online via http://shell.appspot.com/): it is secure *and* powerful; nearly all modules are allowed (except ctypes, as expected). AppEngine is not a Python sandbox: it's a sandbox between the Python process and the Linux kernel, so it also protects modules written in C (pysandbox is unable to protect these modules). AppEngine modifies the Python standard library to cooperate with the low-level sandbox, e.g. to raise nice error messages - with open(filename, "w") you get "invalid file mode" instead of an ugly OSError with a cryptic message.

Get more information about pysandbox and other sandboxes in the pysandbox README file.

Victor
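To give a feel for the proxy idea Victor describes, here is a toy sketch of a read-only view - this is not pysandbox's actual code, and a real sandbox has to cover far more than this:

# Toy illustration only: the untrusted namespace never holds the real object,
# just a wrapper that forwards reads and refuses writes.
class ReadOnlyProxy(object):
    # Introspection attributes that would leak the real object are hidden.
    _HIDDEN = ('__subclasses__', '__self__', '__globals__', '__code__')

    def __init__(self, obj):
        object.__setattr__(self, '_obj', obj)

    def __getattr__(self, name):
        if name in ReadOnlyProxy._HIDDEN:
            raise AttributeError(name)
        return ReadOnlyProxy(getattr(self._obj, name))

    def __call__(self, *args, **kwargs):
        return ReadOnlyProxy(self._obj(*args, **kwargs))

    def __setattr__(self, name, value):
        raise AttributeError("this is a read-only view")

    def __repr__(self):
        return '<read-only view of %r>' % (self._obj,)

# The untrusted namespace would then receive e.g. ReadOnlyProxy(sys)
# rather than the real sys module.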
Hi Sam,

Have you seen this? http://tav.espians.com/paving-the-way-to-securing-the-python-interpreter.htm... It describes a similar idea. There were a few iterations of Tav's approach.

--Guido

On Fri, Jun 10, 2011 at 5:23 PM, Sam Edwards <sam.edwards@colorado.edu> wrote:
Hello! This is my first posting to the python-dev list, so please forgive me if I violate any unspoken etiquette here. :)
-- --Guido van Rossum (python.org/~guido)
At 06:23 PM 6/10/2011 -0600, Sam Edwards wrote:
I have a couple remaining issues that I haven't quite sussed out: [long list of questions deleted]
You might be able to answer some of them by looking at this project:

http://pypi.python.org/pypi/RestrictedPython

It implements the necessary ground machinery for doing that sort of thing, in the form of a specialized Python compiler (implemented in Python, for 2.3 through 2.7) that allows you to implement whatever sorts of guards and security policies you want on top of it.

Even if it doesn't answer all your questions in and of itself, it may prove a fruitful environment in which to experiment with various approaches and see which ones you actually like, without first having to write a bunch of code yourself.

Discussing an official implementation of this sort of thing as a language feature is probably best left to python-ideas, though, until and unless you actually have a PEP to propose.
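For reference, the basic flow looks roughly like this - the names are recalled from the package's documentation, so double-check them against RestrictedPython itself before relying on this sketch:

# Rough sketch of driving RestrictedPython; verify names against the package.
from RestrictedPython import compile_restricted
from RestrictedPython.Guards import safe_builtins

source = "result = 2 + 3 * 7"

# compile_restricted() runs the source through the restricted compiler, which
# rewrites operations like attribute access to go through guard functions
# (e.g. _getattr_) that you supply in the globals when the code needs them.
code = compile_restricted(source, '<untrusted>', 'exec')

env = {'__builtins__': safe_builtins}   # whitelisted builtins only
exec(code, env)
print(env['result'])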
All,

Thanks for the quick responses! I skimmed the pysandbox code yesterday. I think Victor has the right idea with relying on a whitelist, as well as limiting execution time. The fact that untrusted code can still mount memory-exhaustion attacks is the only thing that still worries me: it's hard to write a server that will run hundreds of scripts from untrusted users when any one of them can bring down the entire server with an infinite loop that allocates tons of objects. Python needs a way to hook the object-allocation process in order to (effectively) limit how much memory untrusted code can consume.

Tav's blog post makes some interesting points... The object-capability model definitely has the benefit of efficiency; simply getting a reference to an object means the untrusted code is trusted with full capability to that object (which saves having to query the jail every time the object is touched) - it's just as fast as unrestricted Python, which I like. Perhaps my jails idea should then be refactored into some mechanism for monitoring and limiting memory and CPU usage - it's the perfect thing to ship as an extension; the only shame is that it requires interpreter support.

Anyway, in light of Tav's post, which seems to suggest that f_restricted frames are impossible to escape (if used correctly), why was f_restricted removed in Python 3? Is it simply that it's too easy to make a mistake and accidentally give an attacker an unsafe object, or is there some fundamental flaw in it? Could you see something like f_restricted (or f_jail) getting put back into Python 3, if it were a good deal more bulletproof?

And, yeah, I've been playing with RestrictedPython. It's pretty good, but it lacks memory- and CPU-limiting, which is my main focus right now. And yes, I should probably have posted this to python-ideas, thanks. :) This is a very long way away from a PEP. PyPy's sandboxing feature is probably closest to what I'd like, but I'm looking for something that can coexist in the same process: running hundreds of interpreter processes continuously has a lot of system-memory overhead, so it's better if the many untrusted, but independent, jails can share a single interpreter.
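For comparison, the per-process approach that works today without any interpreter changes looks like the minimal Unix-only sketch below, using the standard resource module. It caps the whole process, which is exactly why it can't separate hundreds of jails sharing one interpreter:

import resource

def limit_this_process(max_mem_bytes=64 * 1024 * 1024, max_cpu_seconds=5):
    """Minimal sketch: OS-level caps for one whole process (Unix only).

    Run in a freshly forked worker before executing untrusted code, e.g. via
    subprocess.Popen(..., preexec_fn=limit_this_process).
    """
    # Address-space cap: further allocations fail, which Python surfaces
    # as MemoryError.
    resource.setrlimit(resource.RLIMIT_AS, (max_mem_bytes, max_mem_bytes))
    # CPU-time cap: exceeding the soft limit delivers SIGXCPU to the process.
    resource.setrlimit(resource.RLIMIT_CPU, (max_cpu_seconds, max_cpu_seconds))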
participants (5)
- Guido van Rossum
- P.J. Eby
- R. David Murray
- Sam Edwards
- Victor Stinner