Hi all,

Those that follow IRC already know it, but it's worth being announced a bit more widely: I've been working on a form of sandboxing for RPython programs, which now seems to work for the whole of PyPy.

It's "sandboxing" as in "full virtualization", but done in normal C with no OS support at all. It's a two-processes model: we can translate PyPy to a special "pypy-c-sandbox" executable, which is safe in the sense that it doesn't do any library or system call - instead, whenever it would like to perform such an operation, it marshals the operation name and the arguments to its stdout and it waits for the marshalled result on its stdin. This pypy-c-sandbox process is meant to be run by an outer "controller" program that answers to these operation requests.

The pypy-c-sandbox program is obtained by adding a transformation during translation, which turns all RPython-level external function calls into stubs that do the marshalling/waiting/unmarshalling. An attacker that tries to escape the sandbox is stuck within a C program that contains no external function call at all except to write to stdout and read from stdin. (It's still attackable, e.g. by exploiting segfault-like situations, but as far as I can tell - unlike CPython - any RPython program is really robust against this kind of attack, at least if we enable the extra checks that all RPython list and string indexing are in range. Alternatively, on Linux there is a lightweight OS-level sandboxing technique available by default - google for 'seccomp'.)

The outer controller is a plain Python program that can run in CPython or a regular PyPy. It can perform any virtualization it likes, by giving the subprocess any custom view on its world. For example, while the subprocess thinks it's using file handles, in reality the numbers are created by the controller process and so they need not be (and probably should not be) real OS-level file handles at all.
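As a rough illustration of the request/response exchange described above, here is a minimal sketch in Python. It is not the actual stub or controller code: the exact wire format is not given in this message, so the sketch simply assumes that the operation name, the argument tuple and the result are each written with Python's marshal module. The operation name "os_getpid", the fake result 4242 and all function names are hypothetical, and in-memory buffers stand in for the subprocess's stdout and stdin.

```python
import io
import marshal


def write_request(stream, opname, args):
    # Sandboxed side: marshal the operation name, then its arguments.
    # (Assumed framing; the real format may differ.)
    marshal.dump(opname, stream)
    marshal.dump(tuple(args), stream)


def answer_request(request_stream, reply_stream, handlers):
    # Controller side: read one request, run the matching handler,
    # and marshal the result back. Unknown names raise KeyError.
    opname = marshal.load(request_stream)
    args = marshal.load(request_stream)
    result = handlers[opname](*args)
    marshal.dump(result, reply_stream)


def read_result(stream):
    # Sandboxed side: wait for (here: read) the marshalled result.
    return marshal.load(stream)


# In-memory stand-ins for the two pipes between the processes.
to_controller = io.BytesIO()   # plays the role of the subprocess's stdout
to_sandbox = io.BytesIO()      # plays the role of the subprocess's stdin

# The controller is free to return any value it likes.
handlers = {"os_getpid": lambda: 4242}   # hypothetical operation

write_request(to_controller, "os_getpid", ())
to_controller.seek(0)
answer_request(to_controller, to_sandbox, handlers)
to_sandbox.seek(0)
fake_pid = read_result(to_sandbox)
```

The point of the sketch is only that the sandboxed side never performs the operation itself: whatever value comes back is entirely the controller's decision.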
In the demo controller I've implemented there is simply a mapping from numbers to file-like objects. The controller answers to the "os_open" operation by translating the requested path to some file or file-like object in some virtual and completely custom directory hierarchy. The file-like object is put in the mapping with any unused number >= 3 as a key, and the latter is returned to the subprocess. The "os_read" operation works by mapping the pseudo file handle given by the subprocess back to a file-like object in the controller, and reading from the file-like object.

Enough explanations, here's a how-to:

For now, this lives in a branch at the 'pypy' level:

    cd ..../pypy-dist/pypy
    svn switch http://codespeak.net/svn/pypy/branch/pypy-more-rtti-inprogress/

In pypy/translator/goal:

    ./translate.py --sandbox --source targetpypystandalone.py --withoutmod-gc --withoutmod-_weakref

(the gc and _weakref modules are disabled because they are a bit too unsafe, in the sense that they could allow bogus memory accesses)

Then in the directory where the sources were created, compile with the extra RPython-level assertions enabled:

    make CFLAGS="-O2 -DRPY_ASSERT"
    mv testing_1 /some/path/pypy-c-sandbox

Run it with the tools in the pypy/translator/sandbox directory:

    ./pypy_interact.py /some/path/pypy-c-sandbox [args...]

Just like pypy-c, if you pass no argument you get the interactive prompt. In theory it's impossible to do anything bad or read a random file on the machine from this prompt. (There is no protection against using all the RAM or CPU yet.)

To pass a script as an argument you need to put it in a directory along with all its dependencies, and ask pypy_interact to export this directory (read-only) to the subprocess' virtual /tmp directory with the --tmp=DIR option.

Not all operations are supported; e.g. if you type os.readlink('...'), the controller crashes with an exception and the subprocess is killed.
Other operations make the subprocess die directly with a "Fatal RPython error". None of this is a security hole; it just means that if you try to run some random program, it risks getting killed depending on the Python built-in functions it tries to call.

By the way, as you should have realized, it's really independent from the fact that it's PyPy that we are translating. Any RPython program should do. I've successfully tried it on the JS interpreter. The controller is only called "pypy_interact" because it emulates a file hierarchy that makes pypy-c-sandbox happy - it contains (read-only) virtual directories like /bin/lib-python and /bin/pypy/lib and it pretends that the executable is /bin/pypy-c.

A bientot,

Armin.
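The mapping from numbers to file-like objects used by the demo controller, as described in the message above, could look roughly like the following. This is only a sketch under simplifying assumptions, not the code of pypy_interact: the class name VirtualFiles is hypothetical, the virtual directory hierarchy is reduced to a dict from paths to bytes, and the handlers ignore the flags and mode that a real "os_open" request would carry.

```python
import io


class VirtualFiles:
    """Hypothetical controller-side table of pseudo file handles."""

    def __init__(self, tree):
        self.tree = tree          # virtual path -> file contents (bytes)
        self.open_files = {}      # pseudo file handle -> file-like object

    def os_open(self, path):
        # Look the path up in the virtual hierarchy only; the real
        # filesystem is never touched. Unknown paths raise KeyError.
        data = self.tree[path]
        # Pick any unused number >= 3 as the pseudo file handle.
        fd = 3
        while fd in self.open_files:
            fd += 1
        self.open_files[fd] = io.BytesIO(data)
        return fd

    def os_read(self, fd, size):
        # Map the pseudo handle back to the file-like object and read.
        return self.open_files[fd].read(size)


# The subprocess would only ever see the small integers returned here.
vfiles = VirtualFiles({"/tmp/hello.py": b"print('hi')\n"})
first_fd = vfiles.os_open("/tmp/hello.py")
second_fd = vfiles.os_open("/tmp/hello.py")
head = vfiles.os_read(first_fd, 5)
```

Because the numbers are just keys in a dict owned by the controller, they carry no meaning at the OS level, which is what makes it safe to hand them to the subprocess.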
Hi,

On Sun, Aug 19, 2007 at 12:33:09PM +0200, Armin Rigo wrote:
> Then in the directory where the sources were created, compile with the extra RPython-level assertions enabled:
>
>     make CFLAGS="-O2 -DRPY_ASSERT"
>     mv testing_1 /some/path/pypy-c-sandbox
You can now just say 'make llsafer' instead. This enables a new flag, -DRPY_LL_ASSERT, which differs from RPY_ASSERT in some ways explained in translator/c/src/support.h and which is better suited for this situation. I would say that the resulting sandboxed PyPy is quite safe then - at most, it will abort() itself if you play too strange tricks with 'exec new.code(...)'. For paranoia bonus points you can enable both RPY_ASSERT and RPY_LL_ASSERT.

For what it's worth, the -DRPY_LL_ASSERT inserts tons of checks everywhere, for an acceptable performance hit (~10%?).

A bientot,

Armin.
Re-hi,

On Mon, Aug 20, 2007 at 05:30:51PM +0200, Armin Rigo wrote:
> >     make CFLAGS="-O2 -DRPY_ASSERT"
> >     mv testing_1 /some/path/pypy-c-sandbox
>
> You can now just say 'make llsafer' instead.
While I was at it I enabled RPY_LL_ASSERT automatically in all programs translated with --sandbox, assuming that better safety by default is a good idea in this case. So now you can say ./translate.py --sandbox and get a correctly compiled result for free.

A bientot,

Armin.
On Sun, 19 Aug 2007 12:33:09 +0200 Armin Rigo <arigo@tunes.org> wrote:
> Hi all,
>
> Those that follow IRC already know it, but it's worth being announced a bit more widely: I've been working on a form of sandboxing for RPython programs, which now seems to work for the whole of PyPy.
>
> It's "sandboxing" as in "full virtualization", but done in normal C with no OS support at all. It's a two-processes model: we can translate PyPy to a special "pypy-c-sandbox" executable, which is safe in the sense that it doesn't do any library or system call - instead, whenever it would like to perform such an operation, it marshals the operation name and the arguments to its stdout and it waits for the marshalled result on its stdin. This pypy-c-sandbox process is meant to be run by an outer "controller" program that answers to these operation requests.

How is this different to just linking against a libc wrapper (that does whatever marshal magic is required) ?

Simon.
Hi Simon,

On Wed, Sep 19, 2007 at 01:36:43PM -0700, Simon Burton wrote:
> > It's "sandboxing" as in "full virtualization", but done in normal C with no OS support at all. (...)
>
> How is this different to just linking against a libc wrapper (that does whatever marshal magic is required) ?
The result is similar; what differs is how we arrive there, and the level of confidence I'd have in the security of the result. In the case of PyPy the wrapping is done automatically and in a platform-independent way; contrast this with the need for the designers of the libc wrapper to carefully close all possible ways the C program could invoke the system and carefully review the result, which is error-prone and platform-specific.

More importantly for the user, in the PyPy approach the C code is not random C code, but was generated from RPython. This (together with extra run-time assertions that the translation toolchain can insert for the paranoid) means that buffer overflow or memory management attacks should not be possible. This means that there is no need to review the source code of the whole PyPy interpreter for security issues. By contrast, if you take say CPython and put it inside a libc wrapper, the result is not safe because CPython itself is open to attacks (e.g. memory management issues where carefully crafted app-level Python code could force CPython to execute arbitrary machine code - including system calls bypassing the libc wrapper).

A bientot,

Armin.
participants (2): Armin Rigo, Simon Burton