[pypy-dev] pypy-dev at codespeak.net

Armin Rigo arigo at tunes.org
Sun Aug 19 12:33:09 CEST 2007

Hi all,

Those who follow IRC already know this, but it's worth announcing a
bit more widely: I've been working on a form of sandboxing for RPython
programs, which now seems to work for the whole of PyPy.

It's "sandboxing" as in "full virtualization", but done in normal C with
no OS support at all.  It's a two-process model: we can translate PyPy
to a special "pypy-c-sandbox" executable, which is safe in the sense
that it makes no library or system calls.  Instead, whenever it would
like to perform such an operation, it marshals the operation name and
the arguments to its stdout and waits for the marshalled result on its
stdin.  This pypy-c-sandbox process is meant to be run by an outer
"controller" program that answers these operation requests.
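
To make the request/answer round trip concrete, here is a minimal Python
sketch.  It is only an illustration: it assumes each request is a single
marshalled (name, args) tuple, which is not necessarily the exact wire
format used by the sandbox, and the function names and the in-memory
streams standing in for the pipes are made up for the example.

```python
import io
import marshal


def send_request(stream, name, args):
    """Subprocess side: marshal the operation name and its arguments."""
    marshal.dump((name, tuple(args)), stream)


def answer_request(request_stream, reply_stream, handlers):
    """Controller side: read one request, write back the marshalled result."""
    name, args = marshal.load(request_stream)
    if name not in handlers:
        raise RuntimeError("unsupported operation: %s" % (name,))
    marshal.dump(handlers[name](*args), reply_stream)


def demo():
    # In-memory stand-ins for the subprocess' stdout and stdin.
    child_stdout, child_stdin = io.BytesIO(), io.BytesIO()
    send_request(child_stdout, "os_read", (3, 5))
    child_stdout.seek(0)
    # Hypothetical handler: every read returns the start of a fixed string.
    handlers = {"os_read": lambda fd, size: b"hello world"[:size]}
    answer_request(child_stdout, child_stdin, handlers)
    child_stdin.seek(0)
    # Subprocess side again: unmarshal the result of the operation.
    return marshal.load(child_stdin)
```

The point of the design is that the handlers table is the only thing
standing between the subprocess and the outside world.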

The pypy-c-sandbox program is obtained by adding a transformation during
translation, which turns all RPython-level external function calls into
stubs that do the marshalling/waiting/unmarshalling.  An attacker who
tries to escape the sandbox is stuck within a C program that contains no
external function call at all except to write to stdout and read from
stdin.  (It's still attackable, e.g. by exploiting segfault-like
situations, but as far as I can tell - unlike CPython - any RPython
program is really robust against this kind of attack, at least if we
enable the extra checks that all RPython list and string indexing is in
range.  Alternatively, on Linux there is a lightweight OS-level
sandboxing technique available by default - google for 'seccomp'.)

The outer controller is a plain Python program that can run in CPython
or a regular PyPy.  It can perform any virtualization it likes, by
giving the subprocess any custom view on its world.  For example, while
the subprocess thinks it's using file handles, in reality the numbers
are created by the controller process and so they need not be (and
probably should not be) real OS-level file handles at all.  In the demo
controller I've implemented there is simply a mapping from numbers to
file-like objects.  The controller answers the "os_open" operation by
translating the requested path to some file or file-like object in some
virtual and completely custom directory hierarchy.  The file-like object
is put in the mapping with any unused number >= 3 as a key, and the
latter is returned to the subprocess.  The "os_read" operation works by
mapping the pseudo file handle given by the subprocess back to a
file-like object in the controller, and reading from the file-like
object.
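
The mapping from pseudo file handles to file-like objects could look
roughly like the sketch below.  This is not the code of the demo
controller: the class name, the dict of virtual files and the os_close
method are assumptions made for the example, and os_open is simplified
to take only a path (no flags or mode).

```python
import io


class VirtualFDTable:
    """Hand out pseudo file handles that are not real OS-level handles."""

    def __init__(self, virtual_files):
        # Hypothetical virtual hierarchy: virtual path -> file content.
        self.virtual_files = virtual_files
        self.open_files = {}          # pseudo handle -> file-like object

    def _unused_fd(self):
        # Any unused number >= 3 will do; take the smallest one.
        fd = 3
        while fd in self.open_files:
            fd += 1
        return fd

    def os_open(self, path):
        if path not in self.virtual_files:
            raise OSError("no such virtual file: %r" % (path,))
        fd = self._unused_fd()
        self.open_files[fd] = io.BytesIO(self.virtual_files[path])
        return fd

    def os_read(self, fd, size):
        # Map the pseudo handle back to the file-like object and read.
        return self.open_files[fd].read(size)

    def os_close(self, fd):
        self.open_files.pop(fd).close()
```

For example, with table = VirtualFDTable({"/tmp/hello.txt": b"hello world"}),
the first os_open("/tmp/hello.txt") returns 3, and os_read(3, 5) then
returns the first five bytes of the virtual file.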

Enough explanations, here's a how-to:

For now, this lives in a branch at the 'pypy' level:
    cd ..../pypy-dist/pypy
    svn switch http://codespeak.net/svn/pypy/branch/pypy-more-rtti-inprogress/

In pypy/translator/goal:
    ./translate.py --sandbox --source targetpypystandalone.py --withoutmod-gc

    (the gc and _weakref modules are disabled because they are a bit
    too unsafe, in the sense that they could allow bogus memory
    accesses)

Then in the directory where the sources were created, compile with the
extra RPython-level assertions enabled:

    make CFLAGS="-O2 -DRPY_ASSERT"
    mv testing_1 /some/path/pypy-c-sandbox

Run it with the tools in the pypy/translator/sandbox directory:

    ./pypy_interact.py /some/path/pypy-c-sandbox [args...]

Just like pypy-c, if you pass no argument you get the interactive
prompt.  In theory it's impossible to do anything bad or read a random
file on the machine from this prompt.  (There is no protection against
using all the RAM or CPU yet.)  To pass a script as an argument you need
to put it in a directory along with all its dependencies, and ask
pypy_interact to export this directory (read-only) to the subprocess'
virtual /tmp directory with the --tmp=DIR option.
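
As an illustration of what exporting a directory to the virtual /tmp
involves, here is a small Python sketch of the path translation a
controller might do.  The function name and the exports dict are
hypothetical and this is not how pypy_interact is actually written; it
only shows the idea of normalizing the requested path before looking it
up, so that ".." cannot be used to leave the exported directory.

```python
import posixpath


def resolve_virtual_path(vpath, exports):
    """Map a virtual path to a real one, or return None if not exported.

    'exports' maps a virtual root such as "/tmp" to a real directory.
    """
    # Normalize first, so that "/tmp/../etc" becomes "/etc".
    vpath = posixpath.normpath(posixpath.join("/", vpath))
    for vroot, realroot in exports.items():
        if vpath == vroot:
            return realroot
        if vpath.startswith(vroot + "/"):
            return posixpath.join(realroot, vpath[len(vroot) + 1:])
    return None
```

Note that this sketch does nothing about symlinks inside the real
directory; a real controller would have to think about those too.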

Not all operations are supported; e.g. if you type os.readlink('...'),
the controller crashes with an exception and the subprocess is killed.
Other operations make the subprocess die directly with a "Fatal RPython
error".  None of this is a security hole; it just means that if you try
to run some random program, it risks getting killed depending on the
Python built-in functions it tries to call.

By the way, as you may have realized, all of this is really independent
of the fact that it's PyPy that we are translating.  Any RPython program
should do.  I've successfully tried it on the JS interpreter.  The
controller is only called "pypy_interact" because it emulates a file
hierarchy that makes pypy-c-sandbox happy - it contains (read-only)
virtual directories like /bin/lib-python and /bin/pypy/lib and it
pretends that the executable is /bin/pypy-c.

See you soon,


