[pypy-svn] r47116 - pypy/dist/pypy/doc

cfbolz at codespeak.net
Wed Oct 3 19:34:29 CEST 2007

Author: cfbolz
Date: Wed Oct  3 19:34:28 2007
New Revision: 47116

condense the various mails on pypy-dev into some documentation about the
sandbox features.

Modified: pypy/dist/pypy/doc/index.txt
--- pypy/dist/pypy/doc/index.txt	(original)
+++ pypy/dist/pypy/doc/index.txt	Wed Oct  3 19:34:28 2007
@@ -24,6 +24,7 @@
   * `What PyPy can do for your objects`_
   * `Stackless and coroutines`_
   * `JIT Generation in PyPy`_ 
+  * `Sandboxing Python code`_
 `extension compiler`_ describes the (in-progress) tool that can be used
 to write modules in PyPy's style and compile them into regular CPython
@@ -177,6 +178,7 @@
 .. _`Nightly builds and benchmarks`: http://tuatara.cs.uni-duesseldorf.de/benchmark.html
 .. _`directory reference`: 
 .. _`rlib`: rlib.html
+.. _`Sandboxing Python code`: sandbox.html
 PyPy directory cross-reference 

Added: pypy/dist/pypy/doc/sandbox.txt
--- (empty file)
+++ pypy/dist/pypy/doc/sandbox.txt	Wed Oct  3 19:34:28 2007
@@ -0,0 +1,88 @@
+PyPy's sandboxing features
+One of PyPy's translation aspects is a sandboxing feature. It's "sandboxing" as
+in "full virtualization", but done in normal C with no OS support at all.  It's
+a two-process model: we can translate PyPy to a special "pypy-c-sandbox"
+executable, which is safe in the sense that it doesn't do any library or system
+call - instead, whenever it would like to perform such an operation, it
+marshals the operation name and the arguments to its stdout and it waits for
+the marshalled result on its stdin.  This pypy-c-sandbox process is meant to be
+run by an outer "controller" program that answers these operation requests.
+The pypy-c-sandbox program is obtained by adding a transformation during
+translation, which turns all RPython-level external function calls into
+stubs that do the marshalling/waiting/unmarshalling.  An attacker that
+tries to escape the sandbox is stuck within a C program that contains no
+external function call at all except to write to stdout and read from
+stdin.  (It's still attackable, e.g. by exploiting segfault-like
+situations, but as far as I can tell - unlike CPython - any RPython
+program is really robust against this kind of attack, at least if we
+enable the extra checks that all RPython list and string indexing is in
+range.  Alternatively, on Linux there is a lightweight OS-level
+sandboxing technique available by default - google for 'seccomp'.)
+The outer controller is a plain Python program that can run in CPython
+or a regular PyPy.  It can perform any virtualization it likes, by
+giving the subprocess any custom view on its world.  For example, while
+the subprocess thinks it's using file handles, in reality the numbers
+are created by the controller process and so they need not be (and
+probably should not be) real OS-level file handles at all.  In the demo
+controller I've implemented there is simply a mapping from numbers to
+file-like objects.  The controller answers the "os_open" operation by
+translating the requested path to some file or file-like object in some
+virtual and completely custom directory hierarchy.  The file-like object
+is put in the mapping with any unused number >= 3 as a key, and the
+latter is returned to the subprocess.  The "os_read" operation works by
+mapping the pseudo file handle given by the subprocess back to a
+file-like object.
+Translating an RPython program with sandboxing enabled also uses a special flag
+that enables all sorts of C-level assertions against index-out-of-bounds
+accesses.
+By the way, as you should have realized, it's really independent from
+the fact that it's PyPy that we are translating.  Any RPython program
+should do.  I've successfully tried it on the JS interpreter.  The
+controller is only called "pypy_interact" because it emulates a file
+hierarchy that makes pypy-c-sandbox happy - it contains (read-only)
+virtual directories like /bin/lib-python and /bin/pypy/lib and it
+pretends that the executable is /bin/pypy-c.
+In pypy/translator/goal::
+   ./translate.py --sandbox --source targetpypystandalone.py --withoutmod-gc
+                  --withoutmod-_weakref
+   (the gc and _weakref modules are disabled because they are a bit
+   too unsafe, in the sense that they could allow bogus memory
+   accesses)
+To run it, use the tools in the pypy/translator/sandbox directory::
+   ./pypy_interact.py /some/path/pypy-c-sandbox [args...]
+Just like pypy-c, if you pass no argument you get the interactive
+prompt.  In theory it's impossible to do anything bad or read a random
+file on the machine from this prompt.  (There is no protection against
+using all the RAM or CPU yet.)  To pass a script as an argument you need
+to put it in a directory along with all its dependencies, and ask
+pypy_interact to export this directory (read-only) to the subprocess'
+virtual /tmp directory with the --tmp=DIR option.
+Not all operations are supported; e.g. if you type os.readlink('...'),
+the controller crashes with an exception and the subprocess is killed.
+Other operations make the subprocess die directly with a "Fatal RPython
+error".  None of this is a security hole; it just means that if you try
+to run some random program, it risks getting killed depending on the
+Python built-in functions it tries to call.
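
The file-handle virtualization described in sandbox.txt above (a mapping from numbers to file-like objects, with "os_open" handing out an unused number >= 3 and "os_read" mapping it back) can be sketched roughly as follows. This is only an illustration and not the code of the demo controller: the class name `VirtualFDTable`, the method names `do_os_open` and `do_os_read`, and the use of an in-memory dict of bytes as the "virtual directory hierarchy" are all assumptions made for the example.

```python
import errno
import io


class VirtualFDTable:
    """Hypothetical controller-side table of pseudo file handles."""

    def __init__(self, virtual_files):
        # virtual_files maps a virtual path to its bytes content.
        # (Assumption: a real controller could map paths to anything
        # file-like; a dict of bytes keeps the sketch self-contained.)
        self.virtual_files = virtual_files
        self.open_files = {}  # pseudo file handle -> file-like object

    def _unused_fd(self):
        # Pick any unused number >= 3, as the text describes.
        fd = 3
        while fd in self.open_files:
            fd += 1
        return fd

    def do_os_open(self, path):
        # Translate the requested path into a file-like object taken
        # from the virtual hierarchy; no real OS-level open happens.
        if path not in self.virtual_files:
            raise OSError(errno.ENOENT, "no such virtual file", path)
        fd = self._unused_fd()
        self.open_files[fd] = io.BytesIO(self.virtual_files[path])
        return fd

    def do_os_read(self, fd, size):
        # Map the pseudo file handle back to the file-like object
        # and read from it.
        if fd not in self.open_files:
            raise OSError(errno.EBADF, "bad pseudo file handle")
        return self.open_files[fd].read(size)
```

In such a sketch the subprocess only ever sees the small integers returned by `do_os_open`; they never correspond to real OS-level file handles in the controller process.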
