[Python-Dev] The pysandbox project is broken

Tue Nov 12 22:16:55 CET 2013

Hi,

After having worked for 3 years on the pysandbox project to sandbox
untrusted code, I have now reached a point where I am convinced that
pysandbox is broken by design. Different developers tried to convince
me before that the pysandbox design is unsafe, but I had to experience
it myself to be convinced.

It would also be nice to help developers looking for a sandbox for
their application. Please tell me if you know sandbox projects for
Python so I can redirect users of pysandbox to a safer solution. I
already know PyPy sandbox.

I would like to share my experience because I know that other
developers are using sandboxes in production and that there is a real
need for sandboxing.

Origin of pysandbox
===================

In 2010, a developer called Tav wrote a sandbox called "safelite.py":
the sandbox hides sensitive attributes to separate a trusted namespace
and an untrusted namespace. Tav challenged Python core developers to
break his sandbox and... the sandbox was quickly broken. Even if it
was quickly broken, I was convinced that Tav had found something
interesting and that there is a real need for sandboxing Python. I
continued his work by putting more protections on the untrusted
namespace. I published pysandbox 1.0 in June 2010.

History of pysandbox
====================

pysandbox was used to build an IRC bot on a French Python channel. The
bot executed Python code in the sandbox. The bot was mainly used by
hackers to test the sandbox and try to find a vulnerability. It was
nice to have such an IRC bot on a Python help channel.

Three months after the release of pysandbox 1.0, the first
vulnerability was found: it was possible to modify the __builtins__
dictionary to hack the sandbox functions and so escape from the
sandbox. I had to blacklist common instructions like "dict.pop()" or
"del dict[key]" to protect the __builtins__ dictionary. I would have
preferred to use a custom type for __builtins__ but CPython requires a
real dictionary: Python/ceval.c has an inlined version of
PyDict_GetItem. For your information, I modified CPython 3.3 to accept
arbitrary mapping types for __builtins__.
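
As a rough illustration of this class of problem (this is not
pysandbox's actual code; the names guarded_len, restricted_builtins
and untrusted_source are hypothetical), the sketch below gives
untrusted code a plain dict as __builtins__ and shows that the code
can simply overwrite an entry:

```python
def guarded_len(obj):
    """Hypothetical stand-in for a function the sandbox wants to control."""
    return len(obj)


# A plain, mutable dict used as the builtins of the untrusted namespace.
restricted_builtins = {"len": guarded_len}
untrusted_namespace = {"__builtins__": restricted_builtins}

# Untrusted code sees __builtins__ as an ordinary dict and can rewrite it.
untrusted_source = "__builtins__['len'] = lambda obj: 42"
exec(untrusted_source, untrusted_namespace)

# True if the guard was replaced from inside the "sandbox".
replaced = restricted_builtins["len"] is not guarded_len
```

Any later call to len() made through this namespace would then reach
the attacker's function instead of the guard.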

Just after this fix, another vulnerability was found: it was still
possible to modify __builtins__ using the dict.__init__() method.
Access to this method was also blocked.
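
A minimal sketch of why dict.__init__() matters, using a hypothetical
builtins_like dictionary rather than the real sandbox: calling
__init__ again on an existing dict merges new keys into it in place,
without item assignment, pop() or del.

```python
# Stand-in for a builtins dictionary that should stay unchanged.
builtins_like = {"len": len}

# Re-running __init__ on an existing dict updates it in place.
dict.__init__(builtins_like, len=lambda obj: 42, injected=True)
```

After this call, builtins_like["len"] is the injected function, so
blocking only the obvious mutating operations is not enough.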

Seven months later, new vulnerabilities were found. The "timeout"
protection was removed because it is not effective on CPU-intensive
functions implemented in C. To work around a known CPython bug that
crashes the interpreter, access to the type.__bases__ attribute was
also blocked. But this protection has to be disabled on CPython 2.5
because of another CPython bug... Access to the
func_defaults/__defaults__ attributes of a function was also blocked
to protect the sandbox, even if it was not exploitable to escape from
the sandbox.

Recent events
=============

A few weeks ago, a security challenge targeted pysandbox. In less than
one day, two vulnerabilities were found. First, the compile() builtin
function was used to read an arbitrary file on the disk, line by line,
using a syntax error: the line is displayed in the traceback.
Second, a context manager was used to retrieve a traceback object:
from traceback.tb_frame, it was possible to navigate in the frames
(using frame.f_back) to retrieve a frame of the trusted namespace, and
then use the f_globals attribute of the frame to retrieve a global
name. Game over.
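
The sketch below illustrates the general mechanism behind the first
issue, not the exact exploit: it uses a runtime error rather than a
syntax error, and the helper leak_line and the temporary file are
hypothetical. The point is that the filename given to compile() is
only a label, and the traceback machinery reads the displayed source
line from whatever file that label names.

```python
import os
import tempfile
import traceback


def leak_line(path, lineno):
    """Return line `lineno` of `path` as reported by the traceback machinery."""
    # Pad with blank lines so the failing statement sits on the wanted line.
    source = "\n" * (lineno - 1) + "1/0\n"
    # The filename is only a label: compile() never checks it against `source`.
    code = compile(source, path, "exec")
    try:
        exec(code, {})
    except ZeroDivisionError as exc:
        # The innermost frame belongs to `code`; its source line is looked up
        # in the file named `path`, not in `source`.
        return traceback.extract_tb(exc.__traceback__)[-1].line


# Demo on a temporary file standing in for a sensitive file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("first line\nsecret-token\n")
    secret_path = handle.name
try:
    leaked_line = leak_line(secret_path, 2)
finally:
    os.remove(secret_path)
```

Neither open() nor the file type is used by the "untrusted" part
here, which is why blocking them alone does not close the hole.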

I fixed these two vulnerabilities in pysandbox 1.5.1: compile() is now
blocked by default, and the access to traceback.tb_frame, frame.f_back
and frame.f_globals has been blocked.
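
To show why these three attributes matter, here is a sketch of the
frame-walking idea under simplified assumptions: the untrusted code is
run with a plain exec() in its own namespace, and the names
TRUSTED_SECRET, Grabber and run_untrusted are hypothetical. It is not
the code used in the challenge.

```python
TRUSTED_SECRET = "value that lives in the trusted namespace"

UNTRUSTED_SOURCE = '''
class Grabber:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.tb = tb   # the traceback object is handed over for free
        return True    # swallow the exception

grabber = Grabber()
with grabber:
    raise ValueError("deliberate")

frame = grabber.tb.tb_frame          # frame running this untrusted code
caller = frame.f_back                # frame of the trusted caller
leaked = caller.f_globals["TRUSTED_SECRET"]
'''


def run_untrusted(source):
    """Trusted side: execute `source` in a separate namespace."""
    namespace = {"__name__": "untrusted"}
    exec(source, namespace)
    return namespace.get("leaked")


leaked = run_untrusted(UNTRUSTED_SOURCE)
```

The untrusted namespace never received TRUSTED_SECRET, yet it ends up
holding its value after three attribute accesses.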

I also started to work on a new design of pysandbox (version currently
called "pysandbox 1.6", might become pysandbox 2.0 later): run
untrusted code in a subprocess to have a safer design. Using a
subprocess, it becomes easier to limit the memory usage, set up a real
timeout, limit bytes written to stdout, limit the size of data sent to
and received from the child process, etc. But my main motivation was
to not crash the whole application if the untrusted code exploits a
known Python bug to crash the process. There are (too) many ways to
crash Python using common types and functions...
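
A minimal sketch of the subprocess idea using only the standard
library; run_in_child and its parameters are hypothetical and this is
not the pysandbox 1.6 implementation. It only shows the real timeout
and the crash isolation. The child still has full access to the file
system, so this is not a sandbox by itself.

```python
import subprocess
import sys


def run_in_child(source, timeout=5.0, max_output=10_000):
    """Run `source` in a separate interpreter; return its stdout or None."""
    try:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if completed.returncode != 0:
        # A crash or an uncaught exception only affects the child process.
        return None
    # Caps what is handed back; it does not limit what the child can write.
    return completed.stdout[:max_output]
```

For example, run_in_child("print(1+(2*3))") returns the printed
result, while run_in_child("while True: pass", timeout=1.0) returns
None once the timeout expires.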

The problem is that after each release it becomes harder to write
Python code in the sandbox. For example it becomes very hard to give
access to objects from the trusted namespace to the untrusted
namespace, because the whole object must be serialized to be passed to
the child process. It also becomes harder to debug sandboxed code
because the traceback feature doesn't work well in the sandbox.
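
The serialization constraint can be sketched as follows, assuming
pickle as the transport (an assumption made for illustration; the
helper can_be_sent_to_child is hypothetical). Plain data crosses the
process boundary, but objects tied to live resources do not.

```python
import pickle
import threading


def can_be_sent_to_child(obj):
    """Return True if `obj` survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:
        # pickle raises different exception types depending on the object.
        return False
    return True


plain_data = can_be_sent_to_child({"numbers": [1, 2, 3]})
live_resource = can_be_sent_to_child(threading.Lock())
```

Here plain_data is True and live_resource is False: a lock, like an
open file or a socket, has no meaningful copy in another process.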

Pysandbox is broken
===================

In my opinion, the compile() vulnerability is the proof that it is not
possible to put a sandbox in CPython. Blocking access to the open()
builtin function and the file type constructor is not enough if
unrelated functions can indirectly give access to the file system.
Having read access to the file system is a critical vulnerability in
pysandbox, and modifying CPython to not print the source code line in
a traceback is not acceptable either.

I now agree that putting a sandbox in CPython is the wrong design.
There are too many ways to escape the untrusted namespace using the
various introspection features of the Python language. To guarantee
the safety of a security product, the code should be carefully
audited and the code to review must be as small as possible. With
pysandbox, the "code" is the whole Python core, which is a really huge
code base. For example, the Python and Objects directories of Python
3.4 contain more than 126,000 lines of C code.

The security of pysandbox is the security of its weakest part. A
single bug is enough to escape the whole sandbox.

Attackers had original and different ideas like hacking __builtins__,
using warnings, context managers, syntax errors, arbitrary bytecode,
etc. It is hard to protect the untrusted namespace against all these
different Python features.

It might be possible to invest a lot of time to put enough protections
to protect the untrusted namespace, but it leads to my second point:
pysandbox cannot be used in practice.

pysandbox cannot be used in practice
====================================

To protect the untrusted namespace, pysandbox installs a lot of
different protections. Because of all these protections, it becomes
hard to write Python code. Basic features like "del dict[key]" are
denied. Passing an object to the sandbox is not possible: pysandbox
is unable to proxy arbitrary objects.

For something more complex than evaluating "1+(2*3)", pysandbox cannot
be used in practice, because of all these protections. Individual
protections cannot be disabled: all protections are required to get a
secure sandbox.

So what should be used to sandbox Python?
=========================================

I developed pysandbox for fun in my free time. But I was contacted by
different companies interested in using pysandbox in production in
their web applications. So I think that there is a real need to
execute arbitrary untrusted code.

I now think that putting a sandbox directly in Python cannot be
secure. To build a secure sandbox, the whole Python process must be
put in an external sandbox. There are for example projects using the
Linux SECCOMP security feature to isolate the Python process.

PyPy has a similar design: it implements something similar to SECCOMP
but in a portable way.

Please tell me if you know sandbox projects for Python so I can
redirect users of pysandbox to a safer solution. I already know PyPy
sandbox.

Victor