[Python-Dev] Sandboxed Threads in Python

Adam Olsen rhamph at gmail.com
Sat Oct 8 02:12:31 CEST 2005


Okay, basic principle first.  You start with a sandboxed thread that
has access to nothing.  No modules, no builtins, *nothing*.  This
means it can run without the GIL but it can't do any work.  To make it
do something useful we need to give it two things: first, immutable
types that can be safely accessed without locks, and second a
thread-safe queue to coordinate.  With those you can bring modules and
builtins back into the picture, either by making them immutable or
using a proxy that handles all the methods in a single thread.
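
A rough sketch of the model, written in present-day Python rather than
the 2.4 interpreter discussed here.  An ordinary threading.Thread stands
in for a sandbox (it still holds the GIL, so this only illustrates the
message-passing discipline, not the parallelism), queue.Queue plays the
thread-safe queue, and the names sandbox_worker and run_sandbox are made
up for illustration.

```python
import queue
import threading

def sandbox_worker(inbox, outbox):
    """Hypothetical sandbox body: it sees only what arrives on its queue."""
    while True:
        msg = inbox.get()
        if msg is None:          # sentinel: no more work
            break
        # msg is a tuple of ints: immutable, so reading it needs no lock
        outbox.put((msg, sum(msg)))

def run_sandbox(jobs):
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=sandbox_worker, args=(inbox, outbox))
    worker.start()
    for job in jobs:
        inbox.put(tuple(job))    # copy into an immutable tuple before sending
    inbox.put(None)
    worker.join()
    return [outbox.get() for _ in jobs]

results = run_sandbox([[1, 2, 3], [10, 20]])
```

The caller's mutable lists never cross the boundary; only tuples do, so
neither side can change what the other is looking at.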

Unfortunately Python has a problem with immutable types.  For the most
part it uses an honor system, trusting programmers not to make a class
that claims to be immutable yet changes state anyway.  We need more
than that, but "freezing" a dict would work well enough, so that's not
the real problem.  The real problem is reference counting: even if we
do it "safely", all the memory writes kill performance, so we need to
avoid it completely.

Turns out it's quite easy, and it doesn't harm the performance of
existing code or require modifying it (but a recompile is necessary).
The idea is to use only a cyclic garbage collector to clean up these
shared immutable objects, which means we need to disable reference
counting for them.  That requires modifying Py_INCREF and Py_DECREF to
be a no-op if ob_refcnt is set to a magic constant (probably a negative
value).
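
To make the check concrete, here is a toy model of the refcount logic
in Python.  The real change would be to the C macros themselves; the
class ToyObject, the functions incref and decref, and the value -1 for
IMMORTAL_REFCNT are all assumptions for illustration, not what the
patch actually uses.

```python
IMMORTAL_REFCNT = -1   # assumed magic constant; the post only says "probably negative"

class ToyObject:
    def __init__(self, refcnt=1):
        self.ob_refcnt = refcnt
        self.freed = False

def incref(obj):
    # No write at all for objects carrying the magic constant
    if obj.ob_refcnt != IMMORTAL_REFCNT:
        obj.ob_refcnt += 1

def decref(obj):
    if obj.ob_refcnt == IMMORTAL_REFCNT:
        return
    obj.ob_refcnt -= 1
    if obj.ob_refcnt == 0:
        obj.freed = True   # stands in for deallocation

normal = ToyObject()
shared = ToyObject(IMMORTAL_REFCNT)
for obj in (normal, shared):
    incref(obj)
    decref(obj)
    decref(obj)
```

After the loop the normal object has dropped to zero and been "freed",
while the shared one was never written to, so it would be left for the
cyclic collector.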

That's all it takes.  Modify Py_INCREF and Py_DECREF to check for a
magic constant.  Ahh, but the performance?  See for yourself.

Normal Py_INCREF/Py_DECREF
rhamph@factor:~/src/Python-2.4.1$ ./python Lib/test/pystone.py 500000
Pystone(1.1) time for 500000 passes = 13.34
This machine benchmarks at 37481.3 pystones/second

Modified Py_INCREF/Py_DECREF with magic constant
rhamph@factor:~/src/Python-2.4.1-sandbox$ ./python Lib/test/pystone.py 500000
Pystone(1.1) time for 500000 passes = 13.38
This machine benchmarks at 37369.2 pystones/second

The numbers aren't significantly different.  In fact the second one is
often slightly faster, which shows the difference is smaller than the
statistical noise.
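
For scale, the gap between those two particular runs can be worked out
from the reported times.  This is only arithmetic on the figures above,
not a new measurement, and the variable names are mine.

```python
passes = 500000
normal_time, modified_time = 13.34, 13.38   # seconds, from the two runs above

normal_rate = passes / normal_time          # pystones/second, unmodified build
modified_rate = passes / modified_time      # pystones/second, patched build

# Relative drop in throughput between the two runs, in percent
slowdown_pct = (normal_rate - modified_rate) / normal_rate * 100
```

That comes out at roughly 0.3% for this pair of runs.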

So to sum up: by prohibiting mutable objects from being transferred
between sandboxes we can achieve scalability on multiple CPUs, make
threaded programming easier and more reliable, get secure sandboxes[1]
as a bonus, and do all of that while maintaining single-threaded
performance and requiring minimal changes to existing C modules
(a recompile).

A "proof of concept" patch to Py_INCREF/Py_DECREF (only demonstrates
performance effects, does not create or utilize any new functionality)
can be found here:
https://sourceforge.net/tracker/index.php?func=detail&aid=1316653&group_id=5470&atid=305470

[1] We need to remove any backdoor methods of getting to mutable
objects outside of your sandbox, which gets us most of the way towards
a restricted execution environment.

--
Adam Olsen, aka Rhamphoryncus

