Okay, basic principle first. You start with a sandboxed thread that has access to nothing. No modules, no builtins, *nothing*. This means it can run without the GIL, but it can't do any work. To make it do something useful we need to give it two things: first, immutable types that can be safely accessed without locks, and second, a thread-safe queue to coordinate. With those you can bring modules and builtins back into the picture, either by making them immutable or by using a proxy that handles all the methods in a single thread.
Unfortunately Python has a problem with immutable types. For the most part it uses an honor system, trusting programmers not to make a class that claims to be immutable yet changes state anyway. We need more than that, but "freezing" a dict would work well enough, so that's not the real problem. The real problem is the reference counting: even if we do it "safely", all the memory writes just kill performance, so we need to avoid it completely.
Turns out that's quite easy, and it doesn't harm performance of existing code or require modification (but a recompile is necessary). The idea is to use only the cyclic garbage collector for cleaning up the shared immutable objects, which means we need to disable reference counting for them. That requires we modify Py_INCREF and Py_DECREF to be a no-op if ob_refcnt is set to a magic constant (probably a negative value).
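To make the idea concrete, here is a minimal standalone sketch of the check. It is not the actual patch and does not use Python.h: toy_object, FROZEN_REFCNT, TOY_INCREF, TOY_DECREF and the helper functions are hypothetical names made up for illustration, and the value -1 is just one possible choice of magic constant.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical magic constant. A negative value is convenient because a
 * normally refcounted live object never has a negative count. */
#define FROZEN_REFCNT (-1L)

typedef struct {
    long refcnt;   /* stands in for ob_refcnt */
    int  payload;
} toy_object;

static int toy_free_count = 0;  /* objects released through TOY_DECREF */

static void toy_dealloc(toy_object *op) {
    toy_free_count++;
    free(op);
}

/* For frozen objects the macros only read refcnt and never write it,
 * so other threads see no memory traffic on the object. */
#define TOY_INCREF(op) \
    do { if ((op)->refcnt != FROZEN_REFCNT) (op)->refcnt++; } while (0)

#define TOY_DECREF(op) \
    do { \
        if ((op)->refcnt != FROZEN_REFCNT && --(op)->refcnt == 0) \
            toy_dealloc(op); \
    } while (0)

static toy_object *toy_new(int payload, int frozen) {
    toy_object *op = malloc(sizeof *op);
    if (op == NULL) abort();
    op->refcnt = frozen ? FROZEN_REFCNT : 1;
    op->payload = payload;
    return op;
}

/* Exercises both kinds of object and returns how many were released
 * by reference counting during the demo. */
static int toy_demo(void) {
    toy_object *normal = toy_new(1, 0);
    toy_object *frozen = toy_new(2, 1);
    int before = toy_free_count;

    TOY_INCREF(normal);  /* 1 -> 2 */
    TOY_DECREF(normal);  /* 2 -> 1 */
    TOY_INCREF(frozen);  /* no-op */
    TOY_DECREF(frozen);  /* no-op */
    TOY_DECREF(frozen);  /* still a no-op, the object is not freed */
    assert(normal->refcnt == 1);
    assert(frozen->refcnt == FROZEN_REFCNT);

    TOY_DECREF(normal);  /* 1 -> 0, released via toy_dealloc */
    int freed = toy_free_count - before;

    free(frozen);  /* stand-in for the cyclic collector reclaiming it */
    printf("released by refcounting: %d\n", freed);
    return freed;
}
```

Calling toy_demo() from a main function runs the whole sequence. The extra cost on the normal path is one comparison per INCREF/DECREF, which is the overhead the benchmark numbers are measuring.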
That's all it takes. Modify Py_INCREF and Py_DECREF to check for a magic constant. Ahh, but the performance? See for yourself.
Normal Py_INCREF/Py_DECREF:

    rhamph@factor:~/src/Python-2.4.1$ ./python Lib/test/pystone.py 500000
    Pystone(1.1) time for 500000 passes = 13.34
    This machine benchmarks at 37481.3 pystones/second

Modified Py_INCREF/Py_DECREF with magic constant:

    rhamph@factor:~/src/Python-2.4.1-sandbox$ ./python Lib/test/pystone.py 500000
    Pystone(1.1) time for 500000 passes = 13.38
    This machine benchmarks at 37369.2 pystones/second
The numbers aren't significantly different (about 0.3% in this run). In fact the second one is often slightly faster, which suggests the difference is smaller than the run-to-run noise.
So to sum up: by prohibiting mutable objects from being transferred between sandboxes, we can achieve scalability on multiple CPUs, make threaded programming easier and more reliable, get secure sandboxes as a bonus, and do it all while maintaining single-threaded performance and requiring minimal changes to existing C modules (just a recompile).
A "proof of concept" patch to Py_INCREF/Py_DECREF (only demonstrates performance effects, does not create or utilize any new functionality) can be found here: https://sourceforge.net/tracker/index.php?func=detail&aid=1316653&gr...
We need to remove any backdoor methods of getting to mutable objects outside of a sandbox, which gets us most of the way towards a restricted execution environment.
-- Adam Olsen, aka Rhamphoryncus