[Python-Dev] Sandboxed Threads in Python

Phillip J. Eby pje at telecommunity.com
Sat Oct 8 02:51:41 CEST 2005


At 06:12 PM 10/7/2005 -0600, Adam Olsen wrote:
>Okay, basic principle first.  You start with a sandboxed thread that
>has access to nothing.  No modules, no builtins, *nothing*.  This
>means it can run without the GIL but it can't do any work.

It sure can't.  You need at least the threadstate and a builtins dictionary 
to do any work.


>   To make it
>do something useful we need to give it two things: first, immutable
>types that can be safely accessed without locks,

This is harder than it sounds.  Integers, for example, have a custom 
allocator and a free list, not to mention a small-integer cache.  You would 
somehow need to duplicate all that for each sandbox, or else you have to 
make those integers immortal using your "magic constant".
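
A quick way to see the shared small-integer cache from Python code. This is a sketch that assumes CPython, where the cache is an implementation detail and not a language guarantee; the variable names are only for illustration.

```python
import sys

# Two integers with the same small value, computed at run time in different ways.
a = int("100")
b = int("99") + 1

# On CPython both names refer to the one preallocated object for 100,
# so every thread that uses the value 100 touches the same refcount field.
shared = a is b

print(sys.implementation.name, "shares small ints:", shared)
```

Because that one object is handed to every caller, it would either have to be duplicated per sandbox or made exempt from refcounting.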


>Turns out it's quite easy and it doesn't harm performance of existing
>code or require modification (but a recompile is necessary).  The idea
>is to only use a cyclic garbage collector for cleaning them up,

Um, no, actually.  You need a mark-and-sweep GC or something of that 
ilk.  Python's GC only works with objects that *have refcounts*, and it 
works by clearing objects that are in cycles.  The clearing causes 
DECREF-ing, which then causes objects to be freed.  If you have objects 
without refcounts, they would be immortal and utterly unrecoverable.
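
The dependence of the cycle collector on refcounts can be observed from Python. This is a minimal sketch assuming CPython; `Node` and `probe` are made-up names for the demonstration.

```python
import gc
import weakref


class Node:
    """Plain container object that can take part in a reference cycle."""


gc.disable()  # keep automatic collections from interfering with the demo

a, b = Node(), Node()
a.other, b.other = b, a      # build a reference cycle
probe = weakref.ref(a)       # lets us observe whether 'a' still exists

del a, b                     # refcounts stay above zero because of the cycle
alive_before = probe() is not None

gc.collect()                 # the collector clears the cycle; the resulting
                             # DECREFs are what actually free the objects
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after)
```

The collector only finds and breaks the cycle; the freeing itself still happens through the refcount reaching zero.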


>which
>means we need to disable the reference counting.  That requires we
>modify Py_INCREF and Py_DECREF to be a no-op if ob_refcnt is set to a
>magic constant (probably a negative value).

And any object with the magic refcount will live *forever*, unless you 
manually deallocate it.
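
A toy model of the proposal, written in Python rather than C, shows the consequence. Everything here is hypothetical: `IMMORTAL`, `ToyObject`, `incref` and `decref` only mimic the idea of a magic refcount and are not CPython's real macros.

```python
IMMORTAL = -1  # hypothetical magic constant, as suggested in the proposal


class ToyObject:
    """Stand-in for a PyObject with a refcount field."""

    def __init__(self, name, refcnt=1):
        self.name = name
        self.refcnt = refcnt
        self.freed = False


def incref(obj):
    # No-op for objects carrying the magic refcount.
    if obj.refcnt != IMMORTAL:
        obj.refcnt += 1


def decref(obj):
    if obj.refcnt == IMMORTAL:
        return
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True  # stands in for deallocation


normal = ToyObject("normal")
frozen = ToyObject("frozen", refcnt=IMMORTAL)

for obj in (normal, frozen):
    incref(obj)   # take a second reference
    decref(obj)   # release it
    decref(obj)   # release the original reference

print(normal.freed, frozen.freed)
```

The ordinary object is released when its last reference goes away; the one with the magic refcount never is, so something else has to reclaim it.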



>That's all it takes.  Modify Py_INCREF and Py_DECREFs to check for a
>magic constant.  Ahh, but the performance?  See for yourself.

First, you need to implement a garbage collection scheme that can deal with 
not having refcounts.  Otherwise you're not comparing apples to apples 
here, and your programs will leak like crazy.

Note that implementing a root-based GC for Python is non-trivial, since 
extension modules can store pointers to PyObjects anywhere they 
like.  Further, many Python objects don't even support being tracked by the 
current cycle collector.
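
Which objects the cycle collector tracks can be checked with `gc.is_tracked`, which exists in later CPython releases than the one discussed here. A small sketch, assuming CPython:

```python
import gc

# Atomic objects such as ints and strings are not tracked by the cycle
# collector; containers such as lists are.
tracked = {
    "int": gc.is_tracked(0),
    "str": gc.is_tracked("a"),
    "list": gc.is_tracked([]),
}

for kind, flag in tracked.items():
    print(kind, flag)
```

Anything untracked is invisible to the current collector, so a scheme without refcounts would need another way to find it.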

So, changing this would probably require a lot of C extensions to be 
rewritten to support the needed API changes for the new garbage collection 
strategy.


>So to sum up, by prohibiting mutable objects from being transferred
>between sandboxes we can achieve scalability on multiple CPUs, making
>threaded programming easier and more reliable, as a bonus get secure
>sandboxes[1], and do that all while maintaining single-threaded
>performance and requiring minimal changes to existing C modules
>(recompiling).

Unfortunately, you have only succeeded in restating the problem, not 
reducing its complexity.  :)  In fact, you may have increased the 
complexity, since now you need a threadsafe garbage collector, too.

Oh, and don't forget: new-style classes keep weak references to all their 
subclasses, which means for example that every time you subclass 'dict', 
you're modifying the "immutable" 'dict' class.  So, unless you recreate all 
the classes in each sandbox, you're back to needing locking.  And if you 
recreate everything in each sandbox, well, I think you've just reinvented 
"processes".  :)
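
The subclass bookkeeping is easy to observe. In this sketch `SandboxDict` is a made-up name; `__subclasses__()` is the standard method on classes.

```python
# Snapshot the subclasses that 'dict' currently knows about.
before = list(dict.__subclasses__())


class SandboxDict(dict):
    """Defining this class registers it on the built-in 'dict' type."""


# Anything that appeared since the snapshot was added by the class statement.
added = [cls for cls in dict.__subclasses__() if cls not in before]

print(added)
```

Merely defining a subclass changed state stored on the built-in `dict` type, which is exactly the kind of write that would need a lock if `dict` were shared between sandboxes.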


