[Python-Dev] Sandboxed Threads in Python
Phillip J. Eby
pje at telecommunity.com
Sat Oct 8 02:51:41 CEST 2005
At 06:12 PM 10/7/2005 -0600, Adam Olsen wrote:
>Okay, basic principle first. You start with a sandboxed thread that
>has access to nothing. No modules, no builtins, *nothing*. This
>means it can run without the GIL but it can't do any work.
It sure can't. You need at least the threadstate and a builtins dictionary
to do any work.
> To make it
>do something useful we need to give it two things: first, immutable
>types that can be safely accessed without locks,
This is harder than it sounds. Integers, for example, have a custom
allocator and a free list, not to mention a small-integer cache. You would
somehow need to duplicate all that for each sandbox, or else you have to
make those integers immortal using your "magic constant".
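To see the shared state in question, here is a minimal sketch. It assumes CPython, where a range of small integers is preallocated and shared; that is an implementation detail, not a language guarantee, and the helper name same_object is made up for illustration.

```python
# Assumption: CPython, which keeps a cache of small integers. Other
# implementations (or other values) may behave differently.

def same_object(value):
    """Return True if two integers built separately at run time are
    in fact the very same object."""
    # Go through str() so the two results cannot be folded into one
    # constant by the compiler.
    a = int(str(value))
    b = int(str(value))
    return a is b

# A small value comes out of the shared cache, so every thread (and
# every would-be sandbox) in the process sees the same object.
small_is_shared = same_object(7)
print("7 is shared:", small_is_shared)
```

Every INCREF or DECREF on that one shared object is a write to memory that all sandboxes can see, which is why the cache has to be either duplicated per sandbox or made immortal.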
>Turns out it's quite easy and it doesn't harm performance of existing
>code or require modification (but a recompile is necessary). The idea
>is to only use a cyclic garbage collector for cleaning them up,
Um, no, actually. You need a mark-and-sweep GC or something of that
ilk. Python's GC only works with objects that *have refcounts*, and it
works by clearing objects that are in cycles. The clearing causes
DECREF-ing, which then causes objects to be freed. If you have objects
without refcounts, they would be immortal and utterly unrecoverable.
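A small sketch of what the existing collector actually does, using only the standard gc and weakref modules. The Node class and make_cycle helper are hypothetical names for illustration.

```python
import gc
import weakref


class Node:
    """Trivial container used to build a reference cycle."""


def make_cycle():
    # Two objects that refer to each other: their refcounts can never
    # reach zero on their own once the local names go away.
    a, b = Node(), Node()
    a.other, b.other = b, a
    # Hand back only a weak reference, which does not keep 'a' alive.
    return weakref.ref(a)


gc.disable()              # keep automatic collection out of the picture
probe = make_cycle()
alive_before = probe() is not None   # refcounting alone did not free it

gc.collect()              # the cycle collector breaks the cycle ...
alive_after = probe() is not None    # ... and refcounting frees the objects
gc.enable()

print(alive_before, alive_after)
```

The collector only finds and breaks the cycle; the actual freeing still happens through the refcounts dropping to zero.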
>which
>means we need to disable the reference counting. That requires we
>modify Py_INCREF and Py_DECREF to be a no-op if ob_refcnt is set to a
>magic constant (probably a negative value).
And any object with the magic refcount will live *forever*, unless you
manually deallocate it.
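To make that concrete, here is a toy model in plain Python. It is not CPython's real Py_INCREF/Py_DECREF; the names IMMORTAL, Obj, incref and decref are invented for this sketch, and "freed" is just a flag.

```python
IMMORTAL = -1  # hypothetical magic constant, as in the proposal


class Obj:
    """Stand-in for a PyObject with an explicit reference count."""

    def __init__(self, name, refcnt=1):
        self.name = name
        self.refcnt = refcnt
        self.freed = False


def incref(obj):
    # No-op for objects carrying the magic constant.
    if obj.refcnt != IMMORTAL:
        obj.refcnt += 1


def decref(obj):
    # No-op for objects carrying the magic constant, so they can
    # never reach zero and never get freed this way.
    if obj.refcnt == IMMORTAL:
        return
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True


normal = Obj("normal")
shared = Obj("shared", refcnt=IMMORTAL)

for obj in (normal, shared):
    incref(obj)   # take a reference
    decref(obj)   # drop it again
    decref(obj)   # drop the original reference

print(normal.freed, shared.freed)
```

In this model the ordinary object is released when its last reference goes away, while the one with the magic refcount stays around no matter how many references are dropped.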
>That's all it takes. Modify Py_INCREF and Py_DECREF to check for a
>magic constant. Ahh, but the performance? See for yourself.
First, you need to implement a garbage collection scheme that can deal with
not having refcounts. Otherwise you're not comparing apples to apples
here, and your programs will leak like crazy.
Note that implementing a root-based GC for Python is non-trivial, since
extension modules can store pointers to PyObjects anywhere they
like. Further, many Python objects don't even support being tracked by the
current cycle collector.
So, changing this would probably require a lot of C extensions to be
rewritten to support the needed API changes for the new garbage collection
strategy.
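As a rough illustration of the "not tracked" point, the standard gc.is_tracked() function reports whether the cycle collector is watching a given object. This sketch assumes CPython; the dictionary name tracked is only for illustration.

```python
import gc

# Atomic objects such as ints and strings are not tracked by the
# cycle collector; containers such as lists are.
tracked = {
    "int": gc.is_tracked(0),
    "str": gc.is_tracked("a"),
    "list": gc.is_tracked([]),
}

for kind, flag in tracked.items():
    print(kind, "tracked by the cycle collector:", flag)
```

Untracked objects are reclaimed only through their refcounts, so taking the refcount away leaves nothing that would ever free them.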
>So to sum up, by prohibiting mutable objects from being transferred
>between sandboxes we can achieve scalability on multiple CPUs, make
>threaded programming easier and more reliable, get secure sandboxes[1]
>as a bonus, and do all that while maintaining single-threaded
>performance and requiring minimal changes to existing C modules
>(recompiling).
Unfortunately, you have only succeeded in restating the problem, not
reducing its complexity. :) In fact, you may have increased the
complexity, since now you need a thread-safe garbage collector, too.
Oh, and don't forget: new-style classes keep weak references to all their
subclasses, which means for example that every time you subclass 'dict',
you're modifying the "immutable" 'dict' class. So, unless you recreate all
the classes in each sandbox, you're back to needing locking. And if you
recreate everything in each sandbox, well, I think you've just reinvented
"processes". :)