[Python-Dev] Sandboxed Threads in Python

Adam Olsen rhamph at gmail.com
Sat Oct 8 03:17:01 CEST 2005


On 10/7/05, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 06:12 PM 10/7/2005 -0600, Adam Olsen wrote:
> >Okay, basic principle first.  You start with a sandboxed thread that
> >has access to nothing.  No modules, no builtins, *nothing*.  This
> >means it can run without the GIL but it can't do any work.
>
> It sure can't.  You need at least the threadstate and a builtins dictionary
> to do any work.
>
>
> >   To make it
> >do something useful we need to give it two things: first, immutable
> >types that can be safely accessed without locks,
>
> This is harder than it sounds.  Integers, for example, have a custom
> allocator and a free list, not to mention a small-integer cache.  You would
> somehow need to duplicate all that for each sandbox, or else you have to
> make those integers immortal using your "magic constant".

Yes, we'd probably want some per-sandbox allocators.  I'm no expert on
that but I know it can be done.


> >Turns out it's quite easy and it doesn't harm performance of existing
> >code or require modification (but a recompile is necessary).  The idea
> >is to only use a cyclic garbage collector for cleaning them up,
>
> Um, no, actually.  You need a mark-and-sweep GC or something of that
> ilk.  Python's GC only works with objects that *have refcounts*, and it
> works by clearing objects that are in cycles.  The clearing causes
> DECREF-ing, which then causes objects to be freed.  If you have objects
> without refcounts, they would be immortal and utterly unrecoverable.

Perhaps I wasn't clear enough: I was assuming that appropriate changes
to the GC would be made.  The important thing is that they can be made
without changing the interface that the existing modules use.


> >which
> >means we need to disable the reference counting.  That requires we
> >modify Py_INCREF and Py_DECREF to be a no-op if ob_refcnt is set to a
> >magic constant (probably a negative value).
>
> And any object with the magic refcount will live *forever*, unless you
> manually deallocate it.

See above.
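
To make the quoted idea concrete, here is a toy model of the check in
Python rather than in the real C macros.  It is only a sketch:
IMMORTAL_REFCNT, ToyObject, incref and decref are names made up for
illustration, and the actual change would live in Py_INCREF and
Py_DECREF.  Objects carrying the magic value are skipped by both
operations, so they must be reclaimed by some other collector.

```python
# Toy model only: the real change would be made in the C macros
# Py_INCREF/Py_DECREF.  All names below are hypothetical.
IMMORTAL_REFCNT = -1  # assumed "magic constant" (a negative value)


class ToyObject:
    """Stand-in for a PyObject with an ob_refcnt field."""

    def __init__(self, name, refcnt=1):
        self.name = name
        self.refcnt = refcnt
        self.freed = False


def incref(obj):
    # No-op when the object carries the magic refcount.
    if obj.refcnt != IMMORTAL_REFCNT:
        obj.refcnt += 1


def decref(obj):
    # No-op when the object carries the magic refcount; such objects
    # are never freed here and need a separate collector.
    if obj.refcnt == IMMORTAL_REFCNT:
        return
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True


normal = ToyObject("normal")
shared = ToyObject("shared", IMMORTAL_REFCNT)

for obj in (normal, shared):
    incref(obj)
    decref(obj)
    decref(obj)

print(normal.freed, normal.refcnt)
print(shared.freed, shared.refcnt)
```

The ordinary object is freed by the last decref, while the shared one
is untouched by any of the calls, which is exactly why a collector
that does not rely on refcounts is needed for it.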


> >That's all it takes.  Modify Py_INCREF and Py_DECREF to check for a
> >magic constant.  Ahh, but the performance?  See for yourself.
>
> First, you need to implement a garbage collection scheme that can deal with
> not having refcounts.  Otherwise you're not comparing apples to apples
> here, and your programs will leak like crazy.
>
> Note that implementing a root-based GC for Python is non-trivial, since
> extension modules can store pointers to PyObjects anywhere they
> like.  Further, many Python objects don't even support being tracked by the
> current cycle collector.
>
> So, changing this would probably require a lot of C extensions to be
> rewritten to support the needed API changes for the new garbage collection
> strategy.

They only need to be rewritten if you want them to provide an
immutable type that can be transferred between sandboxes.  Short of
that you can make the module object itself immutable, and from it
create mutable instances that are private to each sandbox and not
sharable.

If you make no changes at all the module still works, but is only
usable from the main thread.  That allows us to transition
incrementally.


> >So to sum up, by prohibiting mutable objects from being transferred
> >between sandboxes we can achieve scalability on multiple CPUs, making
> >threaded programming easier and more reliable, as a bonus get secure
> >sandboxes[1], and do that all while maintaining single-threaded
> >performance and requiring minimal changes to existing C modules
> >(recompiling).
>
> Unfortunately, you have only succeeded in restating the problem, not
> reducing its complexity.  :)  In fact, you may have increased the
> complexity, since now you need a threadsafe garbage collector, too.
>
> Oh, and don't forget - newstyle classes keep weak references to all their
> subclasses, which means for example that every time you subclass 'dict',
> you're modifying the "immutable" 'dict' class.  So, unless you recreate all
> the classes in each sandbox, you're back to needing locking.  And if you
> recreate everything in each sandbox, well, I think you've just reinvented
> "processes".  :)

I was aware that weakrefs needed some special handling (I just forgot
to mention it), but I didn't know they were used for subclassing.
Unfortunately, I don't know what purpose that serves, so I can't yet
say how to deal with it.

I need to stress that *only* the new, immutable and "thread-safe
mark-and-sweep" types would be affected by these changes.  Everything
else would continue to exist as it did before, and the benchmark
exists to show they can coexist without killing performance.
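
For what the transfer rule itself might look like, here is a rough
sketch in Python.  Everything in it is an assumption made for
illustration: SHAREABLE_ATOMS, is_shareable and transfer are
hypothetical names, and the set of types treated as immutable is just
a plausible starting point, not a worked-out design.

```python
# Hypothetical sketch of "only immutable objects may cross sandboxes".
# The type list is an assumption, not a worked-out design.
SHAREABLE_ATOMS = (type(None), bool, int, float, complex, str, bytes)
SHAREABLE_CONTAINERS = (tuple, frozenset)


def is_shareable(obj):
    """True if obj is built only from the assumed immutable types.

    Exact type checks are used on purpose: a subclass of int or tuple
    may carry a mutable __dict__, so it is rejected here.
    """
    if type(obj) in SHAREABLE_ATOMS:
        return True
    if type(obj) in SHAREABLE_CONTAINERS:
        return all(is_shareable(item) for item in obj)
    return False


def transfer(obj):
    """Hand obj to another sandbox, refusing anything mutable."""
    if not is_shareable(obj):
        raise TypeError(
            "cannot transfer mutable object of type %s"
            % type(obj).__name__
        )
    return obj


ok = transfer((1, "a", (2.0, None)))

try:
    transfer((1, [2, 3]))
    rejected = False
except TypeError:
    rejected = True

print(ok, rejected)
```

Anything that fails the check simply stays private to the sandbox
that created it, which matches the incremental path described above.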

--
Adam Olsen, aka Rhamphoryncus

