[Python-Dev] The pysandbox project is broken

Sat Nov 16 17:45:00 CET 2013

On Sat, Nov 16, 2013 at 02:53:22AM -0800, Maciej Fijalkowski wrote:
> On Fri, Nov 15, 2013 at 6:56 PM, Trent Nelson <trent at snakebite.org> wrote:
> > On Tue, Nov 12, 2013 at 01:16:55PM -0800, Victor Stinner wrote:
> >> pysandbox cannot be used in practice
> >> ====================================
> >>
> >> To protect the untrusted namespace, pysandbox installs a lot of
> >> different protections. Because of all these protections, it becomes
> >> hard to write Python code. Basic features like "del dict[key]" are
> >> denied. Passing an object to a sandbox is not possible, because
> >> pysandbox is unable to proxify arbitrary objects.
> >>
> >> For something more complex than evaluating "1+(2*3)", pysandbox cannot
> >> be used in practice, because of all these protections. Individual
> >> protections cannot be disabled, all protections are required to get a
> >> secure sandbox.
> >
> >     This sounds a lot like the work I initially did with PyParallel to
> >     try and intercept/prevent parallel threads mutating main-thread
> >     objects.
> >
> >     I ended up arriving at a much better solution by just relying on
> >     memory protection; main thread pages are set read-only prior to
> >     parallel threads being able to run.  If a parallel thread attempts
> >     to mutate a main thread object, an SEH exception is raised (SIGSEGV
> >     on POSIX), which I catch in the ceval loop and convert into an
> >     exception.
> >
> >     See slide 138 of this: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores-1
> >
> >     I'm wondering if this sort of an approach (which worked surprisingly
> >     well) could be leveraged to also provide a sandbox environment?  The
> >     goals are the same: robust protection against mutation of memory
> >     allocated outside of the sandbox.
> >
> >     (I'm purely talking about memory mutation; haven't thought about how
> >      that could be extended to prevent file system interaction as well.)
> >
> >
> >         Trent.
> 
> Trent, you should read the mail more carefully. Notably, the same
> issues that make it impossible to create a sandbox make it impossible
> to make PyParallel really work. Being read-only is absolutely not
> enough - you can read some internal structures in inconsistent state
> that lead to crashes and/or very unexpected behavior even without
> modifying anything.

    What do you mean by inconsistent state?  Like a dict half way
    through `a['foo'] = 'bar'`?  That can't happen with PyParallel;
    parallel threads don't run when the main thread runs and vice
    versa.  The main thread's memory (and internal object structure)
    will always be consistent by the time the parallel threads run.
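
    A rough sketch of that alternation in plain Python, for illustration
    only.  This is not the PyParallel API: every name below is
    hypothetical, and a read-only `types.MappingProxyType` view stands in
    for the page-level memory protection, so it only shows the shape of
    the idea (mutate in the main-thread phase, read-only in the parallel
    phase, mutation attempts surface as exceptions).

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType

# Hypothetical names throughout; this is not the PyParallel API.
main_state = {"foo": "bar", "hits": 0}


def parallel_callback(view, key):
    """Runs in a worker thread; may read main-thread state but not mutate it."""
    value = view[key]
    try:
        view["hits"] = 1  # mutation attempt through the read-only view
    except TypeError as exc:
        # Stand-in for the access violation being converted into an exception.
        return value, type(exc).__name__
    return value, None


def run_parallel_phase(state, keys):
    """Main thread stops mutating, exposes a read-only view, and waits."""
    view = MappingProxyType(state)
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda k: parallel_callback(view, k), keys))
    # All workers have finished here, so the main thread may mutate again.
    return results


# Main-thread phase: mutation is allowed, no workers are running.
main_state["foo"] = "baz"

# Parallel phase: workers only ever see the completed update.
results = run_parallel_phase(main_state, ["foo", "foo"])
```

    Unlike real page protection, the proxy guards only this one dict and
    does nothing about objects reachable by other routes, so treat it as
    a picture of the control flow rather than of the enforcement.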

> PS. We really did a lot of work analyzing how STM-pypy can lead to
> conflicts and/or inconsistent behavior.

    But you support free-threading though, right?  As in, code that
    subclasses threading.Thread should be able to benefit from your
    STM work?

    I explicitly don't support free-threading.  Your threading.Thread
    code will not magically run faster with PyParallel.  You'll need
    to re-write your code using the parallel and async façade APIs I
    expose.

    On the plus side, I can completely control everything about the
    main thread and parallel thread execution environments, obviating
    the need to protect against internal inconsistencies by virtue of
    the fact that the main thread will always be in a consistent state
    when the parallel threads are running.

    (And it works really well in practice; I ported SimpleHTTPServer to
    use my new async stuff and it flies -- it'll automatically exploit
    all your cores if there is sufficient incoming load.  Unexpected
    side-effect of my implementation is that code executing in parallel
    callbacks actually runs faster than normal single-threaded Python
    code; no need to do reference counting, GC, and the memory model is
    ridiculously cache and TLB friendly.)

    This is getting off-topic though and I don't want to hijack the
    sandbox thread.  I was planning on sending an e-mail in a few days
    when the PyData video of my talk is live -- we can debate the merits
    of my parallel/async approach then :-)

> Cheers,
> fijal

        Trent.