[Python-3000] Will we have a true restricted exec environment for python-3000?

Fri Apr 14 10:20:00 CEST 2006

Ian Bicking wrote:
> One advantage to this is that at each step something useful would be 
> created.  Better IPC would be useful for more than restricted execution. 
>  Or ways to create subinterpreters and communicate with them.  Or making 
> the interpreter startup faster, to facilitate spawning short-lived 
> interpreters.  As an example, by combining these features but leaving 
> out restricted execution you can get something like the PHP model, which 
> manages to protect well against buggy code (even if not so well against 
> security bugs).  There's probably a dozen distinct parts to this, but I 
> think each is independently interesting and useful.
> 
> And maybe, because these are useful subprojects, some of the people who 
> can't commit to the time to take on this project as a whole could commit 
> to some piece of it for which they have alternate motivations.

This is why I think the first step in a solid Python restricted execution 
framework isn't an implementation activity but a research/scoping activity, 
looking at the various systems already out there, and the various trade-offs 
involved in each.

The requirements for a subinterpreter are completely different from those for 
the main Python interpreter which actually runs a Python application - the 
initial state of a subinterpreter should be controlled entirely by the 
invoking interpreter, so much of the normal interpreter startup sequence 
should be eliminated. Subinterpreters are designed to manipulate the state of 
the main application rather than the platform they are executing on, so direct 
access to the underlying OS isn't needed. The meaning of module imports in the 
subinterpreter is also up to the main application.
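
As a rough illustration of "the meaning of module imports is up to the main application", the sketch below runs guest code with a host-supplied import hook. The names ALLOWED_MODULES, host_import and run_guest are hypothetical, and running exec() with a trimmed-down builtins dict is NOT a security boundary in CPython. It only shows the host deciding what an import statement resolves to.

```python
import math

# Hypothetical whitelist: the host decides which names are importable
# and which objects they resolve to.
ALLOWED_MODULES = {"math": math}


def host_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Import hook controlled by the host; unknown names are refused."""
    try:
        return ALLOWED_MODULES[name]
    except KeyError:
        raise ImportError(f"module {name!r} is not provided by the host") from None


def run_guest(source):
    """Execute guest source with only the host's import hook as a builtin.

    Illustration only: this does not make exec() safe for untrusted code.
    """
    namespace = {"__builtins__": {"__import__": host_import}}
    exec(source, namespace)
    return namespace
```

Here run_guest("import math") succeeds, while run_guest("import os") raises ImportError because the host never offered that module.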

My initial inclination is that aiming straight for in-process restricted 
execution is premature optimisation at its worst - worrying about IPC overhead 
before there's even anything on the table that is functionally complete (i.e. 
both usable and demonstrably secure) seems to be putting the cart before the 
horse. The liberal use of C static variables by both the interpreter core and 
third-party extensions means that in-process restricted execution will 
necessarily be invasive - every static variable will need to be checked for 
security implications, and some mechanism provided to support partitioning 
between different interpreters in the same process. No C extensions could ever 
be trusted, as they would have full access to the main interpreter's C API.

Initially targeting an out-of-process sandbox allows all those issues to be 
dealt with later, rather than having them be blockers for the initial 
implementation of a restricted execution mechanism that is actually able to 
keep its promises. If the OS provides sufficient support (such as chroot 
jails) then it may even be possible to provide secure execution of C extension 
modules.

An out-of-process sandbox frees up the technology options - it may turn out to 
make more sense to base a restricted interpreter on PyPy rather than CPython. 
It also allows a strategy similar to that used by Lua - start the 
subinterpreter with a stripped core that can't read or write files directly at 
all, and don't do anything on startup unless explicitly requested by the main 
interpreter.
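
A minimal sketch of the out-of-process shape described above, assuming a modern CPython: the main interpreter starts a child with the -I and -S switches (isolated mode, no site import) and the child does nothing except answer one explicit request. CHILD_SOURCE, run_in_child and the JSON request format are made up for this example. The switches only trim startup; they do not remove file access, so a real sandbox would still need OS-level confinement such as the chroot jail mentioned earlier.

```python
import json
import subprocess
import sys

# Hypothetical child program: read one JSON request from stdin, perform
# one of a fixed set of operations, write one JSON reply to stdout.
CHILD_SOURCE = r"""
import json, sys
request = json.loads(sys.stdin.readline())
ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
result = ops[request["op"]](*request["args"])
sys.stdout.write(json.dumps({"result": result}) + "\n")
"""


def run_in_child(op, args, timeout=10):
    """Run one request in a separate interpreter process and return the result."""
    proc = subprocess.run(
        # -I: isolated mode, -S: skip the site module on startup.
        [sys.executable, "-I", "-S", "-c", CHILD_SOURCE],
        input=json.dumps({"op": op, "args": args}) + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return json.loads(proc.stdout)["result"]
```

The main interpreter only ever sees the JSON reply, so nothing the child does to its own state can leak back except through that channel.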

If we can get something that's secure when operating in a different process, 
*then* thread-local storage may permit that subinterpreter to be migrated from 
an out-of-process object to a separate OS-level thread inside the main 
process. The subinterpreter would still be entirely independent of the main 
interpreter's state, and main interpreter objects would still need to be 
converted to subinterpreter objects in order for code running in the 
subinterpreter to manipulate them (and vice versa) but any OS-level IPC 
overhead would be gone.
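
The conversion step can be pictured as copying objects by value through a data-only format. The sketch below uses JSON purely as a stand-in for whatever format a real implementation would pick; to_wire and from_wire are hypothetical helpers.

```python
import json


def to_wire(obj):
    """Convert a main-interpreter object to bytes; non-data objects are rejected."""
    return json.dumps(obj).encode("utf-8")


def from_wire(data):
    """Rebuild an independent copy on the receiving side."""
    return json.loads(data.decode("utf-8"))


host_state = {"user": "alice", "scores": [1, 2, 3]}
guest_copy = from_wire(to_wire(host_state))

# The copy shares no state with the original, so mutating it on the
# subinterpreter side leaves the main interpreter's object untouched.
guest_copy["scores"].append(4)
```

Because only plain data survives the round trip, something like to_wire(object()) fails with TypeError rather than handing the other side a live reference.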

Making such subinterpreters easy to create is also, IMO, the best way to deal 
with criticism of the GIL - as such interpreters would have their own state, 
they'd be free to run on as many different processors as you wanted. The only 
time they'd block on the GIL is when they needed to send information to, or 
receive information from, the main interpreter.
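
The same idea can be sketched with separate processes standing in for the independent subinterpreters: each worker has its own interpreter state and can be scheduled on its own processor, and the main interpreter waits only when collecting results. WORKER_SOURCE and run_workers are hypothetical names for this illustration.

```python
import subprocess
import sys

# Hypothetical CPU-bound worker: sums squares below the number given in argv.
WORKER_SOURCE = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"


def run_workers(sizes):
    """Start one interpreter process per job, then collect the results in order."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-I", "-c", WORKER_SOURCE, str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for n in sizes
    ]
    results = []
    for proc in procs:
        # The main interpreter blocks only here, while receiving a result.
        out, _ = proc.communicate()
        results.append(int(out))
    return results
```

All workers are started before any result is read, so they can run concurrently.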

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org