[Python-3000] Will we have a true restricted exec environment for python-3000?
Nick Coghlan
ncoghlan at gmail.com
Fri Apr 14 10:20:00 CEST 2006
Ian Bicking wrote:
> One advantage to this is that at each step something useful would be
> created. Better IPC would be useful for more than restricted execution.
> Or ways to create subinterpreters and communicate with them. Or making
> the interpreter startup faster, to facilitate spawning short-lived
> interpreters. As an example, by combining these features but leaving
> out restricted execution you can get something like the PHP model, which
> manages to protect well against buggy code (even if not so well against
> security bugs). There's probably a dozen distinct parts to this, but I
> think each is independently interesting and useful.
>
> And maybe, because these are useful subprojects, some of the people who
> can't commit to the time to take on this project as a whole could commit
> to some piece of it for which they have alternate motivations.
This is why I think the first step in a solid Python restricted execution
framework isn't an implementation activity but a research/scoping activity,
looking at the various systems already out there, and the various trade-offs
involved in each.
The requirements for a subinterpreter are completely different from those for
the main Python interpreter that actually runs a Python application. The
initial state of a subinterpreter should be controlled entirely by the
invoking interpreter, so much of the normal interpreter startup sequence
should be eliminated. Subinterpreters are designed to manipulate the state of
the main application rather than the platform they are executing on, so direct
access to the underlying OS isn't needed. The meaning of module imports in the
subinterpreter is also up to the main application.
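
As a rough illustration of import policy being set by the host application,
here is a minimal sketch using a replacement `__import__` hook. The names
`ALLOWED_MODULES`, `guarded_import` and `run_plugin` are made up for this
example. Swapping `__builtins__` in-process like this is *not* a security
boundary (object introspection gets around it easily); it only shows where
the policy decision would live.

```python
import builtins

# Hypothetical policy: the host application decides what "import" means.
ALLOWED_MODULES = {"math"}


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Replacement for __import__ that admits only listed top-level modules."""
    if level != 0 or name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not permitted here")
    return builtins.__import__(name, globals, locals, fromlist, level)


def run_plugin(source):
    """Run source with a builtins namespace holding only the import hook."""
    namespace = {"__builtins__": {"__import__": guarded_import}}
    exec(source, namespace)
    return namespace
```

With this in place, `run_plugin("import math")` succeeds while
`run_plugin("import os")` raises ImportError, because the import statement
looks up `__import__` in the namespace the host supplied.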
My initial inclination is that aiming straight for in-process restricted
execution is premature optimisation at its worst. Worrying about IPC overhead
before there's even anything on the table that is functionally complete (i.e.
both usable and demonstrably secure) is putting the cart before the horse.
The liberal use of C static variables by both the interpreter core and
third-party extensions means that in-process restricted execution will
necessarily be invasive: every static variable will need to be checked for
security implications, and some mechanism provided to support partitioning
between different interpreters in the same process. No C extensions could ever
be trusted, as they would have full access to the main interpreter's C API.
Initially targeting an out-of-process sandbox allows all those issues to be
dealt with later, rather than having them be blockers for the initial
implementation of a restricted execution mechanism that is actually able to
keep its promises. If the OS provides sufficient support (such as chroot
jails) then it may even be possible to provide secure execution of C extension
modules.
An out-of-process sandbox frees up the technology options - it may turn out to
make more sense to base a restricted interpreter on PyPy rather than CPython.
It also allows a strategy similar to that used by Lua - start the
subinterpreter with a stripped core that can't read or write files directly at
all, and don't do anything on startup unless explicitly requested by the main
interpreter.
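
A minimal sketch of the out-of-process arrangement, assuming a present-day
CPython: the child is started with the `-I` (isolated) and `-S` (skip `site`)
options so it does as little as possible on startup, and it acts only on
requests the parent sends over a pipe. The JSON line protocol, the `add`
operation and the names `CHILD_SOURCE` and `run_in_child` are invented for
illustration. This trims startup only; it does not stop the child from
opening files, so real confinement would still have to come from the OS.

```python
import json
import subprocess
import sys

# Hypothetical child program: reads one JSON request per line, writes one reply.
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    if request.get("op") == "add":
        reply = {"ok": True, "result": sum(request["args"])}
    else:
        reply = {"ok": False, "error": "unsupported operation"}
    print(json.dumps(reply), flush=True)
'''


def run_in_child(requests, timeout=30):
    """Send requests to a freshly started child interpreter, collect replies."""
    payload = "".join(json.dumps(r) + "\n" for r in requests)
    completed = subprocess.run(
        [sys.executable, "-I", "-S", "-c", CHILD_SOURCE],
        input=payload,          # the child sees only what the parent sends
        capture_output=True,
        text=True,
        timeout=timeout,        # do not wait forever on a misbehaving child
        check=True,
    )
    return [json.loads(line) for line in completed.stdout.splitlines()]
```

The parent never shares objects with the child; anything the child did not
receive as a request simply does not exist from its point of view.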
If we can get something that's secure when operating in a different process,
*then* thread-local storage may permit that subinterpreter to be migrated from
an out-of-process object to a separate OS-level thread inside the main
process. The subinterpreter would still be entirely independent of the main
interpreter's state, and main interpreter objects would still need to be
converted to subinterpreter objects in order for code running in the
subinterpreter to manipulate them (and vice versa), but any OS-level IPC
overhead would be gone.
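
The conversion step can be pictured as passing values by copy rather than by
reference. A minimal sketch, assuming JSON as the interchange format (the
message does not name one) and with hypothetical helper names:

```python
import json


def export_to_subinterpreter(obj):
    """Serialise a main-interpreter object; only plain data types survive."""
    return json.dumps(obj).encode("utf-8")


def import_from_main(data):
    """Rebuild an independent object on the subinterpreter side."""
    return json.loads(data.decode("utf-8"))


main_state = {"users": ["alice", "bob"], "limit": 10}
sub_state = import_from_main(export_to_subinterpreter(main_state))

# Changes made on the subinterpreter side do not touch the original.
sub_state["users"].append("mallory")
```

Because the two sides hold separate copies, the same conversion works whether
the bytes travel over a pipe or between threads in one process.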
Making such subinterpreters easy to create is also, IMO, the best way to deal
with criticism of the GIL - as such interpreters would have their own state,
they'd be free to run on as many different processors as you wanted. The only
time they'd block on the GIL is when they needed to send information to, or
receive information from, the main interpreter.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org