[Python-Dev] Safe to change a thread's interpreter?
Phillip J. Eby
pje at telecommunity.com
Mon Aug 2 05:52:52 CEST 2004
Recently I've been researching implementation strategies for adding Java
classloader-like capabilities to Python. I was pleasantly surprised to
find out that CPython already supports multiple interpreters via the C API,
where each "interpreter" includes fresh versions of 'sys', '__builtin__', etc.
The C API doc for Py_NewInterpreter(), however, says:
"""It is possible to insert objects created in one sub-interpreter into a
namespace of another sub-interpreter; this should be done with great care
to avoid sharing user-defined functions, methods, instances or classes
between sub-interpreters, since import operations executed by such objects
may affect the wrong (sub-)interpreter's dictionary of loaded modules. (XXX
This is a hard-to-fix bug that will be addressed in a future release.)"""
It seems to me that the bug described could be fixed (or at least worked
around) by having __import__ temporarily change the 'interp' field of the
current thread state to point to the interpreter that the __import__
function lives in. Then, at the end of the __import__, reset the 'interp'
field back to its original value. (Of course, it would also have to fix up
the linked lists of the interpreters' thread states during each swap, but
that shouldn't be too difficult.)
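
To make the proposed swap concrete, here is a minimal pure-Python toy model of it. This is not CPython code: `ToyInterpreter`, `ToyThreadState`, `swap_interp`, and `import_in` are hypothetical names standing in for the C-level interpreter state, the thread state, and a patched `__import__`, and a plain Python list approximates each interpreter's linked list of thread states.

```python
class ToyInterpreter:
    """Stand-in for an interpreter state: its own module table and thread list."""

    def __init__(self, name):
        self.name = name
        self.modules = {}   # plays the role of this interpreter's loaded modules
        self.tstates = []   # plays the role of the linked list of thread states


class ToyThreadState:
    """Stand-in for a thread state with an 'interp' field."""

    def __init__(self, interp):
        self.interp = interp
        interp.tstates.append(self)


def swap_interp(tstate, new_interp):
    """Point tstate at new_interp, fix up both thread lists, return the old interp."""
    old = tstate.interp
    old.tstates.remove(tstate)
    new_interp.tstates.append(tstate)
    tstate.interp = new_interp
    return old


def import_in(owner, tstate, module_name):
    """Model of an __import__ that lives in 'owner': swap, import, swap back."""
    original = swap_interp(tstate, owner)
    try:
        # The toy "import" just registers the name in the current interpreter.
        return tstate.interp.modules.setdefault(module_name, object())
    finally:
        # Restore the original interpreter even if the import raises.
        swap_interp(tstate, original)


parent = ToyInterpreter("parent")
child = ToyInterpreter("child")
ts = ToyThreadState(parent)

# Code owned by 'child' triggers an import while the thread runs in 'parent'.
import_in(child, ts, "somemodule")
```

In this model the module lands in the child's table rather than the parent's, and the thread state ends up back where it started. Whether the real C structures tolerate the same swap is exactly the open question.
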
My question is: does this make sense, or am I completely out in left field
here? The only thing I can think of that this would affect is the
'threading' module, in that trying to get the current thread from there
(during such an import) might see a foreign interpreter's thread as its
own. But I'm hard-pressed to think of any damage that this could possibly
cause. Indeed, it seems to me that Python itself doesn't really care how
many interpreters or thread states there are running around, and that it
only has the linked lists to support "advanced debuggers".
Even if it's undesirable to fix the problem this way in the Python core,
would it be acceptable to do so in an extension module?
What I have in mind is to create an extension module that wraps
PyInterpreterState/PyThreadState objects up in a subclassable extension
type, designed to ensure the integrity of Python as a whole, while still
allowing various import-related methods to be overridden in order to
implement Java-style classloader hierarchies. So, you might do something like:
    from interpreter import Interpreter

    # Run 'somescript.py' in its own interpreter.
    it = Interpreter()
    exit_code = it.run_main("somescript.py")

    # Release resources without waiting for GC
    it.close()
My thought here is that operations such as running code in a given
Interpreter would also work by swapping the thread state's 'interp'
field. Thus, exceptions in the child interpreter would be seamlessly
carried through to the parent interpreter.
In order to implement the full Java classloader model, it would also be
necessary to be able to force imports *not* to use the Interpreter that the
code doing the import came from. (i.e. the equivalent of using
'java.lang.Thread.setContextClassLoader()'). This can also probably be
implemented via a thread-local variable in the 'interpreter' module.
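
As a rough sketch of that thread-local override, assuming a hypothetical pair of helpers (`set_context_interpreter` and `resolve_import_target` are invented for illustration and are not part of any existing module), the lookup could go something like this. Plain strings stand in for interpreter objects.

```python
import threading

# Per-thread override slot; hypothetical, for illustration only.
_context = threading.local()


def set_context_interpreter(interp):
    """Set (or clear, with None) this thread's import override; return the old value."""
    previous = getattr(_context, "interp", None)
    _context.interp = interp
    return previous


def resolve_import_target(owner):
    """Pick the interpreter an import should use: the override if set, else the owner."""
    override = getattr(_context, "interp", None)
    return owner if override is None else override


# With no override, imports go to the interpreter that owns the importing code.
default_target = resolve_import_target("owner")

# With an override set, this thread's imports are redirected.
set_context_interpreter("context")
redirected_target = resolve_import_target("owner")

# Other threads are unaffected, because the override is thread-local.
seen_in_other_thread = []
worker = threading.Thread(
    target=lambda: seen_in_other_thread.append(resolve_import_target("owner"))
)
worker.start()
worker.join()

# Clear the override again.
set_context_interpreter(None)
```

Returning the previous value from the setter lets a caller restore it afterwards, much as Java code typically saves and restores the context class loader around a call.
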
So... must a thread state always reference the same interpreter
object? If not, then I think I see a way to safely implement access to
multiple interpreters from within Python itself.