[Python-Dev] Safe to change a thread's interpreter?

Mon Aug 2 05:52:52 CEST 2004

Recently I've been researching implementation strategies for adding Java 
classloader-like capabilities to Python.  I was pleasantly surprised to 
find out that CPython already supports multiple interpreters via the C API, 
where each "interpreter" includes fresh versions of 'sys', '__builtin__', etc.

The C API doc for Py_NewInterpreter(), however, says:

"""It is possible to insert objects created in one sub-interpreter into a 
namespace of another sub-interpreter; this should be done with great care 
to avoid sharing user-defined functions, methods, instances or classes 
between sub-interpreters, since import operations executed by such objects 
may affect the wrong (sub-)interpreter's dictionary of loaded modules. (XXX 
This is a hard-to-fix bug that will be addressed in a future release.)"""

It seems to me that the bug described could be fixed (or at least worked 
around) by having __import__ temporarily change the 'interp' field of the 
current thread state to point to the interpreter that the __import__ 
function lives in.  Then, at the end of the __import__, reset the 'interp' 
field back to its original value.  (Of course, it would also have to fix up 
the linked lists of the interpreters' thread states during each swap, but 
that shouldn't be too difficult.)
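
To make the swap concrete, here is a toy model in pure Python.  It is only a 
sketch of the idea, not CPython's actual data structures: 'InterpreterState', 
'ThreadState', 'swapped_interp' and 'import_into_current' are hypothetical 
names invented for illustration, and the real work would have to happen in C.  
It shows the thread state being moved between the two interpreters' 
thread lists for the duration of the import, and restored afterwards even if 
the import raises.

```python
from contextlib import contextmanager


class InterpreterState:
    """Toy stand-in for a C-level interpreter state (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.modules = {}   # per-interpreter analogue of sys.modules
        self.threads = []   # analogue of the interpreter's thread-state list


class ThreadState:
    """Toy stand-in for a C-level thread state (hypothetical)."""

    def __init__(self, interp):
        self.interp = interp
        interp.threads.append(self)


@contextmanager
def swapped_interp(tstate, target):
    """Temporarily point tstate.interp at target, fixing up both lists."""
    original = tstate.interp
    if original is target:
        yield
        return
    original.threads.remove(tstate)
    target.threads.append(tstate)
    tstate.interp = target
    try:
        yield
    finally:
        # Always undo the swap, even if the import raised.
        target.threads.remove(tstate)
        original.threads.append(tstate)
        tstate.interp = original


def import_into_current(tstate, name):
    """Pretend import: records the module in the *current* interpreter."""
    return tstate.interp.modules.setdefault(name, object())


main = InterpreterState("main")
sub = InterpreterState("sub")
ts = ThreadState(main)

# An object owned by 'sub' triggers an import while running in 'main'.
with swapped_interp(ts, sub):
    import_into_current(ts, "spam")
```

After the 'with' block, "spam" is recorded only in the 'sub' interpreter's 
module table, and the thread state is back on the 'main' interpreter's list.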

My question is: does this make sense, or am I completely out in left field 
here?  The only thing I can think of that this would affect is the 
'threading' module, in that trying to get the current thread from there 
(during such an import) might see a foreign interpreter's thread as its 
own.  But I'm hard-pressed to think of any damage that this could possibly 
cause.  Indeed, it seems to me that Python itself doesn't really care how 
many interpreters or thread states there are running around, and that it 
only has the linked lists to support "advanced debuggers".

Even if it's undesirable to fix the problem this way in the Python core, 
would it be acceptable to do so in an extension module?

What I have in mind is to create an extension module that wraps 
PyInterpreterState/PyThreadState objects in a subclassable extension 
type, designed to ensure the integrity of Python as a whole, while still 
allowing various import-related methods to be overridden in order to 
implement Java-style classloader hierarchies.  So, you might do something like:

     from interpreter import Interpreter

     # Run 'somescript.py' in its own interpreter.
     it = Interpreter()
     exit_code = it.run_main("somescript.py")

     # Release resources without waiting for GC
     it.close()

My thought here is that operations such as running code in a given 
Interpreter would also work by swapping the thread state's 'interp' 
field.  Thus, exceptions in the child interpreter would be seamlessly 
carried through to the parent interpreter.

In order to implement the full Java classloader model, it would also be 
necessary to be able to force imports *not* to use the Interpreter that the 
code doing the import came from (i.e. the equivalent of using 
'java.lang.Thread.setContextClassLoader()').  This could probably also be 
implemented via a thread-local variable in the 'interpreter' module.
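
As a rough sketch of that thread-local override, assuming nothing about the 
eventual 'interpreter' module's real API: the names 'context_interpreter' and 
'resolve_import_target' below are made up for illustration, and plain strings 
stand in for Interpreter objects.  An import uses the context interpreter if 
the current thread has set one, and otherwise falls back to the interpreter 
the importing code came from.

```python
import threading
from contextlib import contextmanager

_context = threading.local()   # per-thread override slot
_UNSET = object()              # sentinel: no override in this thread


@contextmanager
def context_interpreter(interp):
    """Set this thread's context interpreter for the duration of a block."""
    previous = getattr(_context, "interpreter", _UNSET)
    _context.interpreter = interp
    try:
        yield interp
    finally:
        # Restore whatever was there before, including "nothing".
        if previous is _UNSET:
            del _context.interpreter
        else:
            _context.interpreter = previous


def resolve_import_target(defining_interp):
    """Pick the interpreter an import should affect (hypothetical helper)."""
    override = getattr(_context, "interpreter", _UNSET)
    return defining_interp if override is _UNSET else override


seen = {}


def worker():
    # A different thread has no override, so it sees the defining interpreter.
    seen["worker"] = resolve_import_target("B")


with context_interpreter("A"):
    seen["main"] = resolve_import_target("B")
    t = threading.Thread(target=worker)
    t.start()
    t.join()

seen["after"] = resolve_import_target("B")
```

Because the override lives in a 'threading.local', setting it in one thread 
does not change which interpreter another thread's imports would target.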

So...  must a thread state always reference the same interpreter 
object?  If not, then I think I see a way to safely implement access to 
multiple interpreters from within Python itself.