[Python-ideas] Parallel processing with Python

Sturla Molden sturla at molden.no
Thu Feb 19 00:34:04 CET 2009

About a year ago, I posted a scheme to comp.lang.python describing how to
use isolated interpreters to circumvent the GIL on SMPs:


In the following, an "appdomain" is defined as a thread associated with a
unique embedded Python interpreter. One interpreter per thread is how Tcl
works. Erlang also uses isolated threads that communicate only through
messages (as opposed to shared objects). Appdomains are also available in
the .NET framework, and in Java as "Java isolates". They are potentially
very useful as multicore CPUs become abundant: they allow one process to
run an independent Python interpreter on each available CPU.

In Python, "appdomains" can be created by embedding the Python interpreter
multiple times in a process. For this to work, we have to make multiple
copies of the Python DLL and rename them (e.g. Python25-0.dll,
Python25-1.dll, Python25-2.dll, etc.). Otherwise the dynamic loader will
just return a handle to the already imported DLL. As DLLs can be accessed
with ctypes, we don't even have to write a line of C to do this: we can
start up a Python interpreter and use ctypes to embed more interpreters
into it, associating each interpreter with its own thread. ctypes takes
care of releasing the GIL in the parent interpreter, so calls into these
sub-interpreters become asynchronous. I had a mock-up of this scheme
working. Martin Löwis replied that he doubted this would work, and pointed
out that Python extension libraries (.pyd files) are DLLs as well. They
would only be imported once, so their global state would be shared among
the interpreters, producing havoc:
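The embedding scheme might look roughly like this with ctypes. This is a
sketch only: it assumes a Windows process with Python25.dll available, and
the helper names (interpreter_dll_name, spawn_appdomain) are hypothetical.
Py_Initialize, PyRun_SimpleString and Py_Finalize are the standard CPython
embedding entry points.

```python
import ctypes
import shutil
import threading

def interpreter_dll_name(index, base="Python25"):
    # Hypothetical naming scheme: one renamed copy of the Python DLL per
    # embedded interpreter, so the dynamic loader sees each as a new library.
    return "%s-%d.dll" % (base, index)

def spawn_appdomain(index, code, base="Python25"):
    # Sketch only -- assumes a Windows system with the Python DLL on the path.
    name = interpreter_dll_name(index, base)
    shutil.copy(base + ".dll", name)   # the renamed copy defeats DLL caching
    dll = ctypes.CDLL(name)            # ctypes releases the parent's GIL on calls

    def run():
        dll.Py_Initialize()            # fresh interpreter state for this thread
        dll.PyRun_SimpleString(code.encode("ascii"))
        dll.Py_Finalize()

    thread = threading.Thread(target=run)
    thread.start()
    return thread
```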


He was right, of course, but also wrong. In fact I had already proven him
wrong by importing a DLL multiple times. If it can be done for
Python25.dll, it can be done for any other DLL as well - including .pyd
files - in exactly the same way. Thus what remains is to change Python's
dynamic loader to use the same "copy and import" scheme. This can be done
either by changing Python's C code, or (at least on Windows) by
redirecting the LoadLibrary API call from kernel32.dll to a custom DLL.
Both are quite easy and require minimal C coding.
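The "copy and import" idea can be sketched in pure Python as below. The
helper names (isolated_copy_name, load_isolated) are hypothetical; the real
fix would live in Python's dynamic loader or a LoadLibrary hook, not in
application code.

```python
import ctypes
import itertools
import shutil

_copy_counter = itertools.count()

def isolated_copy_name(path, index):
    # e.g. "foo.pyd" -> "foo.pyd.0", "foo.pyd.1", ...
    return "%s.%d" % (path, index)

def load_isolated(dll_path):
    # Copy the DLL (or .pyd) under a fresh name before loading it, so each
    # load gets private global state instead of the cached, shared module.
    copy_path = isolated_copy_name(dll_path, next(_copy_counter))
    shutil.copy(dll_path, copy_path)
    return ctypes.CDLL(copy_path)   # the loader sees a new file, no caching
```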

Thus it is quite easy to make multiple, independent Python interpreters
live isolated lives in the same process. As opposed to multiple processes,
they can communicate without involving any IPC. It would also be possible
to design proxy objects allowing one interpreter access to an object in
another. Immutable objects such as strings would be particularly easy to
share.

This very simple scheme should allow parallel processing with Python
similar to how it's done in Erlang, without the GIL getting in our way. At
least on Windows this can be done without touching the CPython source at
all. I am not sure about Linux, though; it may be necessary to patch the
CPython source to make it work there.

Sturla Molden
