[Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.
David Foster
davidfstr at gmail.com
Sun Jul 8 14:27:08 EDT 2018
In the past I have personally viewed Python as difficult to use for
parallel applications, which need to do multiple things simultaneously
for increased performance:
* The old Threads, Locks, & Shared State model is inefficient in Python
due to the GIL, which limits CPU-bound execution to only one thread at a
time. (The GIL is released around blocking I/O and inside certain C
extension code, but pure-Python threads cannot run in parallel.)
* The Actor model can be used with some effort via the “multiprocessing”
module, but it isn’t very streamlined and forces a separate OS process
per line of execution, which is relatively expensive.
I was thinking it would be nice if there was a better way to implement
the Actor model, with multiple lines of execution in the same process,
yet avoiding contention from the GIL. This implies a separate GIL for
each line of execution (to eliminate contention) and a controlled way to
exchange data between different lines of execution.
So I was thinking of proposing a design for implementing such a system.
Or at least get interested parties thinking about such a system.
With some additional research I noticed that [PEP 554] (“Multiple
subinterpreters in the stdlib”) appears to put forward a design similar
to the one I described. However, it mentions that subinterpreters
currently share the GIL, which would seem to make them unusable for
parallel scenarios due to GIL contention.
I'd like to solicit some feedback on the best way to make forward
progress toward efficient parallelization in Python inside the same OS
process. The most promising areas appear to be:
1. Make the current subinterpreter implementation in Python have more
complete isolation, sharing almost no state between subinterpreters. In
particular not sharing the GIL. The "Interpreter Isolation" section of
PEP 554 enumerates areas that are currently shared, some of which
probably shouldn't be.
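For reference, PEP 554's draft sketches roughly the following usage for its proposed "interpreters" module (this is hypothetical code: the module does not exist in the stdlib yet and the names may change as the PEP evolves):

```python
interp = interpreters.create()          # a new subinterpreter
interp.run("print('hello from a subinterpreter')")
# The PEP also sketches channels for passing data between interpreters,
# which becomes interesting for parallelism only if each interpreter
# eventually gets its own GIL.
```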
2. Give up on making things work inside the same OS process and rather
focus on implementing better abstractions on top of the existing
multiprocessing API so that the actor model is easier to program
against. For example, providing some notion of Channels to communicate
between lines of execution, a way to monitor the number of Messages
waiting in each channel for throughput profiling and diagnostics,
Supervision, etc. In particular I could do this by using an existing
library like Pykka or Thespian and extending it where necessary.
Thoughts?
[PEP 554]: https://www.python.org/dev/peps/pep-0554/
--
David Foster | Seattle, WA, USA