[Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

MRAB python at mrabarnett.plus.com
Mon Jul 16 13:00:37 EDT 2018


On 2018-07-16 05:24, Chris Angelico wrote:
> On Mon, Jul 16, 2018 at 1:21 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Sun, Jul 15, 2018 at 6:00 PM, Chris Angelico <rosuav at gmail.com> wrote:
>>> On Mon, Jul 16, 2018 at 10:31 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>>> On Sun, Jul 8, 2018 at 11:27 AM, David Foster <davidfstr at gmail.com> wrote:
>>>>> * The Actor model can be used with some effort via the “multiprocessing”
>>>>> module, but it doesn’t seem that streamlined and forces there to be a
>>>>> separate OS process per line of execution, which is relatively expensive.
>>>>
>>>> What do you mean by "the Actor model"? Just shared-nothing
>>>> concurrency? (My understanding is that in academia it means
>>>> shared-nothing + every thread/process/whatever gets an associated
>>>> queue + queues are globally addressable + queues have unbounded
>>>> buffering + every thread/process/whatever is implemented as a loop
>>>> that reads messages from its queue and responds to them, with no
>>>> internal concurrency. I don't know why this particular bundle of
>>>> features is considered special. Lots of people seem to use it in a
>>>> looser sense, though.)
>>>
>>> Shared-nothing concurrency is, of course, the very easiest way to
>>> parallelize. But let's suppose you're trying to create an online
>>> multiplayer game. Since it's a popular genre at the moment, I'll go
>>> for a battle royale game (think PUBG, H1Z1, Fortnite, etc). A hundred
>>> people enter; one leaves. The game has to let those hundred people
>>> interact, which means that all hundred people have to be connected to
>>> the same server. And you have to process everyone's movements,
>>> gunshots, projectiles, etc, etc, etc, fast enough to be able to run a
>>> server "tick" enough times per second - I would say 32 ticks per
>>> second is an absolute minimum, 64 is definitely better. So what
>>> happens when the processing for one tick needs more than a single CPU
>>> core can deliver in 1/32 of a second? A shared-nothing model is either
>>> fundamentally impossible, or a meaningless abstraction (if you interpret
>>> it to mean "explicit queues/pipes for everything"). What would the "Actor" model
>>> do here?
>>
>> "Shared-nothing" is a bit of jargon that means there's no *implicit*
>> sharing; your threads can still communicate, the communication just
>> has to be explicit. I don't know exactly what algorithms your
>> hypothetical game needs, but they might be totally fine in a
>> shared-nothing approach. It's not just for embarrassingly parallel
>> problems.
> 
> Right, so basically it's the exact model that Python *already* has for
> multiprocessing - once you go to separate processes, nothing is
> implicitly shared, and everything has to be done with queues.
> 
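For what it's worth, that style is available today with nothing but the
stdlib. Here's a rough sketch (the worker body and the message strings are
just placeholders): each worker process owns an inbox queue, loops reading
messages from it, and every bit of communication is explicit.

from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # Each worker loops over its own queue; nothing is shared implicitly.
    for msg in iter(inbox.get, None):   # None is the shutdown sentinel
        outbox.put(("did", msg))        # reply only via an explicit queue

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    for job in ["move", "shoot", "reload"]:
        inbox.put(job)
    inbox.put(None)                     # ask the worker to stop
    print([outbox.get() for _ in range(3)])
    p.join()
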
>>> Ideally, I would like to be able to write my code as a set of
>>> functions, then easily spin them off as separate threads, and have
>>> them able to magically run across separate CPUs. Unicorns not being a
>>> thing, I'm okay with warping my code a bit around the need for
>>> parallelism, but I'm not sure how best to do that. Assume here that we
>>> can't cheat by getting most of the processing work done with the GIL
>>> released (eg in Numpy), and it actually does require Python-level
>>> parallelism of CPU-heavy work.
>>
>> If you need shared-memory threads, on multiple cores, for CPU-bound
>> logic, where the logic is implemented in Python, then yeah, you
>> basically need a free-threaded implementation of Python. Jython is
>> such an implementation. PyPy could be if anyone were interested in
>> funding it [1], but apparently no-one is. Probably removing the GIL
>> from CPython is impossible. (I'd be happy to be proven wrong.) Sorry I
>> don't have anything better to report.
> 
> (This was a purely hypothetical example.)
> 
> There could be some interesting results from using the GIL only for
> truly global objects, and then having other objects guarded by arena
> locks. The trouble is that, in CPython, as soon as you reference any
> read-only object from the globals, you need to raise its refcount.
> ISTR someone mentioned something along the lines of
> sys.eternalize(obj) to flag something as "never GC this thing, it no
> longer has a refcount", which would then allow global objects to be
> referenced in a truly read-only way (eg to call a function). Sadly,
> I'm not expert enough to actually look into implementing it, but it
> does seem like a very cool concept. It also fits into the "warping my
> code a bit" category (eg eternalizing a small handful of key objects,
> and paying the price of "well, now they can never be garbage
> collected"), with the potential to then parallelize more easily.
> 
Could you explicitly share an object in a similar way to how you 
explicitly open a file?

The shared object's refcount would be incremented and the sharing 
function would return a proxy to the shared object.

Refcounting in the thread/process would be done on the proxy.

When the proxy is closed or garbage-collected, the shared object's 
refcount would be decremented.

The shared object could be garbage-collected when its refcount drops to 
zero.
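
Here's a pure-Python mock-up of the API I have in mind, just to pin the
shape down. The names share(), SharedProxy and deref() are made up, and the
"incref" is simulated by a strong reference held in a registry; a real
version would live in the interpreter and touch actual refcounts.

_shared = {}   # id(obj) -> [obj, number of live proxies]

class SharedProxy:
    def __init__(self, obj):
        self._key = id(obj)
        entry = _shared.setdefault(self._key, [obj, 0])
        entry[1] += 1                   # "incref" the shared object
        self._open = True

    def deref(self):
        if not self._open:
            raise ValueError("proxy is closed")
        return _shared[self._key][0]

    def close(self):
        if self._open:                  # "decref" the shared object
            self._open = False
            entry = _shared[self._key]
            entry[1] -= 1
            if entry[1] == 0:           # last proxy gone: object can be GC'd
                del _shared[self._key]

    __del__ = close                     # GC of the proxy also releases it

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

def share(obj):
    """Explicitly share obj, much as open() explicitly opens a file."""
    return SharedProxy(obj)

if __name__ == "__main__":
    table = {"hp": 100}
    with share(table) as proxy:
        proxy.deref()["hp"] -= 10
    print(table, _shared)               # -> {'hp': 90} {}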

>> The good news is that there are many, many situations where you don't
>> actually need "shared-memory threads, on multiple cores, for CPU-bound
>> logic, where the logic is implemented in Python".
> 
> Oh absolutely. MOST of my parallelism requirements involve regular
> Python threads, because they spend most of their time blocked on
> something. That one is easy. The hassle comes when something MIGHT
> need parallelism and might not, based on (say) how much data it has to
> work with; for those kinds of programs, I would like to be able to
> code it the simple way with minimal code overhead, but still able to
> split over cores. And yes, I'm aware that it's never going to be
> perfect, but the closer the better.
> 
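About the closest the stdlib gets to that today is writing the work as a
plain function and picking the executor at run time. The crunch() function
and the size threshold below are made up, but the shape is the point:

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def crunch(chunk):
    # stand-in for CPU-heavy, pure-Python work on one chunk of data
    return sum(x * x for x in chunk)

def process_all(chunks, big_enough_for_cores=100_000):
    # Small workloads stay on threads (cheap, but GIL-bound for CPU work);
    # big ones go to processes, so they can actually spread over cores.
    total = sum(len(c) for c in chunks)
    pool_cls = (ProcessPoolExecutor if total >= big_enough_for_cores
                else ThreadPoolExecutor)
    with pool_cls() as pool:
        return list(pool.map(crunch, chunks))

if __name__ == "__main__":
    print(process_all([range(10)] * 4))              # stays on threads
    print(process_all([range(1_000_000)] * 4)[:1])   # spills onto processes

The catch is that the process pool pays for extra OS processes and for
pickling the data both ways, which is the "relatively expensive" part
mentioned at the top of the thread.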

