[Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

Stephan Houben stephanh42 at gmail.com
Sat May 27 04:32:00 EDT 2017


Hi Nick,

> I guess I'll have to scale back my hopes on that front to be closer to
> what Stephan described - even a deep copy equivalent is often going to
> be cheaper than a full serialise/transmit/deserialise cycle or some
> other form of inter-process communication.

I would like to add that in many cases the underlying C objects *could* be
shared between interpreters. I have identified some possible use cases for
this:

1. numpy/scipy: share the underlying memory of an ndarray.
   Threads in different interpreters could then operate on the same array
   without GIL interference (see the first sketch below).

2. An sqlite in-memory database.
   Multiple threads could operate on it in parallel. Behind an ORM this
   might feel very similar to sharing Python objects across threads (see
   the second sketch below).

3. A tree of XML elements (as in xml.etree).
   Assuming the tree data structure itself is implemented in C, the tree
   could be shared across interpreters. This would be an example of a
   "deep" data structure that can still be shared efficiently (see the
   third sketch below).
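
To illustrate (1), here is a minimal sketch. It uses ordinary threads in a
single interpreter (subinterpreters have no Python-level API today), so it
only shows the access pattern: two views alias one buffer, no copy is ever
made, and numpy releases the GIL for the bulk of the in-place work.

    import threading

    import numpy as np

    # One shared buffer; every view below aliases this memory, no copying.
    buf = bytearray(8 * 1_000_000)

    def fill(start, stop):
        # frombuffer() wraps the existing memory instead of copying it;
        # numpy releases the GIL for the bulk of the in-place addition.
        view = np.frombuffer(buf, dtype=np.float64)[start:stop]
        view += 1.0

    threads = [threading.Thread(target=fill, args=(0, 500_000)),
               threading.Thread(target=fill, args=(500_000, 1_000_000))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(np.frombuffer(buf, dtype=np.float64).sum())  # 1000000.0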
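
For (2), a sketch using sqlite3's existing shared-cache in-memory
databases: the second connection lives in another thread, yet both
connections see the same database.

    import sqlite3
    import threading

    # Shared-cache in-memory DB: every connection opened with this URI in
    # the same process sees the same data, for as long as at least one
    # connection stays open.
    URI = "file:demo?mode=memory&cache=shared"

    main = sqlite3.connect(URI, uri=True)
    main.execute("CREATE TABLE points (x REAL, y REAL)")
    main.execute("INSERT INTO points VALUES (1.0, 2.0)")
    main.commit()

    def reader(out):
        # A second, independent connection opened in another thread.
        conn = sqlite3.connect(URI, uri=True)
        out.append(conn.execute("SELECT COUNT(*) FROM points").fetchone()[0])
        conn.close()

    out = []
    t = threading.Thread(target=reader, args=(out,))
    t.start()
    t.join()
    print(out)  # [1]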
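
And for (3), a sketch of the access pattern only: the tree is built once
and then read concurrently. Today the Python objects are shared under the
GIL; the point above is that a C-level tree could be shared between
interpreters in the same way.

    import threading
    import xml.etree.ElementTree as ET

    # Build the tree once; the worker threads only read it.
    root = ET.fromstring(
        "<catalog><item id='1'/><item id='2'/><item id='3'/></catalog>")

    def count_items(out):
        out.append(sum(1 for _ in root.iter("item")))

    out = []
    threads = [threading.Thread(target=count_items, args=(out,))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(out)  # [3, 3]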

So I feel this could still be very useful even if pure-Python objects
need to be copied.

Thanks,

Stephan


2017-05-27 9:32 GMT+02:00 Nick Coghlan <ncoghlan at gmail.com>:
> On 27 May 2017 at 03:30, Guido van Rossum <guido at python.org> wrote:
>> On Fri, May 26, 2017 at 8:28 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>>
>>> [...] assuming the rest of the idea works out
>>> well, we'd eventually like to move to a tiered model where the GIL
>>> becomes a read/write lock. Most code execution in subinterpreters
>>> would then only need a read lock on the GIL, and hence could happily
>>> execute code in parallel with other subinterpreters running on other
>>> cores.
>>
>>
>> Since the GIL protects refcounts and refcounts are probably the most
>> frequently written item, I'm skeptical of this.
>
> Likewise - hence my somewhat garbled attempt to explain that actually
> doing that would be contingent on the GILectomy folks figuring out
> some clever way to cope with the refcounts :)
>
>>> By contrast, being able to reliably model Communicating Sequential
>>> Processes in Python without incurring any communications overhead
>>> (à la goroutines)? Or doing the same with the Actor model (à la
>>> Erlang/BEAM processes)?
>>>
>>> Those are *very* interesting language design concepts, and something
>>> where offering a compelling alternative to the current practices of
>>> emulating them with threads or coroutines pretty much requires the
>>> property of zero-copy ownership transfer.
>>
>> But subinterpreters (which have independent sys.modules dicts) seem a poor
>> match for that. It feels as if you're speculating about an entirely
>> different language here, not named Python.
>
> Ah, you're right - the types are all going to be separate as well,
> which means "cost of a deep copy" is the cheapest we're going to be
> able to get with this model. Anything better than that would require a
> more esoteric memory management architecture like the one in
> PyParallel.
>
> I guess I'll have to scale back my hopes on that front to be closer to
> what Stephan described - even a deep copy equivalent is often going to
> be cheaper than a full serialise/transmit/deserialise cycle or some
> other form of inter-process communication.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

