[Python-ideas] solving multi-core Python

Devin Jeanpierre jeanpierreda at gmail.com
Thu Jun 25 02:09:55 CEST 2015


On Wed, Jun 24, 2015 at 4:30 PM, Sturla Molden <sturla.molden at gmail.com> wrote:
> On 25/06/15 00:10, Devin Jeanpierre wrote:
>
>> So there's two reasons I can think of to use threads for CPU parallelism:
>>
>> - My thing does a lot of parallel work, and so I want to save on
>> memory by sharing an address space
>>
>> This only becomes an especially pressing concern if you start running
>> tens of thousands or more of workers. Fork also allows this.
>
>
> This might not be a valid concern. Sharing address space means sharing
> *virtual memory*. Presumably what they really want is to save *physical
> memory*. Two processes can map the same physical memory into virtual memory.

Yeah, physical memory. I agree, processes with shared memory can be
made to work in practice. That said, threads are better for memory
usage, because they default to sharing even on write. (Good for memory,
maybe not so good for bug-freedom...)
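
For reference, the kind of sharing Sturla describes looks roughly like
this with POSIX shared memory (a minimal sketch; the "/py_shared" name
and the 4096-byte size are arbitrary choices for illustration, not
anything from CPython):

    /* Two cooperating processes can map the same physical pages by
     * opening the same named shared-memory object. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = shm_open("/py_shared", O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, 4096) < 0)
            return 1;

        /* MAP_SHARED: every process that maps this object sees the
         * same physical memory, without sharing an address space. */
        char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED)
            return 1;

        strcpy(buf, "visible to every process mapping /py_shared");
        return 0;
    }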

So from my perspective, this is the hard problem in multicore Python.
My views may be skewed by the peculiarities of the one major app I've
worked on.

>> Same applies to strings and other non-compound datatypes. Compound
>> datatypes are hard even for the subinterpreter case, just because the
>> objects you're referring to are not likely to exist on the other end,
>> so you need a real copy.
>
>
> Yes.
>
> With a "share nothing" message-passing approach, one will have to make deep
> copies of any mutable object. And even though a tuple can be immutable, it
> could still contain mutable objects. It is really hard to get around the
> pickle overhead with subinterpreters. Since the pickle overhead is huge
> compared to the low-level IPC, there is very little to save in this manner.

I think this is giving up too easily. Here's a stupid idea for
sharable interpreter-specific objects:

You keep a special heap for immutable object refcounts, where each
thread/process has its own region in the heap. Refcount locations are
stored as offsets into the thread-local region, and incref does
++*(threadlocal_refcounts + refcount_offset);
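
Spelled out a bit more (a rough sketch only; every name here is
hypothetical, not existing CPython API):

    /* Each thread/interpreter owns a region of a refcount heap; an
     * object stores only an offset into that region, so increfs touch
     * thread-private memory and never contend with other threads. */
    #include <stddef.h>
    #include <stdint.h>

    /* Base of this thread's refcount region; set up at thread start
     * (allocation elided in this sketch). */
    static _Thread_local int64_t *threadlocal_refcounts;

    typedef struct {
        size_t refcount_offset;  /* offset into the per-thread region */
        /* ... the rest of the (shared, immutable) object ... */
    } shared_object;

    static inline void shared_incref(shared_object *op)
    {
        ++*(threadlocal_refcounts + op->refcount_offset);
    }

    static inline void shared_decref(shared_object *op)
    {
        --*(threadlocal_refcounts + op->refcount_offset);
        /* Actually freeing the object would need a pass that sums the
         * per-thread counts; that part is glossed over here. */
    }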

Then for the rest of a PyObject's memory, we share by default and
introduce a marker recording which thread originated it. Any
non-thread-safe operation can check whether the originating thread id
matches the current thread id, and raise an exception if not, before
even reading the memory at all. So it introduces an overhead to
accessing mutable objects. It also won't work with extension objects
that don't perform the check; those just get shared, mutated unsafely,
and crash.
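
The origin check might look something like this (again just a sketch;
the struct and helper names are invented):

    /* Mutable objects record the thread that created them; any
     * operation that isn't thread-safe verifies the caller first. */
    #include <pthread.h>

    typedef struct {
        pthread_t origin_thread;  /* thread that created this object */
        /* ... object payload ... */
    } owned_object;

    /* 0 on success, -1 (i.e. "raise an exception") otherwise. */
    static int check_owned(owned_object *op)
    {
        if (!pthread_equal(op->origin_thread, pthread_self()))
            return -1;
        return 0;
    }

    static int mutate_owned(owned_object *op)
    {
        if (check_owned(op) < 0)
            return -1;  /* caller raises before touching the memory */
        /* ... safe to read/write the object's memory here ... */
        return 0;
    }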

This also introduces the possibility of sharing mutable objects
between interpreters, if the objects themselves choose to implement
fine-grained locking. And it should work fine with fork if we change
how the refcount heap is allocated, to use mmap or whatever.
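
A shared mutable object that opts in could simply embed its own lock,
something like this (sketch; none of it is existing API):

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;  /* fine-grained, per-object lock */
        long value;            /* stand-in for the mutable payload */
    } lockable_object;

    static void lockable_set(lockable_object *op, long v)
    {
        pthread_mutex_lock(&op->lock);  /* any interpreter may call */
        op->value = v;
        pthread_mutex_unlock(&op->lock);
    }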

This is probably not acceptable as a real design, but I just mean to
show with a straw man that the problem can be attacked.

-- Devin

