
On Wed, Jun 24, 2015 at 4:30 PM, Sturla Molden <sturla.molden@gmail.com> wrote:
On 25/06/15 00:10, Devin Jeanpierre wrote:
So there are two reasons I can think of to use threads for CPU parallelism:
- My thing does a lot of parallel work, and so I want to save on memory by sharing an address space
This only becomes an especially pressing concern if you start running tens of thousands or more of workers. Fork also allows this.
This might not be a valid concern. Sharing an address space means sharing *virtual memory*. Presumably what they really want is to save *physical memory*. Two processes can map the same physical memory into their virtual address spaces.
Yeah, physical memory. I agree, processes with shared memory can be made to work in practice. Threads are better for memory usage, though, since they default to sharing even on write. (Good for memory, maybe not so good for bug-freedom...) So from my perspective, this is the hard problem in multicore Python. My views may be skewed by the peculiarities of the one major app I've worked on.
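To make the physical-vs-virtual memory point concrete, here is a minimal sketch (assuming a POSIX system with Linux-style MAP_ANONYMOUS; just an illustration, not anything from CPython): an anonymous MAP_SHARED mapping created before fork() is backed by the same physical pages in parent and child, so a write in one process is visible in the other without any copying, even though the two virtual address spaces are separate.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* One page of anonymous shared memory, mapped before the fork. */
        char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;

        pid_t pid = fork();
        if (pid == 0) {
            /* Child: writes into the shared page and exits. */
            strcpy(buf, "hello from the child");
            _exit(0);
        }

        /* Parent: waits, then reads what the child wrote -- same physical
         * pages, no copying or serialization involved. */
        waitpid(pid, NULL, 0);
        printf("parent sees: %s\n", buf);
        return 0;
    }

The same thing works between unrelated processes by mapping a shared file or a POSIX shared memory object instead of an anonymous mapping.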
The same applies to strings and other non-compound datatypes. Compound datatypes are hard even in the subinterpreter case, just because the objects you're referring to are not likely to exist on the other end, so you need a real copy.
Yes.
With a "share nothing" message-passing approach, one will have to make deep copies of any mutable object. And even though a tuple can be immutable, it could still contain mutable objects. It is really hard to get around the pickle overhead with subinterpreters. Since the pickle overhead is huge compared to the low-level IPC, there is very little to save in this manner.
I think this is giving up too easily. Here's a stupid idea for sharable interpreter-specific objects: you keep a special heap for immutable object refcounts, where each thread/process has its own region in the heap. Refcount locations are stored as offsets into the thread-local heap, and incref does ++*(threadlocal_refcounts + refcount_offset);

Then for the rest of a PyObject's memory, we share by default and introduce a marker for which thread originated it. Any non-threadsafe operation can check whether the originating thread id is the same as the current thread id, and raise an exception if not, before even reading the memory at all. So it introduces an overhead to accessing mutable objects. Also, this won't work with extension objects that don't check; those just get shared, mutated unsafely, and crash.

This also introduces the possibility of sharing mutable objects between interpreters, if the objects themselves choose to implement fine-grained locking. And it should work fine with fork if we change how the refcount heap is allocated, e.g. to use mmap.

This is probably not acceptable for real, but I just mean to show with a straw man that the problem can be attacked.

-- Devin
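To make the straw man above slightly more concrete, here is a rough C sketch of the per-thread refcount region and the originating-thread check. All names and the struct layout are hypothetical, none of this is actual CPython code, and it assumes GCC/Clang-style __thread thread-local storage plus pthreads:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define REFCOUNT_SLOTS (1 << 20)

    /* Each thread/process gets its own region of refcount slots; an object
     * stores only an offset, so every thread counts its own references in
     * its own memory.  (__thread is a GCC/Clang extension.) */
    static __thread intptr_t *threadlocal_refcounts;

    typedef struct {
        size_t refcount_offset;  /* offset into the per-thread refcount region */
        pthread_t origin_tid;    /* thread that created the object */
        /* ... the rest of the object's (shared) memory would follow ... */
    } shared_obj;

    /* Must be called once in each thread before it touches any objects.
     * Allocating this with mmap instead of calloc would let the scheme
     * keep working across fork, as suggested above. */
    static void init_thread_refcounts(void)
    {
        threadlocal_refcounts = calloc(REFCOUNT_SLOTS, sizeof(intptr_t));
    }

    /* Incref/decref touch only this thread's region: no atomics and no
     * cache-line sharing between threads. */
    static inline void obj_incref(shared_obj *op)
    {
        ++*(threadlocal_refcounts + op->refcount_offset);
    }

    static inline void obj_decref(shared_obj *op)
    {
        --*(threadlocal_refcounts + op->refcount_offset);
        /* A real implementation would have to merge the per-thread counts
         * (at thread exit, a safepoint, ...) before freeing the object. */
    }

    /* Guard for non-threadsafe operations: only the originating thread may
     * mutate the object; anyone else is rejected before the object's
     * memory is even read. */
    static int obj_check_origin(shared_obj *op)
    {
        if (!pthread_equal(op->origin_tid, pthread_self())) {
            fprintf(stderr, "mutable object touched from a foreign thread\n");
            return -1;
        }
        return 0;
    }

The open question the sketch leaves unanswered is how and when the per-thread counts get merged: something has to sum them before an object can actually be freed, which is roughly the cost this scheme trades for contention-free increfs.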