multiprocessing.Queue blocks when sending large object

Lie Ryan lie.1296 at gmail.com
Mon Dec 5 07:27:59 EST 2011


On 11/30/2011 06:09 AM, DPalao wrote:
> Hello,
> I'm trying to use multiprocessing to parallelize a code. There is a number of
> tasks (usually 12) that can be run independently. Each task produces a numpy
> array, and at the end, those arrays must be combined.
> I implemented this using Queues (multiprocessing.Queue): one for input and
> another for output.
> But the code blocks. And it must be related to the size of the item I put on
> the Queue: if I put a small array, the code works well; if the array is
> realistically large (in my case it can vary from 160kB to 1MB), the code
> blocks apparently forever.
> I have tried this:
> http://www.bryceboe.com/2011/01/28/the-python-multiprocessing-queue-and-large-
> objects/
> but it didn't work (specifically I put a None sentinel at the end for each
> worker).
>
> Before I change the implementation,
> is there a way to bypass this problem with multiprocessing.Queue?
> Should I post the code (or a sketchy version of it)?

Transferring data over a multiprocessing.Queue pickles the whole 
object and copies it across an inter-process pipe, so each task needs 
a reasonably large workload for the parallel speedup to outweigh the 
cost of that copying.
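
For reference, here is a rough sketch of the pattern I understand you 
are using (the task count and array size are made up). Note that the 
results are drained from the output queue *before* join() is called: a 
child process that has put a large object on a Queue will not exit 
until that data has been flushed through the pipe, so joining first is 
one common way for this kind of code to block:

import multiprocessing as mp
import numpy as np

def worker(in_q, out_q):
    # Keep pulling task ids until the None sentinel arrives.
    for task in iter(in_q.get, None):
        result = np.ones(250000) * task      # stand-in for the real work
        out_q.put(result)                    # whole array is pickled and piped

if __name__ == '__main__':
    in_q, out_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(in_q, out_q)) for _ in range(4)]
    for p in procs:
        p.start()
    for task in range(12):
        in_q.put(task)
    for _ in procs:
        in_q.put(None)                       # one sentinel per worker
    results = [out_q.get() for _ in range(12)]   # drain before joining
    for p in procs:
        p.join()
    combined = np.sum(results, axis=0)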

You can try to avoid the cost of copying by using shared memory 
(http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes): 
use the Queue only to signal when new data comes in or when a task is 
done, and keep the large arrays themselves in shared memory. Be careful 
not to access the same data from multiple processes concurrently.
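
A minimal sketch of that idea, assuming each task fills one fixed-size 
float array (the names and sizes are invented for illustration): the 
results live in multiprocessing.Array buffers that the workers write 
into directly, and the Queue only carries a small "task done" token:

import multiprocessing as mp
import numpy as np

N = 250000                                   # elements per result array

def worker(shared_arr, task_id, done_q):
    # View the shared buffer as a numpy array; no data is copied.
    arr = np.frombuffer(shared_arr.get_obj())
    arr[:] = task_id                         # stand-in for the real computation
    done_q.put(task_id)                      # only a tiny token crosses the pipe

if __name__ == '__main__':
    done_q = mp.Queue()
    buffers = [mp.Array('d', N) for _ in range(12)]   # one buffer per task
    procs = [mp.Process(target=worker, args=(buf, i, done_q))
             for i, buf in enumerate(buffers)]
    for p in procs:
        p.start()
    for _ in procs:
        done_q.get()                         # wait until every task reports done
    for p in procs:
        p.join()
    results = [np.frombuffer(buf.get_obj()) for buf in buffers]
    combined = np.sum(results, axis=0)

Since each worker owns its own buffer here there is no concurrent 
access to worry about; if several workers had to share one buffer you 
would use the Array's built-in lock (with shared_arr.get_lock(): ...).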

In any case, have you tried a multithreaded solution? numpy is a C 
extension, and I believe it releases the GIL while doing its heavy 
work, so the GIL would not stand in the way of achieving parallelism.
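
A rough threaded sketch (the actual speedup depends on which numpy 
operations you call and whether they release the GIL; large dot 
products going through BLAS usually do):

import threading
import numpy as np

def task(i, results):
    a = np.random.rand(1000, 1000)
    # The heavy lifting happens in C; for operations that release the
    # GIL, several of these threads can run in parallel.
    results[i] = a.dot(a)

results = [None] * 12
threads = [threading.Thread(target=task, args=(i, results)) for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()
combined = np.sum(results, axis=0)   # no pickling or copying between threads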
