Re: multiprocessing.Queue blocks when sending large object

DPalao dpalao.python at gmail.com
Mon Dec 5 13:28:22 EST 2011


Hi Lie,
Thank you for the reply.

On Monday, 5 December 2011, Lie Ryan wrote:
> On 11/30/2011 06:09 AM, DPalao wrote:
> > Hello,
> > I'm trying to use multiprocessing to parallelize some code. There are a
> > number of tasks (usually 12) that can be run independently. Each task
> > produces a numpy array, and at the end, those arrays must be combined.
> > I implemented this using Queues (multiprocessing.Queue): one for input
> > and another for output.
> > But the code blocks, and it must be related to the size of the item I put
> > on the Queue: if I put a small array, the code works well; if the array
> > is realistically large (in my case it can vary from 160kB to 1MB), the
> > code blocks apparently forever.
> > I have tried this:
> > http://www.bryceboe.com/2011/01/28/the-python-multiprocessing-queue-and-large-objects/
> > but it didn't work (specifically, I put a None sentinel at the end for
> > each worker).
> > 
> > Before I change the implementation,
> > is there a way to bypass this problem with  multiprocessing.Queue?
> > Should I post the code (or a sketchy version of it)?
> 
> Transferring data over multiprocessing.Queue involves copying the whole
> object across an inter-process pipe, so the workload of each process
> needs to be reasonably large to justify the copying cost and still
> benefit from running in parallel.
> 
> You may try to avoid the cost of copying by using shared memory
> (http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes);
> you can use a Queue for communicating when new data comes in or when a
> task is done, but put the large data in shared memory. Be careful not
> to access the data from multiple processes concurrently.
> 

Yep, that was my first thought, but the arrays' elements are complex64 (or 
complex in general), and I don't know how to easily convert between 
multiprocessing.Array and numpy.array when the dtype is complex. Doing that 
would require some extra conversions back and forth, which makes the solution 
not very attractive to me.
I tried with a Manager too, but the array cannot be modified from within the 
worker processes.
 
The array I need to share is expected to be at most ~2MB in size, and 
typically under 200kB, so in principle the copying is not a huge extra 
workload. But that could change, and I'd like to be prepared for it, so any 
idea about using an Array, a Manager, or another shared-memory mechanism 
would be great.
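
Just for reference, this is roughly the kind of thing I have in mind (a 
minimal sketch, not my real code; fill_block, the 12-task split and the block 
size are made up): a raw shared buffer wrapped with numpy.frombuffer so each 
worker can write its complex64 slice in place, and a join() (or a Queue) left 
only for signalling completion.

import multiprocessing as mp
import numpy as np

N_TASKS = 12      # assumed number of independent tasks
BLOCK = 1024      # assumed number of complex64 elements per task

def fill_block(shared, task_id):
    # Recreate a numpy view on the shared buffer inside the worker;
    # complex64 works because we only reinterpret the raw bytes.
    arr = np.frombuffer(shared, dtype=np.complex64).reshape(N_TASKS, BLOCK)
    arr[task_id, :] = task_id * (1 + 1j)   # stand-in for the real computation

if __name__ == "__main__":
    # Raw shared memory sized in bytes; no lock is needed because each
    # worker writes only to its own disjoint slice.
    nbytes = N_TASKS * BLOCK * np.dtype(np.complex64).itemsize
    shared = mp.RawArray('b', nbytes)
    procs = [mp.Process(target=fill_block, args=(shared, i))
             for i in range(N_TASKS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    result = np.frombuffer(shared, dtype=np.complex64).reshape(N_TASKS, BLOCK)
    print(result[:, 0])

If locking were needed, I suppose the same frombuffer trick should also work 
with a synchronized multiprocessing.Array through its get_obj() method.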

> In any case, have you tried a multithreaded solution? numpy is a C
> extension, and I believe it releases the GIL while doing its heavy work,
> so the GIL wouldn't stand in the way of achieving parallelism.

I didn't know about that possibility. What exactly releases the GIL? The use 
of a numpy array? What if I also need to share some other "standard" Python 
data (e.g., a dictionary)?
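
If I understand you correctly, you mean something like this sketch 
(heavy_task, the matrix sizes and the results dict are placeholders I made 
up), where the expensive part is a numpy/BLAS call that can run with the GIL 
released, while ordinary Python objects such as the dict are still protected 
by the GIL (plus an explicit lock around the update):

import threading
import numpy as np

results = {}                      # ordinary Python dict shared by all threads
results_lock = threading.Lock()   # serialize updates to the dict

def heavy_task(task_id):
    a = np.random.rand(500, 500)
    b = np.random.rand(500, 500)
    c = np.dot(a, b)              # big matmul: numpy typically releases the GIL here
    with results_lock:
        results[task_id] = c.sum()

threads = [threading.Thread(target=heavy_task, args=(i,))
           for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))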

Best regards,

David


