Re: multiprocessing.Queue blocks when sending large object
DPalao
dpalao.python at gmail.com
Mon Dec 5 13:28:22 EST 2011
Hi Lie,
Thank you for the reply.
On Monday, 5 December 2011, Lie Ryan wrote:
> On 11/30/2011 06:09 AM, DPalao wrote:
> > Hello,
> > I'm trying to use multiprocessing to parallelize some code. There are a
> > number of tasks (usually 12) that can be run independently. Each task
> > produces a numpy array, and at the end those arrays must be combined.
> > I implemented this using queues (multiprocessing.Queue): one for input
> > and another for output.
> > But the code blocks. And it must be related to the size of the item I put
> > on the Queue: if I put a small array, the code works well; if the array
> > is realistically large (in my case it can vary from 160kB to 1MB), the
> > code blocks apparently forever.
> > I have tried this:
> > http://www.bryceboe.com/2011/01/28/the-python-multiprocessing-queue-and-large-objects/
> > but it didn't work (specifically, I put a None sentinel at the end for
> > each worker).
> >
> > Before I change the implementation,
> > is there a way to bypass this problem with multiprocessing.Queue?
> > Should I post the code (or a sketchy version of it)?
>
> Transferring data over multiprocessing.Queue involves copying the whole
> object across an inter-process pipe, so you need to have a reasonably
> large workload in the processes to justify the cost of the copying to
> benefit from running the workload in parallel.
>
> You may try to avoid the cost of copying by using shared memory
> (http://docs.python.org/library/multiprocessing.html#sharing-state-between-
> processes); you can use Queue for communicating when a new data comes in or
> when a task is done, but put the large data in shared memory. Be careful
> not to access the data from multiple processes concurrently.
>
Yep, that was my first thought, but the arrays' elements are complex64 (or
complex in general), and I don't know how to easily convert a
multiprocessing.Array to/from a numpy.array when the type is complex. Doing
that would require some extra conversions back and forth, which makes the
solution not very attractive to me.
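
For reference, here is a minimal sketch of how that conversion might look,
assuming a fixed, known array size (the names shape, shared and data are
illustrative; a complex64 element is just a pair of float32 values):

import multiprocessing as mp
import numpy as np

shape = (256, 256)                 # example shape, ~512 kB as complex64
n = int(np.prod(shape))

# complex64 is two float32 per element, so allocate 2*n float32 slots.
shared = mp.Array('f', 2 * n)

# Reinterpret the shared buffer as complex64; this is a view, not a copy,
# so writes made by a worker are visible to the parent process.
data = np.frombuffer(shared.get_obj(), dtype=np.complex64).reshape(shape)
data[0, 0] = 1 + 2j                # behaves like any other numpy array

No extra conversions are needed afterwards: data is an ordinary numpy array
backed by the shared buffer.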
I also tried a Manager, but the array could not be modified from within the
worker processes.
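
In case it helps, that is probably the usual Manager gotcha: in-place changes
to an object fetched through a proxy are not sent back to the manager, so the
element has to be reassigned. A rough sketch (illustrative names; note that
every access still pickles a copy, which is exactly the overhead you want to
avoid):

import multiprocessing as mp
import numpy as np

def worker(shared_list, i):
    a = shared_list[i]      # this is a pickled copy, not a view
    a *= 2                  # modifies only the local copy
    shared_list[i] = a      # reassign so the manager stores the change

if __name__ == '__main__':
    manager = mp.Manager()
    results = manager.list([np.ones(4, dtype=np.complex64)] * 2)
    p = mp.Process(target=worker, args=(results, 1))
    p.start()
    p.join()
    print(results[1])       # the doubled copy stored back by the worker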
In principle, the array I need to share is expected to be at most ~2MB in
size, and typically under 200kB, so the copying overhead should not be huge.
But that could change, and I'd like to be prepared for it, so any ideas about
using an Array, a Manager, or some other shared-memory mechanism would be
great.
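
For what it's worth, one way to combine the two suggestions above is to
preallocate one shared Array per task and send only the small task index
through the Queue, so nothing large ever travels across the pipe. A rough
sketch, with illustrative names and a trivial stand-in for the real work:

import multiprocessing as mp
import numpy as np

SHAPE = (256, 256)              # example per-task result shape
NTASKS = 12

def as_complex(shared_arr):
    # View the shared float32 buffer as complex64 (a view, not a copy).
    return np.frombuffer(shared_arr.get_obj(),
                         dtype=np.complex64).reshape(SHAPE)

def worker(task_id, shared_arr, done_q):
    out = as_complex(shared_arr)
    out[:] = (task_id + 1) * np.ones(SHAPE, dtype=np.complex64)  # stand-in work
    done_q.put(task_id)         # only a small token goes through the queue

if __name__ == '__main__':
    n = int(np.prod(SHAPE))
    arrays = [mp.Array('f', 2 * n) for _ in range(NTASKS)]
    done_q = mp.Queue()
    procs = [mp.Process(target=worker, args=(i, arrays[i], done_q))
             for i in range(NTASKS)]
    for p in procs:
        p.start()
    # Drain the queue before join(), then combine the shared results.
    for _ in range(NTASKS):
        done_q.get()
    for p in procs:
        p.join()
    combined = sum(as_complex(a) for a in arrays)
    print(combined[0, 0])       # sanity check: 1 + 2 + ... + 12 = 78

Draining the queue before join() also avoids the classic deadlock where a
child blocks because its queued data has not been consumed yet.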
> In any case, have you tried a multithreaded solution? numpy is a C
> extension, and I believe it releases the GIL when working, so it
> wouldn't be in your way to achieve parallelism.
I didn't know about that possibility. What exactly releases the GIL: the
sharing of a numpy array? And what if I also need to share some other
"standard" Python data (e.g., a dictionary)?
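
For completeness, a minimal sketch of what the threaded version might look
like, assuming the per-task work is dominated by numpy operations (which do
their number crunching in C and should release the GIL for most of it);
run_task here is just a stand-in for one of the real tasks, and ordinary
Python objects such as a dict can be shared directly between threads (guard
them with a lock if several threads write to them):

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def run_task(seed):
    # Stand-in for one of the ~12 independent tasks.
    rng = np.random.RandomState(seed)
    a = rng.rand(512, 512).astype(np.complex64)
    return np.fft.fft2(a)            # heavy numpy work, GIL mostly released

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_task, range(12)))

combined = np.sum(results, axis=0)   # combine the per-task arrays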
Best regards,
David