[Numpy-discussion] multiprocessing shared arrays and numpy
Nadav Horesh
nadavh at visionsense.com
Sat Mar 6 13:04:10 EST 2010
I did some optimization, and the results are very instructive, although not surprising:
As I wrote before, I processed stereoscopic movie recordings by treating each one as a memory-mapped file and processing it in several steps. This approach produced extra gigabytes of transient data. Running as one process took 45 seconds, and as two parallel processes ~40 seconds.
After I rewrote the application to process the recording frame by frame, the code became shorter, and the new timings are: one process --- 16 seconds, and two processes --- 9 seconds.
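For readers who want to see what the frame-by-frame approach might look like, here is a minimal sketch. It is not the code from this thread: the file layout (frame count, frame size, uint16 pixels) and the names process_frame, process_recording and make_demo_file are assumptions made up for illustration. The point is only that each frame is read once from the memory-mapped file and reduced immediately, so no large transient arrays are created.

```python
import os
import tempfile

import numpy as np

# Hypothetical recording layout (an assumption, not from the original post):
# N_FRAMES frames, each HEIGHT x WIDTH pixels, stored as raw uint16.
N_FRAMES, HEIGHT, WIDTH = 8, 4, 6


def process_frame(frame):
    """Stand-in for the real per-frame analysis: return the frame mean."""
    return float(frame.mean())


def process_recording(path):
    """Map the file read-only and visit each frame exactly once."""
    rec = np.memmap(path, dtype=np.uint16, mode="r",
                    shape=(N_FRAMES, HEIGHT, WIDTH))
    # rec[i] is a view into the mapped file; only that frame is paged in.
    return [process_frame(rec[i]) for i in range(N_FRAMES)]


def make_demo_file():
    """Write a small synthetic recording and return (path, data)."""
    fd, path = tempfile.mkstemp(suffix=".raw")
    os.close(fd)
    data = np.arange(N_FRAMES * HEIGHT * WIDTH, dtype=np.uint16)
    data = data.reshape(N_FRAMES, HEIGHT, WIDTH)
    data.tofile(path)
    return path, data


if __name__ == "__main__":
    path, _ = make_demo_file()
    print(process_recording(path))
    os.remove(path)
```

To split the work over two processes, each process could open the same file with np.memmap and handle its own channel or its own range of frames.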
What I learned:
* Design for multi-processing from the start, not as an afterthought
* Shared memory works, but at the expense of code elegance (much like common blocks in Fortran)
* Memory-mapped files can be used much like shared memory. The strange thing is that I got an ignored AttributeError on every frame access to the memory-mapped file from the child process.
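As an illustration of the shared-memory point, here is a minimal sketch of one way to share a numpy array between processes using only the standard library. It is not the module attached earlier in this thread and not Sturla's sharedmem package; it assumes a present-day Python and numpy, and the helper names make_shared, as_ndarray and fill_row are hypothetical. The buffer is allocated with multiprocessing.RawArray (no locking), and each process wraps it with np.frombuffer, which does not copy.

```python
import ctypes
import multiprocessing as mp

import numpy as np


def as_ndarray(raw, shape):
    """Wrap the shared ctypes buffer as a float64 ndarray without copying."""
    return np.frombuffer(raw, dtype=np.float64).reshape(shape)


def make_shared(shape):
    """Allocate a zero-filled shared buffer and return it with a numpy view."""
    n = int(np.prod(shape))
    raw = mp.RawArray(ctypes.c_double, n)  # shared memory, no lock attached
    return raw, as_ndarray(raw, shape)


def fill_row(raw, shape, row):
    """Worker: write into one row of the shared array in place."""
    arr = as_ndarray(raw, shape)
    arr[row, :] = row + 1


if __name__ == "__main__":
    shape = (2, 5)
    raw, arr = make_shared(shape)
    # One worker per row; each writes to a disjoint part of the buffer,
    # so no synchronisation is needed in this toy example.
    procs = [mp.Process(target=fill_row, args=(raw, shape, r))
             for r in range(shape[0])]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(arr)  # the parent sees the rows written by the children
```

The loss of elegance is visible even here: the raw buffer and the shape have to be passed around explicitly, and every process must rebuild its own view.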
Nadav
-----Original Message-----
From: numpy-discussion-bounces at scipy.org on behalf of Brian Granger
Sent: Fri 05-Mar-10 21:29
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] multiprocessing shared arrays and numpy
Francesc,
> Yeah, a 10% improvement from using multiple cores is an expected figure for
> memory-bound problems. This is something people must know: if their
> computations are memory bound (and this is much more common than one may
> initially think), then they should not expect significant speed-ups in
> their parallel codes.
+1
Thanks for emphasizing this. This is definitely a big issue with multicore.
Cheers,
Brian
> Thanks for sharing your experience anyway,
> Francesc
>
> On Thursday 04 March 2010 18:54:09 Nadav Horesh wrote:
> > I cannot give a reliable answer yet, since I have some more improvements
> > to make. The application is an analysis of a stereoscopic-movie raw-data
> > recording (both channels are recorded in the same file). I treat the data
> > as a huge memory-mapped file. The idea was to process each channel (left
> > and right) on a different core. Right now the application is I/O bound,
> > since I use classical numpy operations, so each channel (which is handled
> > as one array) is scanned several times. The improvement now over a single
> > process is 10%, but I hope to achieve 10% more after trivial
> > optimizations.
> >
> > I used this application as an excuse to dive into multi-processing. I
> > hope that the code I posted here will help someone.
> >
> > Nadav.
> >
> >
> > -----Original Message-----
> > From: numpy-discussion-bounces at scipy.org on behalf of Francesc Alted
> > Sent: Thu 04-Mar-10 15:12
> > To: Discussion of Numerical Python
> > Subject: Re: [Numpy-discussion] multiprocessing shared arrays and numpy
> >
> > What kind of calculations are you doing with this module? Can you please
> > send some examples and the speed-ups you are getting?
> >
> > Thanks,
> > Francesc
> >
> > On Thursday 04 March 2010 14:06:34 Nadav Horesh wrote:
> > > Extended module that I used for some useful work.
> > > Comments:
> > > 1. Sturla's module is better designed, but did not work with very large
> > >    (although sub-GB) arrays.
> > > 2. Tested on 64-bit Linux (amd64) + python-2.6.4 + numpy-1.4.0
> > >
> > > Nadav.
> > >
> > >
> > > -----Original Message-----
> > > From: numpy-discussion-bounces at scipy.org on behalf of Nadav Horesh
> > > Sent: Thu 04-Mar-10 11:55
> > > To: Discussion of Numerical Python
> > > Subject: RE: [Numpy-discussion] multiprocessing shared arrays and numpy
> > >
> > > Maybe the attached file can help. Adapted and tested on amd64 Linux.
> > >
> > > Nadav
> > >
> > >
> > > -----Original Message-----
> > > From: numpy-discussion-bounces at scipy.org on behalf of Nadav Horesh
> > > Sent: Thu 04-Mar-10 10:54
> > > To: Discussion of Numerical Python
> > > Subject: Re: [Numpy-discussion] multiprocessing shared arrays and numpy
> > >
> > > There is work by Sturla Molden: look for multiprocessing-tutorial.pdf
> > > and sharedmem-feb13-2009.zip. The tutorial includes what was dropped from
> > > the cookbook page. I am facing the same issue and am going to test it today.
> > >
> > > Nadav
> > >
> > > On Wed, 2010-03-03 at 15:31 +0100, Jesper Larsen wrote:
> > > > Hi people,
> > > >
> > > > I was wondering about the status of using the standard library
> > > > multiprocessing module with numpy. I found a cookbook example last
> > > > updated one year ago which states that:
> > > >
> > > > "This page was obsolete as multiprocessing's internals have changed.
> > > > More information will come shortly; a link to this page will then be
> > > > added back to the Cookbook."
> > > >
> > > > http://www.scipy.org/Cookbook/multiprocessing
> > > >
> > > > I also found the code that used to be on this page in the cookbook,
> > > > but it does not work any more. So my question is:
> > > >
> > > > Is it possible to use numpy arrays as shared arrays in an application
> > > > using multiprocessing and how do you do it?
> > > >
> > > > Best regards,
> > > > Jesper
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at scipy.org
> > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at scipy.org
> > > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
>
> --
> Francesc Alted