
On Thursday 11 March 2010 14:35:49, Gael Varoquaux wrote:
So, in my experience, numpy.memmap really does use that large chunk of memory (unless my testbed is badly programmed, in which case I'd be grateful if you could point out what's wrong).
OK, so what you are saying is that my assertion #1 was wrong. Fair enough; as I was writing it, I was aware that I had no hard facts to back it up. What about assertion #2? This 'story' is the only explanation I can think of for why parallel computations that work when I use memmap blow up when I don't.
Well, I must say that I have no experience with running memmapped arrays in parallel computations, but it sounds like they can actually behave as shared-memory arrays. So yes, you may well be right about #2: memmapped data is not duplicated when accessed in parallel by different processes (in read-only mode, of course), which is certainly a very interesting technique for sharing data between parallel processes. Thanks for pointing this out!
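[To make this concrete, here is a minimal sketch of the kind of parallel read-only memmap access being described. It is not from the thread; the file name, dtype, and shapes are made up. Each worker maps the same file with mode='r', so the pages are served from the OS page cache rather than being copied per process:]

    import numpy as np
    from multiprocessing import Pool

    # Hypothetical setup (not from the thread): a file created beforehand with
    #   np.arange(1_000_000, dtype=np.float64).tofile('data.bin')
    FNAME = 'data.bin'
    N = 1_000_000

    def partial_sum(bounds):
        start, stop = bounds
        # Each worker maps the same file read-only; the pages come from the
        # OS page cache, so the data is not duplicated in every process.
        x = np.memmap(FNAME, dtype=np.float64, mode='r', shape=(N,))
        return float(x[start:stop].sum())

    if __name__ == '__main__':
        chunks = [(i, min(i + 250_000, N)) for i in range(0, N, 250_000)]
        with Pool(4) as pool:
            total = sum(pool.map(partial_sum, chunks))
        print(total)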
Also, could it be that the memmap mode changes things? I use only the 'r' mode, which is read-only.
I don't think so. When doing the computation, I open the x values in read-only mode, and memory consumption is still there.
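[For what it's worth, a small sketch, again with a made-up 'data.bin' file, of how one can watch resident memory grow as a read-only memmap is scanned. Note that the resident figure includes clean, file-backed pages that the kernel can evict under memory pressure, which may reconcile the two observations above:]

    import resource  # Unix-only
    import numpy as np

    def rss_mb():
        # ru_maxrss is reported in KiB on Linux (bytes on macOS)
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

    # Same hypothetical 'data.bin' as above
    x = np.memmap('data.bin', dtype=np.float64, mode='r', shape=(1_000_000,))
    print('after mapping:   %.1f MB' % rss_mb())

    x.sum()  # touches every page, faulting the whole file into memory
    print('after full scan: %.1f MB' % rss_mb())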
This is all very interesting, and you have much more insight into these problems than I do. Would you be interested in coming to EuroSciPy in Paris to give a 1- or 2-hour tutorial on memory and I/O problems and how you address them with PyTables? It would be absolutely thrilling. I must warn you, though, that I'm afraid we won't be able to pay for your trip, as I want to keep the price of the conference low.
Yes, no problem. I was already thinking about presenting something at EuroSciPy. A tutorial about PyTables and memory/I/O would be really great for me. We can nail down the details off-list. -- Francesc Alted