
On Thursday 11 March 2010 14:35:49, Gael Varoquaux wrote:
So, in my experience, numpy.memmap really does use that large chunk of memory (unless my testbed is badly programmed, in which case I'd be grateful if you could point out what's wrong).
OK, so what you are saying is that my assertion #1 was wrong. Fair enough; as I was writing it, I was aware that I had no hard facts to back it up. What about assertion #2? This 'story' is the only explanation I can think of for why parallel computations that work when I use memmap blow up when I don't.
Well, I must say that I have no experience with running memmapped arrays in parallel computations, but it sounds like they can actually behave as shared-memory arrays. So yes, you may well be right about #2: memmapped data is not duplicated when accessed in parallel by different processes (in read-only mode, of course), which is certainly a very interesting technique for sharing data between parallel processes. Thanks for pointing this out!
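[To make this concrete, here is a minimal sketch of the kind of parallel read-only memmap access being described. It is not from the thread; the file name, dtype, and shapes are made up. Each worker maps the same file with mode='r', so the pages are served from the OS page cache rather than being copied per process:]

    import numpy as np
    from multiprocessing import Pool

    # Hypothetical setup (not from the thread): a file created beforehand with
    #   np.arange(1_000_000, dtype=np.float64).tofile('data.bin')
    FNAME = 'data.bin'
    N = 1_000_000

    def partial_sum(bounds):
        start, stop = bounds
        # Each worker maps the same file read-only; the pages come from the
        # OS page cache, so the data is not duplicated in every process.
        x = np.memmap(FNAME, dtype=np.float64, mode='r', shape=(N,))
        return float(x[start:stop].sum())

    if __name__ == '__main__':
        chunks = [(i, min(i + 250_000, N)) for i in range(0, N, 250_000)]
        with Pool(4) as pool:
            total = sum(pool.map(partial_sum, chunks))
        print(total)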
Also, could it be that the memmap mode changes things? I use only the 'r' mode, which is read-only.
I don't think so. When doing the computation, I open the x values in read-only mode, and memory consumption is still there.
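[For what it's worth, a small sketch, again with a made-up 'data.bin' file, of how one can watch resident memory grow as a read-only memmap is scanned. Note that the resident figure includes clean, file-backed pages that the kernel can evict under memory pressure, which may reconcile the two observations above:]

    import resource  # Unix-only
    import numpy as np

    def rss_mb():
        # ru_maxrss is reported in KiB on Linux (bytes on macOS)
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

    # Same hypothetical 'data.bin' as above
    x = np.memmap('data.bin', dtype=np.float64, mode='r', shape=(1_000_000,))
    print('after mapping:   %.1f MB' % rss_mb())

    x.sum()  # touches every page, faulting the whole file into memory
    print('after full scan: %.1f MB' % rss_mb())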
This is all very interesting, and you have much more insight into these problems than I do. Would you be interested in coming to EuroSciPy in Paris to give a 1- or 2-hour tutorial on memory and I/O problems and how you address them with PyTables? It would be absolutely thrilling. I must warn you, though, that I'm afraid we won't be able to pay for your trip, as I want to keep the price of the conference low.
Yes, no problem. I was already thinking about presenting something at EuroSciPy. A tutorial about PyTables and memory/I/O would be really great for me. We can nail down the details off-list. -- Francesc Alted