[Numpy-discussion] multiprocessing shared arrays and numpy

Gael Varoquaux gael.varoquaux at normalesup.org
Thu Mar 11 08:35:49 EST 2010

On Thu, Mar 11, 2010 at 02:26:49PM +0100, Francesc Alted wrote:
> > I believe that your above assertion is 'half' right. First, I think that
> > it is not swap that the memmapped file uses, but the original disk space,
> > so you avoid running out of swap. Second, if you open the same data
> > several times without memmapping, I believe that it will be duplicated in
> > memory. On the other hand, when you memmap, it is not duplicated, so
> > if you are running several processing jobs on the same data, you save
> > memory. I am very much in this case.

> Mmh, this is not my experience.  During the past month, in a course, I
> asked the students to compare the memory consumption of numpy.memmap and
> tables.Expr (a module for performing out-of-memory computations in PyTables).

> [snip]

> So, in my experience, numpy.memmap is really using that large chunk of memory 
> (unless my testbed is badly programmed, in which case I'd be grateful if you 
> can point out what's wrong).

OK, so what you are saying is that my assertion #1 was wrong. Fair
enough; as I was writing it, I was aware that I had no hard facts to
back it. How about assertion #2? It is the only 'story' I can think of to
explain why parallel computations that blow up without memmap run fine
when I use memmap.
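To make assertion #2 concrete, here is a minimal sketch, assuming the data is stored as a .npy file and split by rows across worker processes (the sizes and the helpers worker_sum and parallel_sum are hypothetical, for illustration only). Each worker opens the file with np.load(..., mmap_mode="r"), so on typical operating systems all workers read from the same page-cache pages instead of each holding the private copy that a plain np.load would create. Per-process resident-memory figures may still count those shared pages in every process, so "memory used" can look large even when nothing is duplicated.

```python
import os
import tempfile
from multiprocessing import Pool

import numpy as np

# Hypothetical array size, chosen only for illustration.
N_ROWS, N_COLS = 1000, 100


def worker_sum(args):
    """Sum rows [start, stop) of the .npy file at `path`."""
    path, start, stop = args
    # Read-only mapping: no private copy of the whole array is made;
    # pages are read from the (shared) OS page cache on demand.
    data = np.load(path, mmap_mode="r")
    return float(data[start:stop].sum())


def parallel_sum(path, n_workers=4):
    """Split the rows into n_workers chunks and sum them in parallel."""
    n = np.load(path, mmap_mode="r").shape[0]
    bounds = np.linspace(0, n, n_workers + 1).astype(int)
    tasks = [(path, int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]
    with Pool(n_workers) as pool:
        return sum(pool.map(worker_sum, tasks))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.npy")
        arr = np.arange(N_ROWS * N_COLS, dtype=np.float64).reshape(N_ROWS, N_COLS)
        np.save(path, arr)
        del arr  # drop the in-memory original; workers only see the file
        print(parallel_sum(path))
```

Passing only the file path (not the array) to the workers is the design choice that matters here: pickling the array itself would send a full copy to every process.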

Also, could it be that the memmap mode changes things? I use only the 'r'
mode, which is read-only.
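On the question of whether the mode matters, a small sketch of what each np.memmap mode does with a write may help (the helper demo_modes and the file name are hypothetical). As far as we understand typical OS behaviour: with 'r' the mapped pages stay clean and file-backed, so under memory pressure the OS can drop them and re-read them from disk instead of swapping them out; with 'c' (copy-on-write) any page you write to becomes private memory of that process; with 'r+' writes go back to the file.

```python
import os
import tempfile

import numpy as np


def demo_modes(path):
    """Return what each np.memmap mode does with a write attempt."""
    np.arange(4, dtype=np.int64).tofile(path)  # raw file contents: 0 1 2 3
    results = {}

    # 'r': read-only mapping; the array is not writeable at all.
    ro = np.memmap(path, dtype=np.int64, mode="r")
    results["r_writeable"] = bool(ro.flags.writeable)
    del ro

    # 'c': copy-on-write; the modified page becomes private to this
    # process, so the file on disk is left untouched.
    cow = np.memmap(path, dtype=np.int64, mode="c")
    cow[0] = 100
    results["c_in_memory"] = int(cow[0])
    del cow
    results["c_on_disk"] = int(np.fromfile(path, dtype=np.int64)[0])

    # 'r+': read-write; after flush() the change is visible in the file.
    rw = np.memmap(path, dtype=np.int64, mode="r+")
    rw[0] = 200
    rw.flush()
    del rw
    results["r+_on_disk"] = int(np.fromfile(path, dtype=np.int64)[0])
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_modes(os.path.join(tmp, "modes.bin")))
```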

This is all very interesting, and you have much more insight into these
problems than I do. Would you be interested in coming to EuroSciPy in Paris
to give a one- or two-hour tutorial on memory and I/O problems and how
you address them with PyTables? It would be absolutely thrilling. I must
warn you, though, that I am afraid we won't be able to pay for your trip,
as I want to keep the price of the conference low.


