numpy.memmap advice?

Carl Banks pavlovevidence at gmail.com
Thu Feb 19 12:51:57 EST 2009


On Feb 19, 9:34 am, Lionel <lionel.ke... at gmail.com> wrote:
> On Feb 18, 12:35 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
>
> > On Feb 18, 10:48 am, Lionel <lionel.ke... at gmail.com> wrote:
>
> > > Thanks Carl, I like your solution. Am I correct in my understanding
> > > that memory is allocated at the slicing step in your example i.e. when
> > > "reshaped_data" is sliced using "interesting_data = reshaped_data[:,
> > > 50:100]"? In other words, given a huge (say 1Gb) file, a memmap object
> > > is constructed that memmaps the entire file. Some relatively small
> > > amount of memory is allocated for the memmap operation, but the bulk
> > > memory allocation occurs when I generate my final numpy sub-array by
> > > slicing, and this accounts for the memory efficiency of using memmap?
>
> > No, what accounts for the memory efficiency is that there is no
> > bulk allocation at all.  The ndarray you have points to the memory
> > that's in the mmap.  There is no copying of data or separate array
> > allocation.
>
> Does this mean that every time I iterate through an ndarray that is
> sourced from a memmap, the data is read from the disk? Is the sliced
> array at no time wholly resident in memory? What are the
> performance implications of this?

Ok, sorry for the confusion.  What I should have said is that there is
no bulk allocation *by numpy* at all.  The call to mmap maps the file
into the process's address space, and the OS pages the file's contents
into RAM on demand as they are accessed; the numpy arrays don't
allocate any memory of their own: they use the same memory as was
mapped by the mmap call.


Carl Banks



More information about the Python-list mailing list