numpy.memmap advice?

Lionel lionel.keene at gmail.com
Thu Feb 19 13:36:49 EST 2009


On Feb 19, 9:51 am, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Feb 19, 9:34 am, Lionel <lionel.ke... at gmail.com> wrote:
>
> > On Feb 18, 12:35 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
>
> > > On Feb 18, 10:48 am, Lionel <lionel.ke... at gmail.com> wrote:
>
> > > > Thanks Carl, I like your solution. Am I correct in my understanding
> > > > that memory is allocated at the slicing step in your example i.e. when
> > > > "reshaped_data" is sliced using "interesting_data = reshaped_data[:,
> > > > 50:100]"? In other words, given a huge (say 1Gb) file, a memmap object
> > > > is constructed that memmaps the entire file. Some relatively small
> > > > amount of memory is allocated for the memmap operation, but the bulk
> > > > memory allocation occurs when I generate my final numpy sub-array by
> > > > slicing, and this accounts for the memory efficiency of using memmap?
>
> > > No, what accounts for the memory efficiency is that there is no bulk
> > > allocation at all.  The ndarray you have points to the memory that's
> > > in the mmap.  There is no copying data or separate array allocation.
>
> > Does this mean that every time I iterate through an ndarray that is
> > sourced from a memmap, the data is read from the disk? The sliced
> > array is at no time wholly resident in memory? What are the
> > performance implications of this?
>
> Ok, sorry for the confusion.  What I should have said is that there is
> no bulk allocation *by numpy* at all.  The call to mmap does allocate
> a chunk of RAM to reflect file contents, but the numpy arrays don't
> allocate any memory of their own: they use the same memory as was
> allocated by the mmap call.
>
> Carl Banks

Thanks for the explanations, Carl. I'm sorry, but it's me who's the
confused one here, not anyone else :-)

I hate to waste everyone's time again, but something is just not
"clicking" in that black hole I call a brain. So... "numpy.memmap"
allocates a chunk off the heap to coincide with the file contents. If
I memmap the entire 1 GB file, a corresponding amount (approx. 1 GB)
is allocated? That seems to contradict what is stated in the numpy
documentation:

"class numpy.memmap
Create a memory-map to an array stored in a file on disk.

Memory-mapped files are used for accessing small segments of large
files on disk, without reading the entire file into memory."
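
For concreteness, here is roughly the pattern I'm trying to reason
about (a minimal sketch; the file name, dtype, and shape are just
placeholders I made up):

import numpy as np

# Hypothetical: map an entire 1 GB file of complex64 values
# (16384 * 8192 * 8 bytes = 1 GiB).  As I read the docs, this should
# NOT pull the whole file into RAM; pages are only read from disk
# as they are actually touched.
data = np.memmap('huge_file.dat', dtype=np.complex64, mode='r',
                 shape=(16384, 8192))

# Slicing gives a view into the same mapping -- still no bulk copy?
interesting_data = data[:, 50:100]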

In my previous example that we were working with (the 100x100 data
file), you used an offset to memmap the "lower half" of the array.
Does this mean that, in the process of memmapping that lower half,
RAM was set aside for 50x100 32-bit complex numbers? If so, then
memmapping an entire file would offer no memory benefit at all.
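
To make sure I'm asking about the right thing, here's how I pictured
that offset mapping (a sketch; the file name is a placeholder, and
I'm assuming complex64, i.e. 8 bytes per element):

import numpy as np

n_cols = 100
itemsize = np.dtype(np.complex64).itemsize   # 8 bytes per element

# Map only rows 50..99 of a 100x100 file by skipping the first
# 50 rows' worth of bytes:
lower_half = np.memmap('data_file.dat', dtype=np.complex64, mode='r',
                       offset=50 * n_cols * itemsize,
                       shape=(50, n_cols))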

At this point do you (or anyone else) recommend I just write a
little "roll my own" function for my class that takes the coords I
intend to load and reads only that block from the file? It seems like
the best way to keep memory use to a minimum; I'm just worried about
performance. On the other hand, the most I'd be loading would be
around 1k x 1k worth of data.
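
Something along these lines is what I had in mind (an untested
sketch; it assumes a C-ordered complex64 file, and all the names are
mine):

import numpy as np

def load_block(path, n_cols_total, row0, row1, col0, col1,
               dtype=np.complex64):
    """Read rows row0..row1-1, columns col0..col1-1 from a raw
    C-ordered binary file, touching only the requested bytes."""
    itemsize = np.dtype(dtype).itemsize
    out = np.empty((row1 - row0, col1 - col0), dtype=dtype)
    with open(path, 'rb') as f:
        for i, row in enumerate(range(row0, row1)):
            # Seek straight to the first wanted element of this row.
            f.seek((row * n_cols_total + col0) * itemsize)
            out[i] = np.fromfile(f, dtype=dtype, count=col1 - col0)
    return out

# e.g. the lower-left 50x50 block of a 100-column file:
# block = load_block('data_file.dat', 100, 50, 100, 0, 50)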




