[Numpy-discussion] Adding `offset` argument to np.lib.format.open_memmap and np.load

Jon Olav Vik jonovik at gmail.com
Tue Mar 1 08:20:34 EST 2011


Robert Kern <robert.kern <at> gmail.com> writes:

> On Mon, Feb 28, 2011 at 18:50, Sturla Molden <sturla <at> molden.no> wrote:
> > Den 01.03.2011 01:15, skrev Robert Kern:
> >> You can have each of those processes memory-map the whole file and
> >> just operate on their own slices. Your operating system's virtual
> >> memory manager should handle all of the details for you.

Wow, I didn't know that. So as long as the ranges touched by each process do 
not overlap, I'll be safe? If I modify only a few discontiguous chunks in a 
range, will the virtual memory manager decide whether it is most efficient to 
write just the chunks or the entire range back to disk?
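
A minimal sketch of the approach Robert describes, assuming a hypothetical 
file `out.npy` and a hypothetical helper `fill_slice`. Each worker maps the 
whole file but writes only to its own non-overlapping range. The workers are 
called one after another here for brevity; in practice each call would run in 
its own process. As far as I understand, the operating system tracks changes 
per page, so typically only the modified pages are written back, not the whole 
range.

```python
import os
import tempfile

import numpy as np


def fill_slice(path, start, stop, value):
    """Map the whole .npy file, but write only to items [start, stop)."""
    m = np.lib.format.open_memmap(path, mode="r+")
    m[start:stop] = value
    m.flush()  # ask the OS to write modified pages back to disk
    del m      # drop this worker's mapping


# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.npy")

# Preallocate the output once; the new file starts out zero-filled.
m = np.lib.format.open_memmap(path, mode="w+", dtype=np.int8, shape=(1000,))
del m

# Each "worker" handles a disjoint range of the same file.
ranges = [(0, 250), (250, 500), (500, 750), (750, 1000)]
for worker, (start, stop) in enumerate(ranges):
    fill_slice(path, start, stop, worker + 1)

# Read the combined result back, again without loading it all into memory.
result = np.load(path, mmap_mode="r")
```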

> > Mapping large files from the start will not always work on 32-bit
> > systems. That is why mmap.mmap takes an offset argument now (Python 2.7
> > and 3.1.)
> >
> > Making a view np.memmap with slices is useful on 64-bit but not 32-bit
> > systems.
> 
> I'm talking about the OP's stated use case where he generates the file
> via memory-mapping the whole thing on the same machine. The whole file
> does fit into the address space in his use case.
> 
> I'd like to see a real use case where this does not hold. I suspect
> that this is not the API we would want for such use cases.

Use case: Generate "large" output for "many" parameter scenarios.
1. Preallocate "enormous" output file on disk.
2. Each process fills in part of the output.
3. Analyze, aggregate results, perhaps save to HDF or database, in a 
sliding-window fashion using a memory-mapped array. The aggregated results 
fit in memory, even though the raw output doesn't.
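
Step 3 could look roughly like the sketch below, which maps one window of the 
file at a time via the `offset` argument of `np.memmap`. The helper names 
`npy_data_offset` and `windowed_sum` are hypothetical, and the sketch assumes 
a one-dimensional array stored in .npy format version 1.0.

```python
import numpy as np


def npy_data_offset(path):
    """Return (shape, dtype, byte offset of the data) for a 1.0 .npy file."""
    with open(path, "rb") as f:
        version = np.lib.format.read_magic(f)
        if version != (1, 0):
            raise ValueError("this sketch only handles .npy format 1.0")
        shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(f)
        # The file position is now just past the header, where the data starts.
        return shape, dtype, f.tell()


def windowed_sum(path, window):
    """Sum a 1-D .npy file while mapping at most `window` items at a time."""
    shape, dtype, data_offset = npy_data_offset(path)
    n = shape[0]
    total = 0
    for start in range(0, n, window):
        count = min(window, n - start)
        # Map only this window; the offset is given in bytes from file start.
        chunk = np.memmap(path, dtype=dtype, mode="r",
                          offset=data_offset + start * dtype.itemsize,
                          shape=(count,))
        total += int(chunk.sum(dtype=np.int64))
        del chunk  # release this window before mapping the next one
    return total
```

Only the running total stays in memory; each window is released before the 
next one is mapped, so the address space needed is bounded by the window size 
and not by the file size.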

My real work has been done on a 64-bit cluster running 64-bit Python, but I'd 
like to have the option of post-processing on my laptop's 32-bit Python (either 
spending a few hours copying the file to my laptop first, or mounting the 
remote disk using e.g. ExpanDrive).

Maybe that is impossible with 32-bit Python: at least I cannot allocate that 
big a file on my laptop.

>>> m = np.lib.format.open_memmap("c:/temp/temp.npy", "w+", dtype=np.int8, shape=2**33)
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "C:\Python26\lib\site-packages\numpy\lib\format.py", line 563, in open_memmap
    mode=mode, offset=offset)
  File "C:\Python26\lib\site-packages\numpy\core\memmap.py", line 221, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OverflowError: cannot fit 'long' into an index-sized integer
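
The arithmetic behind the error, as a small sketch: the requested mapping is 
2**33 bytes (8 GiB), while the largest index a 32-bit Python build can 
represent is 2**31 - 1. The helper `fits_in_index` is hypothetical and only 
illustrates the comparison.

```python
import sys


def fits_in_index(nbytes, maxsize=sys.maxsize):
    """True if a length of `nbytes` bytes fits in a Python index."""
    return nbytes <= maxsize


requested = 2**33           # int8 items, so one byte per item
maxsize_32bit = 2**31 - 1   # sys.maxsize on a 32-bit Python build
maxsize_64bit = 2**63 - 1   # sys.maxsize on a 64-bit Python build

too_big_on_32bit = not fits_in_index(requested, maxsize_32bit)
ok_on_64bit = fits_in_index(requested, maxsize_64bit)
```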




