[Numpy-discussion] Adding `offset` argument to np.lib.format.open_memmap and np.load
Jon Olav Vik
jonovik at gmail.com
Thu Feb 24 10:49:06 EST 2011
https://github.com/jonovik/numpy/compare/master...offset_memmap
The `offset` argument to np.memmap makes it possible to memory-map only a
portion of a file on disk as a Numpy array. Memory-mapping can also be done
with np.load, but it uses np.lib.format.open_memmap, which has no offset
argument.
I have added an offset argument to np.lib.format.open_memmap and np.load as
detailed in the link above, and humbly submit the changes for review. This is
my first time using git, so apologies for any mistakes.
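
For reference, here is a minimal sketch of the existing behaviour in released
Numpy, as I understand it: np.memmap takes a byte offset into a raw file, while
np.load with mmap_mode maps the whole array stored in a .npy file. The file
names and sizes are made up for illustration.

```python
import os
import tempfile

import numpy as np

tmpdir = tempfile.mkdtemp()

# Raw binary file: np.memmap can start anywhere, with the offset given in bytes.
raw_path = os.path.join(tmpdir, "raw.dat")
np.arange(10, dtype=np.int64).tofile(raw_path)
itemsize = np.dtype(np.int64).itemsize
# Skip the first 4 elements and map the remaining 6.
tail = np.memmap(raw_path, dtype=np.int64, mode="r",
                 offset=4 * itemsize, shape=(6,))

# .npy file: np.load with mmap_mode maps the entire stored array.
npy_path = os.path.join(tmpdir, "data.npy")
np.save(npy_path, np.arange(10, dtype=np.int64))
whole = np.load(npy_path, mmap_mode="r")

print(tail.tolist())
print(whole.shape)
```
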
Note that the offset is in terms of array elements, not bytes (which is what
np.memmap uses), because that was what I had use for. Also, I added a `shape`
argument to np.load to memory-map only a portion of a file.
My use case was to preallocate a big record array on disk, then start many
processes writing to their separate, memory-mapped segments of the file. The
end result was one big array on disk, with the correct shape and data type
information. Using a record array makes the data structure more
self-documenting. Using open_memmap with mode="w+" is the fastest way I've
found to preallocate an array on disk; it does not create the huge array in
memory.
Letting multiple processes memory-map and read/write to non-overlapping
portions without interfering with each other allows for fast, simple parallel
I/O.
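
The workflow described above can be sketched roughly as follows, again using
only existing Numpy calls rather than the proposed arguments. The record
fields, the sizes, and the `write_segment` function are invented for
illustration. The loop runs the workers one after another to keep the example
short; in the real use case each call would run in its own process.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical record layout and problem size.
record = np.dtype([("step", np.int64), ("value", np.float64)])
n_workers, per_worker = 4, 1000
path = os.path.join(tempfile.mkdtemp(), "results.npy")

# Preallocate the full array on disk without building it in memory.
out = open_memmap(path, mode="w+", dtype=record,
                  shape=(n_workers * per_worker,))
data_start = out.offset  # byte position where array data begins
out.flush()
del out  # drop the mapping; the header is already written


def write_segment(worker):
    """Map only this worker's slice of the file and fill it in."""
    start = worker * per_worker
    seg = np.memmap(path, dtype=record, mode="r+",
                    offset=data_start + start * record.itemsize,
                    shape=(per_worker,))
    seg["step"] = np.arange(start, start + per_worker)
    seg["value"] = worker
    seg.flush()


# Stand-in for separate processes writing non-overlapping segments.
for w in range(n_workers):
    write_segment(w)

# The result is an ordinary .npy file with shape and dtype in its header.
result = np.load(path, mmap_mode="r")
print(result.shape, result.dtype.names)
```
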
I've used this extensively on Numpy 1.4.0, but based my Git checkout on the
current Numpy trunk. There have been some rearrangements in np.load since then
(it used to be in np.lib.io and is now in np.lib.npyio), but as far as I can
see, my modifications carry over fine. I haven't had a chance to test with
Numpy trunk, though. (What is the best way to set up a test version without
affecting my working 1.4.0 setup?)
Hope this can be useful,
Jon Olav Vik