[Numpy-discussion] Memory mapping and NPZ files

Erik Bray erik.m.bray+numpy at gmail.com
Fri Dec 11 16:35:28 EST 2015


On Wed, Dec 9, 2015 at 9:51 AM, Mathieu Dubois
<mathieu.dubois at icm-institute.org> wrote:
> Dear all,
>
> If I am correct, using mmap_mode with Npz files has no effect i.e.:
> f = np.load("data.npz", mmap_mode="r")
> X = f['X']
> will load all the data in memory.
>
> Can somebody confirm that?
>
> If I'm correct, the mmap_mode argument could be passed to the NpzFile class
> which could in turn perform the correct operation. One way to handle that
> would be to use the ZipFile.extract method to write the Npy file on disk and
> then load it with numpy.load with the mmap_mode argument. Note that the user
> will have to remove the file to reclaim disk space (I guess that's OK).
>
> One problem that could arise is that the extracted Npy file can be large
> (it's the purpose of using memory mapping) and therefore it may be useful to
> offer some control on where this file is extracted (for instance /tmp can be
> too small to extract the file here). numpy.load could offer a new option for
> that (passed to ZipFile.extract).
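
The extract-then-map idea quoted above can be sketched roughly as follows.
This is only an illustration: the helper name load_npz_member_mmap and its
extract_dir parameter are hypothetical and not part of NumPy, and it
assumes the archive was written by np.savez or np.savez_compressed, so that
each array is stored as a "<key>.npy" member.

```python
import os
import tempfile
import zipfile

import numpy as np


def load_npz_member_mmap(npz_path, key, extract_dir, mmap_mode="r"):
    """Extract one array from an .npz archive and memory-map the .npy file.

    Hypothetical helper, not part of NumPy.  The caller owns the extracted
    file and has to delete it to reclaim the disk space.
    """
    member = key + ".npy"  # np.savez stores each array as "<key>.npy"
    with zipfile.ZipFile(npz_path) as zf:
        # extract() writes the (decompressed) member and returns its path
        npy_path = zf.extract(member, path=extract_dir)
    return np.load(npy_path, mmap_mode=mmap_mode), npy_path


# Usage: write a small archive, then map one member from a chosen directory.
workdir = tempfile.mkdtemp()
npz_path = os.path.join(workdir, "data.npz")
np.savez(npz_path, X=np.arange(12, dtype=np.float64).reshape(3, 4))

X, extracted_path = load_npz_member_mmap(npz_path, "X", extract_dir=workdir)
```

Passing the extraction directory explicitly covers the concern that /tmp
may be too small for the extracted file.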

I have struggled for a long time with a similar (albeit more obscure)
problem in PyFITS / astropy.io.fits when it comes to supporting
memory-mapping of compressed FITS files.  For those unaware, FITS is a
file format used primarily in astronomy.

I have all kinds of wacky ideas for optimizing this, but at the moment
when you load data from a compressed FITS file with memory-mapping
enabled, there's not much benefit because the contents of the file are
decompressed into memory (there is a *little* benefit in that the
compressed data is mmap'd, but the compressed data is typically much
smaller than the uncompressed data).

Currently, in this case, I just issue a warning when the user
explicitly requests mmap=True but won't get much benefit from it.
Maybe np.load could do the same, but I don't have a strong opinion
about it.  (I only added the warning in PyFITS because a user
requested it and was kind enough to provide a patch--seemed
reasonable).
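
For np.load, the warning approach might look something like the sketch
below.  The wrapper name load_with_mmap_warning is hypothetical (it is not
an existing NumPy function), and detecting the archive with
zipfile.is_zipfile is just one assumed way to tell .npz from .npy.

```python
import os
import tempfile
import warnings
import zipfile

import numpy as np


def load_with_mmap_warning(path, mmap_mode=None, **kwargs):
    """Hypothetical wrapper around np.load; not an existing NumPy function.

    Warns when memory mapping is requested for a zip (.npz) archive, where
    the arrays end up being read fully into memory anyway.
    """
    if mmap_mode is not None and zipfile.is_zipfile(path):
        warnings.warn(
            "mmap_mode=%r was requested for an .npz archive; arrays will "
            "be loaded into memory instead of being memory-mapped" % mmap_mode,
            UserWarning,
            stacklevel=2,
        )
    return np.load(path, mmap_mode=mmap_mode, **kwargs)


# Usage: the request for memory mapping triggers the warning.
workdir = tempfile.mkdtemp()
npz_path = os.path.join(workdir, "data.npz")
np.savez(npz_path, X=np.zeros(4))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with load_with_mmap_warning(npz_path, mmap_mode="r") as f:
        X = f["X"]  # a regular in-memory ndarray, not a memmap
```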

Erik


