Re: [Numpy-discussion] Memory mapping and NPZ files
On Wed, Dec 9, 2015 at 9:51 AM, Mathieu Dubois wrote:
Dear all,
If I am correct, using mmap_mode with NPZ files has no effect, i.e.:

    f = np.load("data.npz", mmap_mode="r")
    X = f['X']

will load all the data into memory.
Can somebody confirm that?
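A quick way to check the claim, assuming a reasonably recent NumPy, is to save a small archive with np.savez and test whether the loaded member is an np.memmap. The file names and data below are illustrative only.

```python
import os
import tempfile

import numpy as np

# Illustrative data written to a throwaway directory.
tmpdir = tempfile.mkdtemp()
npz_path = os.path.join(tmpdir, "data.npz")
np.savez(npz_path, X=np.arange(10.0))

# mmap_mode is accepted here, but the .npz member is read into an ordinary
# in-memory ndarray when it is accessed.
with np.load(npz_path, mmap_mode="r") as f:
    X = f["X"]
print("npz member is memmap:", isinstance(X, np.memmap))

# For comparison, a standalone .npy file does honour mmap_mode.
npy_path = os.path.join(tmpdir, "X.npy")
np.save(npy_path, np.arange(10.0))
Y = np.load(npy_path, mmap_mode="r")
print("npy file is memmap:", isinstance(Y, np.memmap))
```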
If I'm correct, the mmap_mode argument could be passed to the NpzFile class, which could in turn perform the correct operation. One way to handle this would be to use the ZipFile.extract method to write the .npy file to disk and then load it with numpy.load using the mmap_mode argument. Note that the user would have to remove the file to reclaim disk space (I guess that's OK).
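The extract-then-memmap approach could look roughly like the sketch below. load_npz_member_mmap is a hypothetical helper, not a NumPy function; it assumes the archive was written by np.savez or np.savez_compressed, which store each array as a member named "<key>.npy".

```python
import os
import tempfile
import zipfile

import numpy as np


def load_npz_member_mmap(npz_path, key, extract_dir, mmap_mode="r"):
    """Hypothetical helper: extract one array from an .npz and memory-map it.

    Returns (array, extracted_path). The extracted .npy file stays on disk;
    the caller must delete it (after dropping the memmap) to reclaim space.
    """
    member = key + ".npy"  # np.savez names each member "<key>.npy"
    with zipfile.ZipFile(npz_path) as zf:
        extracted_path = zf.extract(member, path=extract_dir)
    return np.load(extracted_path, mmap_mode=mmap_mode), extracted_path


# Usage with illustrative data.
workdir = tempfile.mkdtemp()
npz_path = os.path.join(workdir, "data.npz")
np.savez(npz_path, X=np.arange(12.0).reshape(3, 4))

X, npy_path = load_npz_member_mmap(npz_path, "X", workdir)
print(type(X).__name__, X.shape)
# When finished: del X, then os.remove(npy_path).
```

Leaving the extracted file in place matches the trade-off described in the post: the data is mapped rather than read, at the cost of manual cleanup.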
One problem is that the extracted .npy file can be large (that is the whole point of using memory mapping), so it may be useful to offer some control over where the file is extracted (for instance, /tmp may be too small to hold it). numpy.load could offer a new option for that (passed through to ZipFile.extract).
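To give control over where the file lands, the extraction directory could be exposed as a parameter and checked for free space first. In this sketch extract_dir is a hypothetical option (np.load has no such argument); it falls back to tempfile.gettempdir(), typically /tmp on Unix, and the size check is only best-effort, since free space can change between the check and the write.

```python
import os
import shutil
import tempfile
import zipfile

import numpy as np


def extract_npz_member(npz_path, key, extract_dir=None):
    """Hypothetical helper: extract "<key>.npy" into a caller-chosen directory.

    extract_dir=None falls back to tempfile.gettempdir() (usually /tmp).
    Raises OSError up front if the target filesystem looks too small.
    """
    if extract_dir is None:
        extract_dir = tempfile.gettempdir()
    os.makedirs(extract_dir, exist_ok=True)
    member = key + ".npy"
    with zipfile.ZipFile(npz_path) as zf:
        # file_size is the *uncompressed* size, i.e. what will be written.
        needed = zf.getinfo(member).file_size
        free = shutil.disk_usage(extract_dir).free
        if needed > free:
            raise OSError(
                f"{member} needs {needed} bytes but only {free} are free "
                f"in {extract_dir!r}; choose another extract_dir"
            )
        return zf.extract(member, path=extract_dir)


# Usage: point extraction at a (hypothetical) larger scratch area.
workdir = tempfile.mkdtemp()
npz_path = os.path.join(workdir, "data.npz")
np.savez_compressed(npz_path, X=np.zeros((100, 100)))

scratch = os.path.join(workdir, "scratch")
npy_path = extract_npz_member(npz_path, "X", extract_dir=scratch)
X = np.load(npy_path, mmap_mode="r")
```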
I have struggled for a long time with a similar (albeit more obscure) problem in PyFITS / astropy.io.fits when it comes to supporting memory-mapping of compressed FITS files. For those unaware, FITS is a file format used primarily in astronomy.

I have all kinds of wacky ideas for optimizing this, but at the moment, when you load data from a compressed FITS file with memory-mapping enabled, there is obviously not much benefit, because the contents of the file are decompressed into memory. (There is a *little* benefit in that the compressed data is mmap'd, but the compressed data is typically much smaller than the uncompressed data.)

Currently, in this case, I just issue a warning when the user explicitly requests mmap=True but won't get much benefit from it. Maybe np.load could do the same, but I don't have a strong opinion about it. (I only added the warning in PyFITS because a user requested it and was kind enough to provide a patch; it seemed reasonable.)
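If np.load were to follow the PyFITS approach described above, the warning could be as simple as the wrapper sketched here. load_with_mmap_warning is hypothetical and accepts only filesystem paths; it detects an archive by the ZIP local-file-header signature (b"PK\x03\x04") that .npz files begin with.

```python
import os
import tempfile
import warnings

import numpy as np

ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header signature


def load_with_mmap_warning(path, mmap_mode=None, **kwargs):
    """Hypothetical np.load wrapper that warns when mmap_mode has no effect."""
    if mmap_mode is not None:
        with open(path, "rb") as fh:
            is_zip = fh.read(len(ZIP_MAGIC)) == ZIP_MAGIC
        if is_zip:
            warnings.warn(
                "mmap_mode is ignored for .npz archives; each array is read "
                "fully into memory when accessed",
                UserWarning,
                stacklevel=2,
            )
    return np.load(path, mmap_mode=mmap_mode, **kwargs)


# Usage with illustrative data.
workdir = tempfile.mkdtemp()
npz_path = os.path.join(workdir, "data.npz")
np.savez(npz_path, X=np.arange(5))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    archive = load_with_mmap_warning(npz_path, mmap_mode="r")
    X = archive["X"]
    archive.close()
print([str(w.message) for w in caught])
```

The data is still loaded normally; the warning only tells the user that the memory-mapping request cannot be honoured.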
Erik Bray