Re: [Numpy-discussion] Memory mapping and NPZ files

On Wed, 2015-12-09 at 15:51 +0100, Mathieu Dubois wrote:
> Dear all,
>
> If I am correct, using mmap_mode with Npz files has no effect, i.e.:
>
>     f = np.load("data.npz", mmap_mode="r")
>     X = f['X']
>
> will load all the data in memory.
My take on it is that, no, I do not want implicit extraction/copying of the file. However, npz files are not necessarily compressed, and I expect that memory-mapping is possible for the non-compressed variant. If that is possible, it would ideally work for uncompressed npz files, and for compressed ones it could raise an error suggesting that the file be uncompressed manually when mmap_mode is given.

- Sebastian
> Can somebody confirm that?
>
> If I'm correct, the mmap_mode argument could be passed to the NpzFile class, which could in turn perform the correct operation. One way to handle that would be to use the ZipFile.extract method to write the Npy file to disk and then load it with numpy.load with the mmap_mode argument. Note that the user will have to remove the file to reclaim disk space (I guess that's OK).
>
> One problem that could arise is that the extracted Npy file can be large (that is the point of using memory mapping), so it may be useful to offer some control over where this file is extracted (for instance, /tmp can be too small to hold it). numpy.load could offer a new option for that (passed to ZipFile.extract).
>
> Does it make sense?
>
> Thanks in advance,
> Mathieu
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
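
The extraction approach proposed in the quoted message (ZipFile.extract followed by numpy.load with mmap_mode) could look roughly like the sketch below. The helper name load_npz_member_mmap and its extract_dir parameter are hypothetical and are not part of NumPy; the sketch only assumes that np.savez stores an array named "X" as the archive member "X.npy".

```python
import zipfile

import numpy as np


def load_npz_member_mmap(npz_path, name, extract_dir, mmap_mode="r"):
    """Extract one array from an .npz archive and memory-map the result.

    Hypothetical helper, not a NumPy API. The caller chooses where the
    (possibly large) .npy file is written and must delete it afterwards
    to reclaim disk space.
    """
    member = name + ".npy"  # np.savez stores array "X" as member "X.npy"
    with zipfile.ZipFile(npz_path) as zf:
        # extract() decompresses if needed and returns the path it wrote.
        npy_path = zf.extract(member, path=extract_dir)
    # A plain .npy file on disk can be memory-mapped by numpy.load.
    return np.load(npy_path, mmap_mode=mmap_mode), npy_path
```

Returning the extracted path alongside the array leaves cleanup explicit, which matches the remark that the user has to remove the file.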

On 10/12/2015 15:35, Sebastian Berg wrote:
> On Wed, 2015-12-09 at 15:51 +0100, Mathieu Dubois wrote:
>> Dear all,
>>
>> If I am correct, using mmap_mode with Npz files has no effect, i.e.:
>>
>>     f = np.load("data.npz", mmap_mode="r")
>>     X = f['X']
>>
>> will load all the data in memory.
>
> My take on it is that, no, I do not want implicit extraction/copying of the file.

I agree it's controversial.

> However, npz files are not necessarily compressed, and I expect that memory-mapping is possible for the non-compressed variant. If that is possible, it would ideally work for uncompressed npz files, and for compressed ones it could raise an error suggesting that the file be uncompressed manually when mmap_mode is given.

I got the same idea this afternoon. I will test that soon.

Thanks for your constructive answer!
Mathieu
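
One way the idea above might be tested is sketched below: for a member stored without compression, locate its raw bytes inside the zip file, parse the embedded .npy header, and hand the resulting offset to np.memmap. The function name mmap_npz_member is hypothetical, not a NumPy API. The sketch only handles .npy format versions 1.0 and 2.0, and it raises an error for compressed members, as suggested in the reply.

```python
import struct
import zipfile

import numpy as np


def mmap_npz_member(npz_path, name, mode="r"):
    """Memory-map one array of an uncompressed .npz file in place.

    Hypothetical helper, not a NumPy API.
    """
    member = name + ".npy"  # np.savez stores array "X" as member "X.npy"
    with zipfile.ZipFile(npz_path) as zf:
        info = zf.getinfo(member)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(
            f"{member!r} is compressed and cannot be memory-mapped; "
            "uncompress the file manually first"
        )
    with open(npz_path, "rb") as fp:
        # Local file header: 30 fixed bytes; the two uint16 fields at
        # byte 26 give the lengths of the file name and the extra field.
        fp.seek(info.header_offset)
        local = fp.read(30)
        if local[:4] != b"PK\x03\x04":
            raise ValueError("bad local file header signature")
        name_len, extra_len = struct.unpack("<HH", local[26:30])
        fp.seek(info.header_offset + 30 + name_len + extra_len)
        # fp now points at the start of the embedded .npy stream.
        version = np.lib.format.read_magic(fp)
        if version == (1, 0):
            shape, fortran, dtype = np.lib.format.read_array_header_1_0(fp)
        elif version == (2, 0):
            shape, fortran, dtype = np.lib.format.read_array_header_2_0(fp)
        else:
            raise ValueError(f"unsupported .npy format version {version}")
        data_offset = fp.tell()  # absolute offset of the array data
    if dtype.hasobject:
        raise ValueError("object arrays cannot be memory-mapped")
    return np.memmap(npz_path, dtype=dtype, mode=mode, shape=shape,
                     order="F" if fortran else "C", offset=data_offset)
```

The lengths are read from the local file header rather than from the central directory, because the extra field stored in the two places can differ.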
Participants (2)
- Mathieu Dubois
- Sebastian Berg