Re: [Numpy-discussion] Memory mapping and NPZ files

On Dec 12, 2015 10:53 AM, "Mathieu Dubois" <mathieu.dubois@icm-institute.org> wrote:
> On 11/12/2015 11:22, Sturla Molden wrote:
>> Mathieu Dubois <mathieu.dubois@icm-institute.org> wrote:
>>> The point is precisely that: you can't do memory mapping with npz files (while it works with npy files).
>> The operating system can memory-map any file. But as npz files are compressed, you will need to uncompress the contents in your memory mapping to make sense of it.
> We agree on that. The goal is to be able to create a np.memmap array from an npz file.
>> I would suggest you use PyTables instead of npz files. It allows on-the-fly compression and decompression (via blosc) and will probably do what you want.
> Yes, I know I can use other solutions. The point is that np.load silently ignores the mmap option, so I wanted to discuss ways to improve this.
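The behavior under discussion can be seen directly: `np.load` honors `mmap_mode` for plain `.npy` files but silently drops it for compressed `.npz` archives. A minimal sketch (file names are arbitrary):

```python
import os
import tempfile
import numpy as np

tmpdir = tempfile.mkdtemp()
npy_path = os.path.join(tmpdir, "a.npy")
npz_path = os.path.join(tmpdir, "a.npz")

a = np.arange(10)
np.save(npy_path, a)
np.savez(npz_path, a=a)

# .npy: mmap_mode is respected; we get a memory-mapped array back.
m = np.load(npy_path, mmap_mode="r")
print(type(m))  # numpy.memmap

# .npz: mmap_mode is silently ignored; the array is decompressed
# into ordinary in-memory storage when accessed.
data = np.load(npz_path, mmap_mode="r")
z = data["a"]
print(type(z))  # numpy.ndarray, not numpy.memmap
```

No error or warning is raised in the `.npz` case, which is exactly the silent-ignore problem Mathieu is pointing at.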
I can see a good argument for transitioning to a rule where mmap=False doesn't mmap, mmap=True mmaps if the file is uncompressed and raises an error for compressed files, and mmap="if-possible" gives the current behavior. (It's even possible that the current code would already accept "if-possible" as an alias for True, which would make the transition easier.) Or maybe "never"/"always"/"if-possible" would be better for type-consistency reasons, while deprecating the use of bools altogether. But this transition might be a bit more of a hassle, since these values definitely won't work on older NumPy versions.

Silently creating a massive temporary file doesn't seem like a great idea to me in any case. Creating a temporary file and mmapping it is essentially equivalent to just loading the data into swappable RAM, except that the swap case is guaranteed not to accidentally leave a massive temp file lying around afterwards.

-n