[Numpy-discussion] Memory mapping and NPZ files

Nathaniel Smith njs at pobox.com
Sat Dec 12 17:22:46 EST 2015

On Dec 12, 2015 10:53 AM, "Mathieu Dubois" <mathieu.dubois at icm-institute.org> wrote:
> On 11/12/2015 11:22, Sturla Molden wrote:
>> Mathieu Dubois <mathieu.dubois at icm-institute.org> wrote:
>>> The point is precisely that you can't do memory mapping with Npz files
>>> (while it works with Npy files).
>> The operating system can memory map any file. But as npz-files are
>> compressed, you will need to uncompress the contents in your memory
>> to make sense of it.
> We agree on that. The goal is to be able to create a np.memmap array from
> an Npz file.
>> I would suggest you use PyTables instead of npz-files.
>> It allows on the fly compression and uncompression (via blosc) and will
>> probably do what you want.
> Yes, I know I can use other solutions. The point is that np.load silently
> ignores the mmap option, so I wanted to discuss ways to improve this.

I can see a good argument for transitioning to a rule where mmap=False
doesn't mmap, mmap=True mmaps if the file is uncompressed and raises an
error for compressed files, and mmap="if-possible" gives the current behavior.

(It's even possible that the current code would already accept
"if-possible" as an alias for True, which would make the transition easier.)

Or maybe "never"/"always"/"if-possible" would be better for type
consistency reasons, while deprecating the use of bools altogether. But
this transition might be a bit more of a hassle, since these definitely
won't work on older versions of numpy.
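
To make the proposed rule concrete, here is a rough sketch of the
"never"/"always"/"if-possible" semantics written as a wrapper around
np.load. The function name load_with_policy and its mmap argument are
hypothetical, not an existing numpy API; the real np.load spells its option
mmap_mode. As a simplification, the sketch treats every .npz archive as
unmappable, detected by the zip magic bytes, rather than distinguishing
compressed from uncompressed archives.

```python
import numpy as np

# Zip signatures: regular archive and empty archive.
_ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06")


def load_with_policy(path, mmap="if-possible"):
    """Hypothetical wrapper illustrating the proposed semantics.

    mmap="never":       always read into memory
    mmap="always":      memory-map, or raise if that is not possible
    mmap="if-possible": memory-map .npy files, fall back otherwise
    """
    if mmap not in ("never", "always", "if-possible"):
        raise ValueError("unknown mmap policy: %r" % (mmap,))

    # Assumption: an .npz archive is recognised by its zip signature.
    with open(path, "rb") as f:
        is_npz = f.read(4) in _ZIP_MAGIC

    if mmap == "never":
        return np.load(path)
    if is_npz:
        if mmap == "always":
            # Fail loudly instead of silently ignoring the request.
            raise ValueError("cannot memory-map an .npz archive: %s" % path)
        return np.load(path)  # "if-possible": fall back to a normal load
    return np.load(path, mmap_mode="r")
```

With this, load_with_policy("data.npz", mmap="always") raises instead of
quietly loading everything, while mmap="if-possible" keeps the permissive
fallback.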

Silently creating a massive temporary file doesn't seem like a great idea
to me in any case. Creating a temporary file + mmapping it is essentially
equivalent to just loading the data into swappable RAM, except that the
swap case is guaranteed not to accidentally leave a massive temp file lying
around afterwards.
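
For illustration, this is roughly what the temp file + mmap approach would
involve. The helper name memmap_npz_member is made up for this sketch, and
it assumes the archive was written by np.savez or np.savez_compressed, which
store each array as a member called "<name>.npy". Note that the caller ends
up responsible for deleting the temporary file, which is exactly the cleanup
burden described above.

```python
import os
import shutil
import tempfile
import zipfile

import numpy as np


def memmap_npz_member(npz_path, name):
    """Extract one array from an .npz to a temp .npy file and mmap it.

    Returns (array, tmp_path). The caller must delete tmp_path once the
    array is no longer needed; nothing here cleans it up automatically.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".npy")
    with os.fdopen(fd, "wb") as out, zipfile.ZipFile(npz_path) as zf:
        # Decompress the member to disk: this writes the full array size.
        with zf.open(name + ".npy") as member:
            shutil.copyfileobj(member, out)
    return np.load(tmp_path, mmap_mode="r"), tmp_path
```

If the process dies before the caller removes tmp_path, the full
uncompressed copy stays on disk.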

