On Dec 12, 2015 10:53 AM, "Mathieu Dubois" <mathieu.dubois@icm-institute.org> wrote:
>
> On 11/12/2015 11:22, Sturla Molden wrote:
>>
>> Mathieu Dubois <mathieu.dubois@icm-institute.org> wrote:
>>
> >>> The point is precisely that: you can't do memory mapping with Npz files
> >>> (while it works with Npy files).
>>
> >> The operating system can memory-map any file. But as npz files are
> >> compressed, you would need to decompress the contents of your memory
> >> mapping to make sense of them.
>
> We agree on that. The goal is to be able to create a np.memmap array from an Npz file.
>
>
>> I would suggest you use PyTables instead of npz-files.
> >> It allows on-the-fly compression and decompression (via Blosc) and will
> >> probably do what you want.
>
> Yes, I know I can use other solutions. The point is that np.load silently ignores the mmap_mode option for Npz files, so I wanted to discuss ways to improve this.
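For reference, the silent-ignore behavior is easy to check. The snippet below is a minimal sketch (the file names are arbitrary) that saves the same array as `.npy` and as `.npz` and loads both with `mmap_mode="r"`. Because `np.load` dispatches on the file's magic bytes and only passes `mmap_mode` on for `.npy` files, only the `.npy` result comes back as an `np.memmap`.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as d:
    npy_path = os.path.join(d, "x.npy")
    npz_path = os.path.join(d, "x.npz")
    np.save(npy_path, np.arange(5))
    np.savez_compressed(npz_path, x=np.arange(5))

    # .npy: mmap_mode is honored and an np.memmap comes back
    a = np.load(npy_path, mmap_mode="r")
    npy_is_memmap = isinstance(a, np.memmap)
    del a  # release the mapping before the directory is removed

    # .npz: mmap_mode is accepted without complaint, but members load as plain ndarrays
    with np.load(npz_path, mmap_mode="r") as z:
        npz_is_memmap = isinstance(z["x"], np.memmap)

print(npy_is_memmap, npz_is_memmap)  # True False
```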
I can see a good argument for transitioning to a rule where mmap=False never memory-maps, mmap=True memory-maps uncompressed files and raises an error for compressed ones, and mmap="if-possible" gives the current behavior.
(It's even possible that the current code would already accept "if-possible" as an alias for True, which would make the transition easier.)
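A rough sketch of that rule as a wrapper around today's `np.load`. `load_checked` and its `mmap` argument are hypothetical names, not NumPy API. One assumption worth flagging: the sketch treats every `.npz` as unmappable, because `np.load` itself never maps `.npz` members, even ones written uncompressed by `np.savez`.

```python
import numpy as np

# Every .npy file starts with these magic bytes; .npz files are zip archives.
NPY_MAGIC = b"\x93NUMPY"


def load_checked(path, mmap=False, mode="r"):
    """Hypothetical wrapper around np.load illustrating the proposed rule.

    mmap=False         -> never memory-map; read the data into RAM.
    mmap=True          -> memory-map, or raise if the file can't be mapped.
    mmap="if-possible" -> memory-map .npy files, silently fall back to a
                          normal load otherwise (today's np.load behavior).
    """
    if not (mmap is True or mmap is False or mmap == "if-possible"):
        raise ValueError(f"invalid mmap value: {mmap!r}")
    if mmap is False:
        return np.load(path)
    with open(path, "rb") as f:
        is_npy = f.read(len(NPY_MAGIC)) == NPY_MAGIC
    if is_npy:
        # Raw .npy data follows a fixed-size header, so it can be mapped directly.
        return np.load(path, mmap_mode=mode)
    if mmap is True:
        # Assumption: np.load cannot map .npz members, so fail loudly here.
        raise ValueError(f"{path!r} is not a .npy file; cannot memory-map it")
    # "if-possible": fall back to an ordinary in-memory load.
    return np.load(path)
```

The design choice is simply that only the explicit `True` can raise; `"if-possible"` keeps the permissive behavior for callers who want it.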
Or maybe "never"/"always"/"if-possible" would be better for type consistency reasons, while deprecating the use of bools altogether. But this transition might be more of a hassle, since these strings definitely won't work on older NumPy versions.
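If bools were deprecated in favor of strings, argument handling during the transition could look roughly like this. `normalize_mmap_arg` is hypothetical, and mapping `True` to `"if-possible"` (so existing callers keep today's behavior while seeing a warning) is an assumption, not something settled here.

```python
import warnings

_VALID = ("never", "always", "if-possible")


def normalize_mmap_arg(mmap):
    """Hypothetical normalizer for a string-valued mmap option.

    Bools are still accepted during a deprecation period but emit a
    DeprecationWarning. Assumption: True maps to "if-possible" so that
    old code keeps its current behavior until it is updated.
    """
    if isinstance(mmap, bool):
        replacement = "if-possible" if mmap else "never"
        warnings.warn(
            f"mmap={mmap!r} is deprecated; use mmap={replacement!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return replacement
    if mmap not in _VALID:
        raise ValueError(f"mmap must be one of {_VALID}, got {mmap!r}")
    return mmap
```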
Silently creating a massive temporary file doesn't seem like a great idea to me in any case. Creating a temporary file and memory-mapping it is essentially equivalent to just loading the data into swappable RAM, except that the swap case is guaranteed not to accidentally leave a massive temporary file lying around afterwards.
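For concreteness, the temporary-file approach amounts to something like the sketch below. `npz_member_as_memmap` is a hypothetical helper built on the standard `zipfile`, `tempfile`, and `shutil` modules; it stream-decompresses one member into a temporary `.npy` file and maps that. Cleanup of the returned path is left to the caller, which is exactly where the leftover-file risk comes from.

```python
import os
import shutil
import tempfile
import zipfile

import numpy as np


def npz_member_as_memmap(npz_path, name, tmpdir=None):
    """Hypothetical helper: decompress one .npz member to a temporary .npy
    file and memory-map it. Returns (memmap, tmp_path); the caller must
    delete tmp_path when done, or the file is left behind."""
    fd, tmp_path = tempfile.mkstemp(suffix=".npy", dir=tmpdir)
    try:
        with os.fdopen(fd, "wb") as dst, zipfile.ZipFile(npz_path) as zf:
            # np.savez stores each array as "<name>.npy" inside the archive
            with zf.open(name + ".npy") as src:
                # Copy in chunks so the whole array never sits in RAM at once
                shutil.copyfileobj(src, dst)
    except BaseException:
        # Only failures during extraction are cleaned up automatically
        os.remove(tmp_path)
        raise
    return np.load(tmp_path, mmap_mode="r"), tmp_path
```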
-n