[Numpy-discussion] np.array, copy=False and memmap

Allan Haldane allanhaldane at gmail.com
Thu Aug 10 16:06:38 EDT 2017


On 08/07/2017 05:01 PM, Nisoli Isaia wrote:
> Dear all,
> I have a question about the behaviour of
> 
> y = np.array(x, copy=False, dtype='float32')
> 
> when x is a memmap. If we check the _mmap attribute of y
> 
> print "mmap attribute", y._mmap
> 
> numpy tells us that y is not a memmap.
> But the following code snippet crashes the python interpreter
> 
> # opens the memmap
> with open(filename,'r+b') as f:
>       mm = mmap.mmap(f.fileno(),0)
>       x = np.frombuffer(mm, dtype='float32')
> 
> # builds an array from the memmap, with the option copy=False
> y = np.array(x, copy=False, dtype='float32')
> print "before", y
> 
> # closes the file
> mm.close()
> print "after", y
> 
> In my code I use memmaps to share read-only objects when doing parallel
> processing, and the behaviour of np.array, even if not consistent, is
> desirable. I share scipy sparse matrices over many processes, and if
> np.array made a copy when dealing with memmaps, this would force me to
> rewrite part of the sparse matrices code.
> Would it be possible in future releases of numpy to have np.array
> check, if copy is False, whether y is a memmap and in that case return a
> full memmap object instead of slicing it?
> 
> Best wishes
> Isaia
> 
> P.S. A longer account of the issue may be found on my university blog
> http://www.im.ufrj.br/nisoli/blog/?p=131

I just read your blog post, as well.

To answer your question there: yes, if you slice or "view" a numpy
array which points to memmapped data, then the slice or view will also
point to the same memmapped data and will not make a copy. This way you
avoid using up a lot of memory.
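
As a rough illustration of the point above, the following sketch checks
that a slice and a view of a memmap share memory with the original. The
scratch file name and the variable names are made up for this example;
`np.shares_memory` is used only as a convenient way to test for overlap.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.dat")

# Create a small memory-mapped array of float32 values.
x = np.memmap(path, dtype="float32", mode="w+", shape=(8,))
x[:] = np.arange(8, dtype="float32")

s = x[2:6]              # a basic slice of the memmap
v = x.view(np.ndarray)  # a view of the same data as a plain ndarray

# Neither the slice nor the view copies the underlying data.
slice_shares = np.shares_memory(x, s)
view_shares = np.shares_memory(x, v)

# A write through the slice is visible through the original memmap.
s[0] = 100.0
written_value = float(x[2])
```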

It is also important to realize that `np.memmap` is merely a subclass of
`np.ndarray` that adds a few helper methods and attributes which plain
ndarrays don't have, but is otherwise identical. The most important
difference is that `np.memmap` has a `flush` method. (It also has a
private `_mmap` attribute.) Otherwise, both ndarrays and memmaps have
an internal data pointer to the underlying data, and slices or
views of ndarrays (or memmaps) point to the same memory (no
copies). In your code, when you do

y = np.array(x, copy=False)

where x is an np.memmap object, y will point to the same memory locations
as x. However, y will not be a memmap object, because np.array returns a
plain ndarray by default (`subok` defaults to False), so it will not have
the `flush` method, which can be important if you are writing to y and
expect the changes to be written to disk. If you are only reading from y,
though, this shouldn't matter.
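
A minimal sketch of this behaviour, assuming the dtype of x already
matches so that no copy is needed. The scratch file and variable names
are hypothetical. Since y has no `flush` method, the sketch flushes
through x instead.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.dat")

x = np.memmap(path, dtype="float32", mode="w+", shape=(4,))
y = np.array(x, copy=False)  # same dtype, so no copy is required

y_is_memmap = isinstance(y, np.memmap)  # y is a plain ndarray
y_shares = np.shares_memory(x, y)       # but it points at x's memory
y_has_flush = hasattr(y, "flush")       # and lacks the flush method

# Write through y, then flush through x, which still has the method.
y[0] = 42.0
x.flush()

# Read the file back independently of the mapping.
on_disk = np.fromfile(path, dtype="float32")
```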

Also, note that an np.memmap object is different from an mmap.mmap
object: The former uses the latter internally.
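
The distinction can be sketched as follows. The file name is made up for
the example, and `_mmap` is a private attribute, so relying on it is an
assumption about numpy internals rather than documented API.

```python
import mmap
import os
import tempfile

import numpy as np

# Hypothetical scratch file holding four float32 values.
path = os.path.join(tempfile.mkdtemp(), "example.dat")
np.arange(4, dtype="float32").tofile(path)

# np.memmap is an ndarray subclass that holds an mmap.mmap internally.
a = np.memmap(path, dtype="float32", mode="r+")
a_is_ndarray = isinstance(a, np.ndarray)
a_wraps_mmap = isinstance(a._mmap, mmap.mmap)  # private attribute

# mmap.mmap is the standard-library object. Wrapping it with
# np.frombuffer gives a plain ndarray, not an np.memmap.
with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
b = np.frombuffer(mm, dtype="float32")
b_is_memmap = isinstance(b, np.memmap)

# Both arrays expose the same file contents.
same_contents = np.array_equal(a, b)
```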

Allan



