[Numpy-discussion] np.array, copy=False and memmap

Sebastian Berg sebastian at sipsolutions.net
Thu Aug 10 14:24:05 EDT 2017


On Thu, 2017-08-10 at 12:27 -0400, Allan Haldane wrote:
> On 08/07/2017 05:01 PM, Nisoli Isaia wrote:
> > Dear all,
> > I have a question about the behaviour of
> > 
> > y = np.array(x, copy=False, dtype='float32')
> > 
> > when x is a memmap. If we check the `_mmap` attribute of y
> > 
> > print("mmap attribute", y._mmap)
> > 
> > numpy tells us that y is not a memmap.
> > But the following code snippet crashes the python interpreter
> > 
> > import mmap
> > import numpy as np
> > 
> > # opens the memmap
> > with open(filename, 'r+b') as f:
> >     mm = mmap.mmap(f.fileno(), 0)
> >     x = np.frombuffer(mm, dtype='float32')
> > 
> > # builds an array from the memmap, with the option copy=False
> > y = np.array(x, copy=False, dtype='float32')
> > print("before", y)
> > 
> > # closes the memory map while y still points into it
> > mm.close()
> > print("after", y)
> > 
> > In my code I use memmaps to share read-only objects when doing
> > parallel processing, and the behaviour of np.array, even if not
> > consistent, is desirable. I share scipy sparse matrices over many
> > processes, and if np.array made a copy when dealing with memmaps,
> > this would force me to rewrite part of the sparse matrices code.
> > Would it be possible in future releases of numpy to have np.array
> > check, if copy is false, whether y is a memmap and in that case
> > return a full memmap object instead of slicing it?
> 
> This does appear to be a bug in numpy or mmap.
> 

Frankly, at first sight I do not think it is a bug in either of them.
NumPy uses views (a memmap really is just a name for a NumPy array
backed by a memory map). The NumPy array will hold a reference to the
memory map object in its `.base` attribute (or the base of the base,
etc.).
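
To make the `.base` chain concrete, here is a minimal sketch. The
helper `find_mmap` is a hypothetical name used only for illustration,
and the temporary file stands in for the original poster's data file.
Depending on the NumPy version, the buffer may be wrapped in a
`memoryview` on the way, so the helper follows `memoryview.obj` as well.

```python
import mmap
import tempfile

import numpy as np


def find_mmap(arr):
    """Follow .base (and memoryview.obj) links until an mmap is found."""
    obj = arr
    while obj is not None and not isinstance(obj, mmap.mmap):
        if isinstance(obj, memoryview):
            obj = obj.obj  # some NumPy versions hold the buffer this way
        else:
            obj = getattr(obj, "base", None)
    return obj


with tempfile.TemporaryFile() as f:
    f.write(np.arange(4, dtype="float32").tobytes())
    f.flush()
    mm = mmap.mmap(f.fileno(), 0)

    x = np.frombuffer(mm, dtype="float32")
    y = np.asarray(x, dtype="float32")  # dtype already matches: no copy
    z = y[1:]                            # a view of that array

    shares_memory = np.shares_memory(x, z)  # z still points into the map
    found = find_mmap(z) is mm               # the chain ends at the mmap

    # Drop every array before closing, so that nothing is left pointing
    # into unmapped memory.
    del x, y, z
    mm.close()
```

If an array has to outlive the mapping, taking an explicit copy (for
example with `np.array(x, copy=True)`) before closing avoids the
dangling pointer, at the cost of the memory sharing.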

If you close a mmap object and then keep using it, you can of course
get segfaults; I am not sure what you can do about it. Maybe Python
could try to warn you when you exit the context or close a file
pointer, but I suppose: Python does memory management for you, and it
makes doing IO management easy, but you still need to manage the IO
correctly. That this segfaults rather than just raising an error may be
annoying, but it seems to be the nature of things at first sight.

- Sebastian



> Probably the solution isn't to make mmaps a special case, rather we
> should fix a bug somewhere in the use of the PEP3118 interface.
> 
> I've opened an issue on github for your issue:
> https://github.com/numpy/numpy/issues/9537
> 
> It seems to me that the "correct" behavior may be for it to be
> impossible to close the memmap while pointers to it exist; this is
> the behavior for `memoryview`s of mmaps. That is, your line
> `mm.close()` should raise an error `BufferError: cannot close
> exported pointers exist`.
> 
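
The `memoryview` behaviour described in the quoted paragraph can be
checked with the standard library alone. A minimal sketch, assuming
Python 3 and a throwaway temporary file in place of the real data file:

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"\x00" * 16)  # mmap cannot map an empty file
    f.flush()
    mm = mmap.mmap(f.fileno(), 0)

    view = memoryview(mm)  # exports a pointer into the mapping
    try:
        mm.close()
        close_error = None
    except BufferError as exc:
        close_error = exc  # close is refused while the view is alive

    view.release()  # hand the exported pointer back
    mm.close()      # now the close goes through
```

Here the mapping cannot be closed until the view is released, which is
the lifetime guarantee the `np.frombuffer` path in the original report
did not get.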
> 
> > Best wishes
> > Isaia
> > 
> > P.S. A longer account of the issue may be found on my university
> > blog
> > http://www.im.ufrj.br/nisoli/blog/?p=131
> > 
> > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> 