[Numpy-discussion] np.array, copy=False and memmap

Allan Haldane allanhaldane at gmail.com
Thu Aug 10 15:56:41 EDT 2017


On 08/10/2017 02:24 PM, Sebastian Berg wrote:
> On Thu, 2017-08-10 at 12:27 -0400, Allan Haldane wrote:
>> On 08/07/2017 05:01 PM, Nisoli Isaia wrote:
>>> Dear all,
>>> I have a question about the behaviour of
>>>
>>> y = np.array(x, copy=False, dtype='float32')
>>>
>>> when x is a memmap. If we check the memmap attribute of mmap
>>>
>>> print "mmap attribute", y._mmap
>>>
>>> numpy tells us that y is not a memmap.
>>> But the following code snippet crashes the python interpreter
>>>
>>> # opens the memmap
>>> with open(filename,'r+b') as f:
>>>       mm = mmap.mmap(f.fileno(),0)
>>>       x = np.frombuffer(mm, dtype='float32')
>>>
>>> # builds an array from the memmap, with the option copy=False
>>> y = np.array(x, copy=False, dtype='float32')
>>> print "before", y
>>>
>>> # closes the file
>>> mm.close()
>>> print "after", y
>>>
>>> In my code I use memmaps to share read-only objects when doing
>>> parallel
>>> processing
>>> and the behaviour of np.array, even if not consistent, it's
>>> desirable.
>>> I share scipy sparse matrices over many processes and if np.array
>>> would
>>> make a copy
>>> when dealing with memmaps this would force me to rewrite part of
>>> the sparse
>>> matrices
>>> code.
>>> Would it be possible in the future releases of numpy to have
>>> np.array
>>> check,
>>> if copy is false, if y is a memmap and in that case return a full
>>> memmap
>>> object
>>> instead of slicing it?
>>
>> This does appear to be a bug in numpy or mmap.
>>
> 
> Frankly on first sight, I do not think it is a bug in either of them.
> Numpy uses view (memmap really is just a name for a memory map backed
> numpy array). The numpy array will hold a reference to the memory map
> object in its `.base` attribute (or the base of the base, etc.).
> 
> If you close a mmap object, and then keep using it, you can get
> segfaults of course, I am not sure what you can do about it. Maybe
> python can try to warn you when you exit the context/close a file
> pointer, but I suppose: Python does memory management for you, it makes
> doing IO management easy, but you need to manage the IO correctly. That
> this segfaults and not just errors may be annoying, but seems the
> nature of things on first sight.
> 
> - Sebastian

I admit I have not had time to investigate it thoroughly, but it appears
to me that mmap was designed to make it impossible to close a mmap while
there are still exported buffers pointing into it.

Consider the following behavior (python3):

    >>> import mmap
    >>> with open('test', 'r+b') as f:
    ...     mm = mmap.mmap(f.fileno(), 0)
    ...
    >>> mv = memoryview(mm)
    >>> mm.close()
    BufferError: cannot close exported pointers exist

If memoryview behaves this way, why doesn't (or can't) ndarray? Both use
the PEP 3118 buffer interface, as far as I understand.
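
For anyone who wants to reproduce the memoryview side of this without
numpy, here is a minimal stdlib-only sketch. The helper name
close_with_live_export and the 16-byte temporary file are my own choices
for illustration. It only shows that mmap refuses to close while a
memoryview export is alive and closes normally once the export is
released. It says nothing about what ndarray does.

```python
import mmap
import tempfile


def close_with_live_export():
    """Try to close an mmap while a memoryview of it is still alive.

    Returns (blocked, closed_after_release). The helper name and the
    16-byte scratch file are hypothetical, chosen only for this sketch.
    """
    with tempfile.TemporaryFile() as f:
        # mmap cannot map an empty file, so write a few bytes first.
        f.write(b"\x00" * 16)
        f.flush()

        mm = mmap.mmap(f.fileno(), 0)
        mv = memoryview(mm)  # exports a buffer from the mmap

        try:
            mm.close()
            blocked = False
        except BufferError:
            # mmap counts live exports and refuses to close while any exist.
            blocked = True

        mv.release()  # drop the export explicitly
        mm.close()    # now the close succeeds
        return blocked, mm.closed


if __name__ == "__main__":
    print(close_with_live_export())
```

The explicit mv.release() is what makes the second close() legal. Relying
on garbage collection to drop the export would work in CPython but is
harder to reason about.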

You can see in the mmap code that it tries to carefully keep track of
any exported buffers, but numpy manages to bypass this:
https://github.com/python/cpython/blob/b879fe82e7e5c3f7673c9a7fa4aad42bd05445d8/Modules/mmapmodule.c#L727



Allan

