.transpose() of memmap array fails to close()
Hello,

When assigning a variable that is the transpose() of a memmap array, the ._mmap member doesn't get copied, I guess:

In [1]: import numpy
In [2]: amemmap = numpy.memmap( '/tmp/afile', dtype=numpy.float32, shape=(4,5), mode='w+' )
In [3]: bmemmap = amemmap.transpose()
In [4]: bmemmap.close()
---------------------------------------------------------------------------
<type 'exceptions.AttributeError'>        Traceback (most recent call last)

/home/gmabey/src/R9619_dev_acqlibweb/Projects/R9619_NChannelDetection/NED/<ipython console> in <module>()

/usr/local/stow/numpy-20070605_svn-py2.5/lib/python2.5/site-packages/numpy/core/memmap.py in close(self)
     86
     87     def close(self):
---> 88         self._mmap.close()
     89
     90     def __del__(self):

<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'close'

/usr/local/stow/numpy-20070605_svn-py2.5/lib/python2.5/site-packages/numpy/core/memmap.py(88)close()
     87     def close(self):
---> 88         self._mmap.close()
     89

This is an issue when the data is accessed in an order that is different from how it is stored on disk, as in:

bmemmap = numpy.memmap( '/tmp/afile', dtype=numpy.float32, shape=(4,5), mode='w+' ).transpose()

so the object that was originally produced is no longer accessible. I imagine there is some better way to indicate the order of dimensions, but regardless, doing

In [4]: bmemmap._mmap = amemmap._mmap

is a hack workaround.

Best regards,
Glen Mabey
Hello,

I posted this a while back and didn't get any replies. I'm running into this issue again from a different angle, and today I've been trying to figure out which method of ndarray needs to be overloaded for memmap so that the ._mmap attribute gets handled appropriately. But I have not been able to figure out which methods of ndarray are getting used in code such as this:

import numpy
amemmap = numpy.memmap( '/tmp/afile', dtype=numpy.float32, shape=(4,5), mode='w+' )
b = amemmap[2:3]
b
Exception exceptions.AttributeError: "'memmap' object has no attribute '_mmap'" in <bound method memmap.__del__ of memmap([ 0.,  0.,  0.,  0.,  0.], dtype=float32)> ignored
memmap([[ 0.,  0.,  0.,  0.,  0.]], dtype=float32)

Furthermore, can anyone enlighten me as to why an AttributeError exception would be ignored?

Am I using numpy.memmap instances appropriately?

Thank you,
Glen Mabey

On Thu, Jun 07, 2007 at 04:46:20PM -0500, Glen W. Mabey wrote:
[Quoted message elided; it repeats Glen's original post of Jun 07 verbatim.]
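As an aside, the reason the AttributeError is reported as "ignored": CPython prints and then suppresses any exception raised inside a __del__ method, because finalization has no caller to propagate the exception to. A minimal sketch, independent of numpy and in the Python 2 idiom of this thread, that reproduces the message:

    # Exceptions raised in __del__ are printed to stderr and suppressed.
    class Broken(object):
        def __del__(self):
            # Mimic memmap.__del__ touching a ._mmap attribute that was
            # never set on this instance.
            self._mmap.close()

    b = Broken()
    del b
    # stderr shows something like:
    #   Exception AttributeError: "'Broken' object has no attribute
    #   '_mmap'" in <bound method Broken.__del__ of ...> ignored
    # and execution simply continues.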
On Fri, Aug 10, 2007 at 11:20:16AM -0500, Glen W. Mabey wrote:
I posted this a while back and didn't get any replies. I'm running into this issue again from a different angle, and today I've been trying to figure out which method of ndarray needs to be overloaded for memmap so that the ._mmap attribute gets handled appropriately.
Oh, and: Python 2.5.1, numpy svn as of yesterday ... AMD Opteron, Linux/Debian.

Glen
[I keep posting hoping that someone knowledgeable in these things will take notice ...]

Just a couple more notes regarding this numpy.memmap issue. It seems that any slice of a numpy.memmap that is greater than 1-d has a similar problem:

In [1]: import numpy
In [2]: amemmap = numpy.memmap( '/tmp/afile', dtype=numpy.float32, shape=(4,5), mode='w+' )
In [3]: amemmap[1,3:4]
Out[3]: memmap([ 0.], dtype=float32)
In [4]: amemmap[0:1,3:4]
Exception exceptions.AttributeError: "'memmap' object has no attribute '_mmap'" in <bound method memmap.__del__ of memmap([ 0.], dtype=float32)> ignored
Out[4]: memmap([[ 0.]], dtype=float32)

A very naive hack-fix of overloading the __getitem__ method of the numpy.memmap class, such that the result of ndarray.__getitem__ gets the ._mmap attribute added, didn't work. I tried to follow the program flow into the bowels of multiarraymodule.c, but that was beyond me ...

This problem started showing up when I changed to Python 2.5 and persists in 2.5.1. I've considered switching back to 2.4, but I really need 64-bit array indexing ...

Best Regards,
Glen Mabey

On Fri, Aug 10, 2007 at 11:20:16AM -0500, Glen W. Mabey wrote:
[Quoted message elided; it repeats Glen's post of Aug 10, including the Jun 07 message quoted within it.]
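For concreteness, here is a sketch of the kind of naive __getitem__ overload Glen describes trying (a hypothetical reconstruction, not his actual code; the class name is illustrative). It propagates ._mmap onto results that come back as memmap instances, but as he reports it does not fix the problem, presumably because the failing intermediate objects are created inside multiarraymodule.c without ever passing through this method:

    import numpy

    class patched_memmap(numpy.memmap):
        # Naive idea: copy ._mmap onto whatever ndarray.__getitem__
        # returns, whenever the result is itself a memmap instance.
        def __getitem__(self, index):
            result = numpy.ndarray.__getitem__(self, index)
            if isinstance(result, numpy.memmap):
                result._mmap = self._mmap
            return result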
On Fri, 10 Aug 2007, "Glen W. Mabey" apparently wrote:
It seems that any slice of a numpy.memmap that is greater than 1-d has a similar problem. [IPython session elided; see Glen's message above.]
You have not heard from anyone on this yet, right? Please continue to post your findings. Cheers, Alan Isaac
On Mon, Aug 13, 2007 at 11:51:49AM -0400, Alan G Isaac wrote:
You have not heard from anyone on this yet, right?
Nope, but I'm glad to hear even this response.
Please continue to post your findings.
At this point, I'm guessing that the __getitem__() method of ndarray returns a numpy.memmap instance instead of an ndarray instance, but that numpy.memmap.__new__() is not getting executed, resulting in ._mmap not getting initialized, so that when numpy.memmap.__del__() gets called, it chokes because ._mmap doesn't exist.

For my purposes, I am mostly opening these files read-only, so I don't need to have flush() called. For the returned value of __getitem__, it is not appropriate to have ._mmap.close() called (the other operation in numpy.memmap.__del__()). So, I just commented out the overloaded __del__() function. When I do open memmap'ed files read-write, I can manually perform a flush() operation before I'm done, and things seem to work out okay even though .close() isn't called.

As I have tried to think through what should be the appropriate behavior for the returned value of __getitem__, I have not been able to see an appropriate solution (let alone know how to implement it) to this issue.

Thank you,
Glen Mabey
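The ndarray hook Glen is looking for is __array_finalize__: numpy calls it on every new instance of an ndarray subclass, including views produced internally by slicing and transposing, which is exactly where __new__ and __getitem__ are bypassed. A minimal sketch of how a memmap-like subclass could use it to keep ._mmap consistent (the class name is illustrative; later numpy releases adopt essentially this approach in numpy.memmap itself):

    import numpy

    class safe_memmap(numpy.memmap):
        # Called for every new instance, including views created by
        # slicing or transposing; obj is the array the view derives
        # from (None when the array is constructed from scratch).
        def __array_finalize__(self, obj):
            self._mmap = getattr(obj, '_mmap', None)

With ._mmap propagated this way, __del__ and close() still need to tolerate views, e.g. by only closing when the instance actually owns the mapping.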
On 13/08/07, Glen W. Mabey <Glen.Mabey@swri.org> wrote:
As I have tried to think through what should be the appropriate behavior for the returned value of __getitem__, I have not been able to see an appropriate solution (let alone know how to implement it) to this issue.
Is the problem one of finalization? That is, making sure the memory map gets (flushed and) closed exactly once? In this case the numpythonic solution is to have only the original mmap object do any finalization; any slices contain a reference to it anyway, so they cannot be kept after it is collected.

If the problem is that you want to do an explicit close/flush on a slice object, you could just always apply the close/flush to the base object of the slice if it has one, or to the slice itself if it doesn't.

I'm afraid I don't really understand the problem, but it seems like nobody who just knows the answer is about to speak up...

Anne
On Tue, Aug 14, 2007 at 12:23:26AM -0400, Anne Archibald wrote:
[Anne's reply quoted in full; see above.]
The immediate problem is that when a numpy.memmap instance is created as another view of the original array, then __del__ on that new view fails.

flush()ing and closing aren't an issue for me, but they can't be performed at all on derived views right now. It seems to me that any derived view ought to be able to flush(), and ideally in my mind, close() would be called [automatically] only just before the reference count gets decremented to zero.

That doesn't seem to match the numpythonic philosophy you described, Anne, but seems logical to me, while still allowing for both manual flush() and close() operations.

Thanks for your response.

Glen
Hi,

Thanks for looking into this, because we (neuroimaging.scipy.org) use mmaps a lot. I am away from my desk at the moment, but please do keep us all informed, and we'll try and pitch in if we can...

Matthew

On 8/15/07, Glen W. Mabey <Glen.Mabey@swri.org> wrote:
[Quoted message elided; it repeats Glen's Aug 15 reply verbatim.]
On 15/08/07, Glen W. Mabey <Glen.Mabey@swri.org> wrote:
The immediate problem is that when a numpy.memmap instance is created as another view of the original array, then __del__ on that new view fails.
Yes, this is definitely broken.
flush()ing and closing aren't an issue for me, but they can't be performed at all on derived views right now. It seems to me that any derived view ought to be able to flush(), and ideally in my mind, close() would be called [automatically] only just before the reference count gets decremented to zero.
That doesn't seem to match the numpythonic philosophy you described, Anne, but seems logical to me, while still allowing for both manual flush() and close() operations.
You have to be a bit careful, because a view really is just a view into the array; the original is still around. So you can't really delete the array contents when the view is deleted. Really, if you do:

B = A[::2]
del B

nothing at all should happen to A. But to be pythonic, or numpythonic, when the original A is garbage-collected, the garbage collection should certainly close the mmap.

Being able to apply flush() or whatever to slices is not necessarily unpythonic, but it's probably a lot simpler to reliably implement slices of mmap()s as simple slices of ordinary arrays. It means you need to keep the original mmap object around, or traverse up the tree of bases:

T = A
while T.base is not None:
    T = T.base
T.flush()

(Note that this would be simpler if, when you did

A = arange(100)
B = A[::2]
C = B[::2]

you found that C.base were A rather than B.)

Anne
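A runnable version of the base-walking fragment above, for flushing a slice without keeping the original around (the helper name is illustrative; older numpy.memmap spells the flush operation sync(), so the sketch tries both):

    import numpy

    def flush_root(a):
        # Walk .base links back to the object that actually owns the
        # mapped memory, then flush it if it knows how.
        base = a
        while getattr(base, 'base', None) is not None:
            base = base.base
        for name in ('flush', 'sync'):
            if hasattr(base, name):
                getattr(base, name)()
                break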
On Wed, Aug 15, 2007 at 08:50:28PM -0400, Anne Archibald wrote:
You have to be a bit careful, because a view really is just a view into the array - the original is still around. So you can't really delete the array contents when the view is deleted. Really, if you do:

B = A[::2]
del B

nothing at all should happen to A.
Okay, right. I was muddling those two concepts.
But to be pythonic, or numpythonic, when the original A is garbage-collected, the garbage collection should certainly close the mmap.
Humm, this would be less than ideal for my use case, when the data on disk is organized in a different dimensional order than I want to refer to it in my code. For example:

p_data = numpy.memmap( datafilename, shape=( 10, 1024, 20 ), dtype=numpy.float32, mode='r')
u_data = p_data.transpose( [ 2, 0, 1 ] )

and I don't want to have to keep track of p_data, because it's only u_data that I care about and want to use. And I promise, this is not a contrived example. I have data that I really do want to be ordered in a certain way on disk, for I/O efficiency reasons, yet when I logically index into it in my code, I want the dimensions to be in a different order.
Being able to apply flush() or whatever to slices is not necessarily unpythonic, but it's probably a lot simpler to reliably implement slices of mmap()s as simple slices of ordinary arrays.
I considered this approach, but what happens if you want to instantiate a slice that is very large, e.g., larger than the size of your physical RAM? In that case, you can't afford to make simple slices be ordinary arrays, quite apart from the case where you want to change values. Making them functionally memmap-arrays, but without .sync() and .close(), doesn't seem right either.
It means you need to keep the original mmap object around, or traverse up the tree of bases:

T = A
while T.base is not None:
    T = T.base
T.flush()
(Note that this would be simpler if, when you did

A = arange(100)
B = A[::2]
C = B[::2]

you found that C.base were A rather than B.)
Okay, this would make it so that I didn't have to explicitly keep track of p_data in my example. Not bad, although I'd never noticed a .base member before ...

Thank you,
Glen Mabey
On 16/08/07, Glen W. Mabey <Glen.Mabey@swri.org> wrote:
On Wed, Aug 15, 2007 at 08:50:28PM -0400, Anne Archibald wrote:
But to be pythonic, or numpythonic, when the original A is garbage-collected, the garbage collection should certainly close the mmap.
Humm, this would be less than ideal for my use case, when the data on disk is organized in a different dimensional order than I want to refer to it in my code. For example:
p_data = numpy.memmap( datafilename, shape=( 10, 1024, 20 ), dtype=numpy.float32, mode='r')
u_data = p_data.transpose( [ 2, 0, 1 ] )
and I don't want to have to keep track of p_data, because it's only u_data that I care about and want to use. And I promise, this is not a contrived example. I have data that I really do want to be ordered in a certain way on disk, for I/O efficiency reasons, yet when I logically index into it in my code, I want the dimensions to be in a different order.
Perfectly reasonable. Note that p_data cannot be collected until u_data goes away too, so the mmap is safe. And transpose()ing doesn't copy any data, so even if you get an ndarray, you haven't lost the ability to modify things on disk.
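That claim is easy to check directly; a small sketch (the file name is illustrative, and note that in the numpy of this thread the transposed view still trips the __del__ bug when it is eventually collected):

    import numpy

    p_data = numpy.memmap('/tmp/afile', dtype=numpy.float32,
                          shape=(4, 5), mode='w+')
    u_data = p_data.transpose()   # a view: same bytes, new strides
    u_data[4, 3] = 1.0            # writes through to the mapped file
    assert p_data[3, 4] == 1.0    # the original sees the change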
Being able to apply flush() or whatever to slices is not necessarily unpythonic, but it's probably a lot simpler to reliably implement slices of mmap()s as simple slices of ordinary arrays.
I considered this approach, but what happens if you want to instantiate a slice that is very large, e.g., larger than the size of your physical RAM? In that case, you can't afford to make simple slices be ordinary arrays, quite apart from the case where you want to change values. Making them functionally memmap-arrays, but without .sync() and .close(), doesn't seem right either.
I was a bit ambiguous. An ordinary numpy array is an ndarray object, which contains some housekeeping data (dimension, shape, stride lengths, some flags, what have you) and a pointer to a hunk of memory. That hunk of memory can be pretty much any directly-addressable memory, for example a contiguous block of malloc()ed RAM, the beginning of a (possibly strided) subblock of an existing piece of malloc()ed RAM, a pointer to an array statically allocated in some C or Fortran library... or a piece of memory in an mmap()ed region. Numpy doesn't care at all about the difference. In fact this is the beauty of numpy: because all it cares about is where the elements start, what they look like, how many there are, and how far apart they are, it can cheaply create subarrays without copying any data.

So naively, one might implement mmap()ed arrays with a factory function that called mmap(), got back a pointer to the place in virtual memory where the file's contents appear to live, and whipped up a perfectly ordinary ndarray to point to the contents. It would work, thanks to the magic of the OS's mmap() call. The only problem is you would have to figure out when it was safe to close the mmap() (invalidating the array's memory!), and you would have no convenient way to flush() the mmap() out to disk.

So the mmap() objects exist. All they are is ndarrays that keep track of how the mmap() was done and provide flush() and close() methods; they also (I hope!) make sure close() gets called when they get garbage-collected. Note that the safety-scissors way to do this would be to *not* provide a close() method, since a close() leaves the object's data unusable, just waiting for an unwise attempt to index into the object. It's probably better not to ever close() an mmap() object.

What should happen when you take a slice of an mmap() object? (This includes transposes and other non-copying ways to get at its contents.) You get a fresh new ndarray object that does all the numpy magic. But should it also do the mmap() magic? It doesn't need the mmap() creation magic, since the mmap() already exists. flush() would be sort of nice, since that's meaningful (though it might take a long time, if it flushes the whole mmap). close() is just asking to shoot yourself in the foot, since it invalidates not only the slice you took but the whole mmap()!

It seems to me - remember, I don't use mmap or develop numpy, so give this opinion the corresponding weight - that the Right Answer for mmap() is to provide flush(), but not to provide close() except on finalization (you can ensure finalization happens by deleting all references to the array). Finally, if you take a slice of an mmap(), I think you should get a simple ndarray. This ensures you don't have to thread type-duplication code into everywhere that might make a slice. But if you do make slices themselves mmap()s, providing flush() to slices too, great. Just don't provide close(), and particularly *don't* invoke it on finalization of slices, or things will die horribly.

Anne
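The factory-function idea Anne sketches in prose can be written down directly with the stdlib mmap module. A minimal sketch, read-only and with an illustrative name, showing that an ordinary ndarray over an mmap() buffer already gets the finalization right, since the array (and every view of it) holds a reference that keeps the mapping alive:

    import mmap
    import numpy

    def open_mapped_array(filename, dtype, shape):
        # Map the whole file read-only and wrap it in a plain ndarray.
        f = open(filename, 'rb')
        try:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        finally:
            f.close()
        # frombuffer copies nothing; the ndarray keeps m alive, and the
        # mapping is torn down only when the last view is collected.
        return numpy.frombuffer(m, dtype=dtype).reshape(shape)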
participants (4)
- Alan G Isaac
- Anne Archibald
- Glen W. Mabey
- Matthew Brett