masked array with mask an instance of numpy.memmap
I've been using numpy.memmap to access large arrays very nicely, but now I desire to instantiate a masked array that uses memmap objects. Using a memmap instance for the data works just fine and as expected. However, using a memmap for the mask value seems to load the entire object into memory, even if it is already of type bool. How hard would it be to change that behavior?

Also, it seems that median() is not implemented on masked arrays?

Is there some method for a masked array like ravel() but that only includes the non-masked values?

Thank you,
Glen Mabey
On Tuesday 27 March 2007 17:42:18 Glen W. Mabey wrote:
However, using a memmap for the mask value seems to load the entire object into memory, even if it is already of type bool.
How hard would it be to change that behavior?
Oh, I have no idea about that...
Also, it seems that median() is not implemented on masked arrays?
Great minds meet, I just uploaded some very basic stats functions for the new implementation of masked array in the scipy SVN.
Is there some method for a masked array like ravel() but that only includes the non-masked values?
Yep, compressed(). Note that it will flatten your array first, so if you work with nd arrays, that's not the best solution. If you work with 2d arrays, you can simply loop on the other dimension, such as

    (n, p) = data.shape
    # one median per column, so the result needs p entries
    med = numpy.empty((p,), dtype=data.dtype)
    for i in range(p):
        med[i] = numpy.median(data[:, i].compressed())

That works well if p < n/3.
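For readers who want to try the per-column loop above, here is a minimal runnable sketch. It assumes a present-day NumPy in which masked arrays live in numpy.ma and numpy.ma.median exists; the sample values are made up for illustration. Note that the result holds one median per column, so it is allocated with length p.

```python
import numpy as np

# Hypothetical 4x2 sample; one entry is masked in each column.
data = np.ma.array(
    [[1.0, 10.0], [2.0, -1.0], [3.0, 30.0], [-1.0, 40.0]],
    mask=[[False, False], [False, True], [False, False], [True, False]],
)

n, p = data.shape
med = np.empty((p,), dtype=data.dtype)
for i in range(p):
    # compressed() returns only the unmasked values of the column, as a 1-D array
    med[i] = np.median(data[:, i].compressed())

# In current NumPy the same result is available without an explicit loop.
med_vectorized = np.ma.getdata(np.ma.median(data, axis=0))
```

The loop form stays useful when each column needs extra handling, for example skipping columns that are entirely masked.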
On Tue, Mar 27, 2007 at 05:54:05PM -0400, Pierre GM wrote:
On Tuesday 27 March 2007 17:42:18 Glen W. Mabey wrote:
However, using a memmap for the mask value seems to load the entire object into memory, even if it is already of type bool.
How hard would it be to change that behavior?
Oh, I have no idea about that...
Anyone else care to comment on this?
Also, it seems that median() is not implemented on masked arrays?
Great minds meet, I just uploaded some very basic stats functions for the new implementation of masked array in the scipy SVN.
Whoa. You didn't mean numpy SVN, I take it. So, will numpy.median not ever take into account the mask?

And, how recently did you upload them? I updated scipy today and I still get:

    In [41]: scipy.stats.median( ma_data[100:,1] )
    Out[41]: 1989792384.0

    In [42]: numpy.median( ma_data[100:,1].filled() )
    Out[42]: 1989792384.0

    In [43]: ma_data[100:,1]
    Out[43]:
    array(data =
     [ 8829714. -1000000. 7007859. ..., 3921109.5 3815157. 4447688. ],
          mask =
     [False True False ..., False False False],
          fill_value=-1000000.0)
Is there some method for a masked array like ravel() but that only includes the non-masked values?
Yep, compressed().
Perfect. Thank you.

--
Glen W. Mabey
"Happiness in marriage and parenthood can exceed a thousand times any other happiness." -- James Esdras Faust
Whoa. You didn't mean numpy SVN, I take it. So, will numpy.median not ever take into account the mask?
Mmh, that should be part of numpy.core.ma instead: let's keep the masked arrays a bit on the side.
And, how recently did you upload them? I updated scipy today and I still get:
Ah, sorry, I should have been clearer: scipy SVN, sandbox/maskedarray/mstats.py is the file I was mentioning. That's more of a placeholder than an actual library at this point; I'll update it regularly when the need arises. I can't start messing with scipy.stats to take masked arrays into account, as I'm far too biased towards maskedarray vs numpy.core.ma.

I'm not familiar with memmap, but the fact that numpy.core.ma masked arrays are not ndarrays may be a hint.
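The memmap question is left open in this thread. One possible workaround, sketched below under stated assumptions, is to avoid building a single masked array over the whole file and instead wrap one slice of the data and of the mask at a time. The function name chunked_masked_mean, the chunk parameter, and the file names are hypothetical and only serve the illustration; this does not change how the masked array constructor itself treats a memmap mask.

```python
import os
import tempfile

import numpy as np


def chunked_masked_mean(data, mask, chunk=1024):
    """Mean of the unmasked entries of 1-D `data`, reading `chunk` items at a time."""
    total = 0.0
    count = 0
    for start in range(0, data.shape[0], chunk):
        stop = start + chunk
        # Only this slice of the data and of the mask is wrapped in a masked array.
        block = np.ma.array(data[start:stop], mask=mask[start:stop])
        total += float(block.compressed().sum())
        count += int(block.count())
    return total / count if count else float("nan")


# Hypothetical on-disk arrays standing in for the large files.
tmpdir = tempfile.mkdtemp()
data = np.memmap(os.path.join(tmpdir, "data.dat"), dtype=np.float64,
                 mode="w+", shape=(10000,))
mask = np.memmap(os.path.join(tmpdir, "mask.dat"), dtype=np.bool_,
                 mode="w+", shape=(10000,))
data[:] = np.arange(10000)
mask[::2] = True  # mask every even index, leaving the odd values 1, 3, ..., 9999

result = chunked_masked_mean(data, mask)
```

This only suits statistics that can be accumulated block by block (sums, counts, means); a median would need a different strategy.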
On Tue, Mar 27, 2007 at 06:35:51PM -0400, Pierre GM wrote:
Whoa. You didn't mean numpy SVN, I take it. So, will numpy.median not ever take into account the mask?
Mmh, that should be part of numpy.core.ma instead: let's keep the masked arrays a bit on the side.
And, how recently did you upload them? I updated scipy today and I still get:
Ah, sorry, I should have been clearer: scipy SVN, sandbox/maskedarray/mstats.py is the file I was mentioning. That's more of a placeholder than an actual library at this point; I'll update it regularly when the need arises. I can't start messing with scipy.stats to take masked arrays into account, as I'm far too biased towards maskedarray vs numpy.core.ma.
Uh, forgive me but I'm gleaning from your comments that numpy.ma is not the same as a maskedarray in scipy? Okay, yup, that's clarified in sandbox/maskedarray/README . Is there something deficient about numpy.core.ma? Thanks, Glen
Is there something deficient about numpy.core.ma?
numpy.core.ma objects are not ndarrays. Therefore, their mask disappears when they are used as the input of something like numpy.array(data, subok=True), which quickly becomes problematic. Moreover, they are not especially subclassing-friendly, which was my biggest complaint and the main reason why I started rewriting the implementation.
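The subok point can be illustrated with a short sketch. It assumes a present-day NumPy, where numpy.ma.MaskedArray is an ndarray subclass; it therefore shows the behavior the rewrite was aiming for, not the behavior of the old numpy.core.ma discussed above.

```python
import numpy as np

m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# With subok=True, ndarray subclasses pass through, so the mask survives.
kept = np.array(m, subok=True)

# With the default subok=False, the result is a plain ndarray and the mask is lost.
dropped = np.array(m)

is_subclass = isinstance(m, np.ndarray)
```

Here kept is still a masked array carrying the original mask, while dropped exposes the underlying data, including the value that was masked.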
participants (2)
- Glen W. Mabey
- Pierre GM