Re: [Numpy-discussion] Thoughts on masked arrays

"Paul F. Dubois" <paul@pfdubois.com> writes:
Shouldn't that be m.mask() is not None and Numeric.sometrue(Numeric.ravel(m.mask()))? ("Proof" that these expressions are nonintuitive.)
> Is that enough ways to do it? (TM) (:->
Frankly, it's too many ways to do it, none of them obvious to the writer or the reader. This is a simple and useful concept and it should have one obvious implementation.
In this case the side effect is to change the internal representation of the object without changing its semantics, so I don't find it too objectionable. But omit this optimization if you prefer; the query method would be just as useful even without the side effect. Because of the relationship with filled(), maybe this query function should be called m.isfull(). There should probably also be an isfull(m) function for the same reason that there is a mask(m) function.
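The proposed query can be sketched in plain Python. Numeric and MA are not assumed to be available, so ToyMaskedArray below is a hypothetical stand-in, not the MA class: its mask is either None or a list of booleans (True meaning "masked"). The sketch includes the optional side effect of collapsing an all-false mask to None, plus a function form analogous to mask(m).

```python
# Hypothetical toy stand-in for MA.MaskedArray; not the real MA API.
# The mask is None (nothing masked) or a list of booleans, True meaning "masked".


class ToyMaskedArray:
    def __init__(self, data, mask=None):
        self.data = list(data)
        self.mask = None if mask is None else [bool(flag) for flag in mask]

    def isfull(self):
        """Return True if no element is masked.

        Side effect: an all-False mask is collapsed to None. This changes the
        internal representation but not the meaning of the array.
        """
        if self.mask is None:
            return True
        if any(self.mask):
            return False
        self.mask = None
        return True


def isfull(m):
    """Function form, analogous to the mask(m) function.

    Anything that is not a ToyMaskedArray has no mask, so it counts as full.
    """
    if isinstance(m, ToyMaskedArray):
        return m.isfull()
    return True


if __name__ == "__main__":
    a = ToyMaskedArray([1, 2, 3], mask=[0, 0, 0])
    print(isfull(a), a.mask)  # True None
    b = ToyMaskedArray([1, 2, 3], mask=[0, 1, 0])
    print(isfull(b), b.mask)  # False [False, True, False]
```

Because the side effect only replaces an all-false mask with None, repeated calls stay cheap and the answer never changes.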
> A method that replaces the mask with None if possible might make sense. m.unmask()? m.demask()? m.debride()?
Of these names, I like m.unmask() the best. I assume that it would set m.__mask=None if possible and throw an exception if not. On the other hand, while it would be desirable to have a function equivalent (i.e., unmask(m)), this would be awkward because a function should usually not change its argument. Therefore, I suggest adding a safe analogue of raw_data() that throws an exception if the array has a nontrivial mask and otherwise returns self.__data. E.g. [untested]:

    class MaskedArray:
        [...]
        def data(self):
            """If no values are masked, return self.__data.
            Otherwise raise an exception.
            """
            d = self.__data
            m = self.__mask
            if m is not None and Numeric.sometrue(Numeric.ravel(m)):
                raise MAError("MaskedArray cannot be converted to array")
            elif d.iscontiguous():
                return d
            else:
                # Contiguous copy that keeps the typecode and savespace flag.
                return Numeric.array(d, typecode=d.typecode(), copy=1,
                                     savespace=d.spacesaver())

    def data(a):
        if isinstance(a, MaskedArray):
            return a.data()
        elif isinstance(a, Numeric.ArrayType) and a.iscontiguous():
            return a
        else:
            return Numeric.array(a)

A more obscure name should be chosen since you seem to encourage "from MA import *".
In the simple case where the condition has no masked values, I think compress() should simply pick slices out according to condition, without regard to which cells of x are masked. When condition is masked, I don't think that there is a sensible interpretation for compress(), because a "masked" value in condition means you don't know whether that slice of x should be included or not. Since you can't have an output array of indeterminate shape, I would throw an exception if condition is masked. Here is my attempt [untested]:

    def compress(condition, x, dimension=-1):
        # data() is defined above; it raises an exception if condition is masked.
        c = data(condition)
        # Use a local name other than 'mask' so the mask() function is not shadowed.
        mx = mask(x)
        if mx is None:
            m = None
        else:
            m = Numeric.compress(c, mx, dimension)
        return array(Numeric.compress(c, filled(x), dimension), mask=m)

Yours,
Michael

--
Michael Haggerty
mhagger@alum.mit.edu
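Michael's compress() semantics can be checked with a runnable 1-D sketch. It assumes a hypothetical pure-Python ToyMaskedArray (mask is None or a list of booleans) and a hypothetical MaskedError in place of MAError; plain_data() plays the role of his data() function. Unlike his version, the toy carries the underlying values of masked cells through unchanged instead of calling filled(x).

```python
class MaskedError(Exception):
    """Hypothetical stand-in for MA's MAError."""


class ToyMaskedArray:
    """Toy 1-D masked array; mask is None or a list of booleans (True = masked)."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        self.mask = None if mask is None else [bool(flag) for flag in mask]


def plain_data(a):
    """Return the values of a as a list, refusing if any element is masked."""
    if isinstance(a, ToyMaskedArray):
        if a.mask is not None and any(a.mask):
            raise MaskedError("masked array cannot be converted to a plain list")
        return list(a.data)
    return list(a)


def compress(condition, x):
    """Keep x[i] wherever condition[i] is true (1-D only).

    A masked condition is rejected, since it would leave the output length
    undetermined. The mask of x is carried along but does not affect which
    elements are selected.
    """
    c = plain_data(condition)
    if len(c) != len(x.data):
        raise ValueError("condition and x must have the same length")
    keep = [i for i, flag in enumerate(c) if flag]
    data = [x.data[i] for i in keep]
    m = None if x.mask is None else [x.mask[i] for i in keep]
    return ToyMaskedArray(data, mask=m)
```

For example, compressing ToyMaskedArray([10, 20, 30, 40], mask=[0, 1, 0, 0]) with [1, 1, 0, 1] keeps the masked 20, while a condition with any masked entry raises MaskedError.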

Thank you for provoking me to think about these issues in MA. Here is the conclusion I have reached. Please let me know what you think of it.

Background: Michael wanted a way to use a masked array as a Numeric array, but with assurance that in fact no element was masked, and without obscure tests such as count(x) == product(x.shape). The method __array__(self, typecode=None) is a special (existing) hook for conversion to a Numeric array. Many operations in Numeric, when presented with an object x to be operated upon, such as Numeric.sqrt(x), will call x.__array__ as a final act of desperation in an attempt to convert their argument to a Numeric array. Heretofore it was essentially returning x.filled(). This bothered me, because it was a silent conversion that replaced masked values with the fill value.

Solution:
  a. Add a method unmask() which will replace the mask by None if possible. It will not fail.
  b. Change MaskedArray.__array__ to work as follows:
     i. call self.unmask(), and then
     ii. return the raw data if the mask is now None; otherwise, throw an MAError.

Example usage:
Merits of this solution:
  a. It reads like what it is: there is no doubt you are converting to a Numeric array when you see Numeric.array.
  b. It gives you the full range of options in Numeric.array, such as messing with the typecode.
  c. It allows Numeric operations, for speed, on masked arrays that you know to be masked in name only. No copy of the data occurs here unless the typecode needs to be changed.
  d. It removes the possibility of a "dishonest" conversion.
  e. No new method or function is required, other than the otherwise-useful unmask().
  f. Successive conversions are optimized, because once the mask is None, unmask() is cheap.

Deficiency: __array__ becomes a query with an internal, albeit safe, side effect. Mitigating this is that __array__ is not a "public" method and would not normally be used in assertions.
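The proposed unmask()/__array__ behaviour can be sketched with a hypothetical pure-Python toy (not the MA implementation): a list stands in for the Numeric array, MaskedError stands in for MAError, and the typecode argument of the real hook is omitted. Here __array__ is called directly; in the proposal it is Numeric that invokes it when converting an argument.

```python
class MaskedError(Exception):
    """Hypothetical stand-in for MA's MAError."""


class ToyMaskedArray:
    """Toy 1-D masked array; mask is None or a list of booleans (True = masked)."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        self.mask = None if mask is None else [bool(flag) for flag in mask]

    def unmask(self):
        """Replace the mask by None if no element is masked. Never fails."""
        if self.mask is not None and not any(self.mask):
            self.mask = None

    def __array__(self):
        """Conversion hook: return the raw data only if nothing is masked."""
        self.unmask()
        if self.mask is None:
            return self.data  # the underlying list itself, no copy
        raise MaskedError("array has masked values; call filled() explicitly")

    def filled(self, fill_value=0):
        """Explicit conversion that replaces masked elements by fill_value."""
        if self.mask is None:
            return list(self.data)
        return [fill_value if m else d for d, m in zip(self.data, self.mask)]
```

The design keeps the only silent step (dropping an all-false mask) semantically neutral; any conversion that would lose information must go through filled(), where the fill value is visible in the caller's code.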
