[Numpy-discussion] Thoughts on masked arrays

Paul F. Dubois paul at pfdubois.com
Fri May 11 12:02:54 EDT 2001


Thank you for provoking me to think about these issues in MA. Here is the
conclusion I have reached. Please let me know what you think of it.

Background:

Michael wanted a way to use a masked array as a Numeric array, with
assurance that no element was in fact masked, and without obscure tests
such as count(x) == product(x.shape).
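
For readers who have not met that test, here is a minimal sketch of what it
checks: the number of unmasked elements equals the total number of elements.
It is plain modern Python rather than MA or Numeric code; product,
count_unmasked, is_fully_unmasked, and the flat list layout are hypothetical
stand-ins chosen for illustration.

```python
from functools import reduce
from operator import mul


def product(shape):
    """Number of elements implied by a shape tuple, e.g. (2, 3) -> 6."""
    return reduce(mul, shape, 1)


def count_unmasked(data, mask):
    """Count the elements that are not masked.

    data is a flat list; mask is a flat list of booleans (True = masked)
    or None when nothing is masked.
    """
    if mask is None:
        return len(data)
    return sum(1 for m in mask if not m)


def is_fully_unmasked(data, mask, shape):
    """The 'obscure test': every element counted means no element is masked."""
    return count_unmasked(data, mask) == product(shape)
```

The test works, but nothing about it says "this array is safe to treat as
a plain array", which is the complaint.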

The method __array__(self, typecode=None) is a special (existing) hook for
conversion to a Numeric array. Many operations in Numeric, when presented
with an object x to be operated upon, such as Numeric.sqrt(x), will call
x.__array__ as a final act of desperation in an attempt to convert their
argument to a Numeric array. Heretofore it was essentially returning
x.filled(). This bothered me, because it was a silent conversion that
replaced masked values with the fill value.
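
To make the concern concrete, here is a small sketch of why a filled-style
conversion is silent. It is plain modern Python, not the MA implementation;
the filled function below and its list-based arguments are assumptions made
for illustration only.

```python
def filled(data, mask, fill_value):
    """Return a new list with every masked element replaced by fill_value.

    mask is a list of booleans (True = masked) or None when nothing is masked.
    """
    if mask is None:
        return list(data)
    return [fill_value if m else d for d, m in zip(data, mask)]


data = [1.0, 4.0, 9.0]
mask = [False, True, False]  # the middle element is masked
converted = filled(data, mask, fill_value=0.0)
# converted now holds 0.0 in the middle slot, and nothing in the result
# records that this value was never real data.
```

Any code that receives the converted result cannot distinguish the fill
value from a genuine measurement.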

Solution:

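a. Add a method 'unmask()' which replaces the mask with None if possible,
that is, if no element is actually masked. It will not fail: if some
element is masked, the mask is simply left in place.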

b. Change MaskedArray.__array__ to work as follows:
   1. Call self.unmask(), and then
   2. Return the raw data if the mask is now None.
      Otherwise, raise an MAError.
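
The two steps can be sketched as follows. This is a toy model in plain modern
Python, not the MA source: ToyMaskedArray, its list-based data and mask, and
the local MAError class are hypothetical, and the typecode argument is
accepted but ignored here.

```python
class MAError(Exception):
    """Stand-in for MA's error type (an assumption of this sketch)."""


class ToyMaskedArray:
    def __init__(self, data, mask=None):
        self.data = list(data)
        # mask is a list of booleans (True = masked) or None.
        self.mask = None if mask is None else list(mask)

    def unmask(self):
        """Replace the mask by None if no element is masked. Never raises."""
        if self.mask is not None and not any(self.mask):
            self.mask = None

    def __array__(self, typecode=None):
        """Return the raw data, or raise MAError if anything is masked."""
        self.unmask()
        if self.mask is not None:
            raise MAError("Cannot convert masked array because data "
                          "is masked in one or more locations.")
        # No copy is made; typecode conversion is omitted in this sketch.
        return self.data
```

Note that a successful conversion leaves the mask set to None, so a second
conversion skips the scan of the mask entirely.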

Example usage:
>>> from MA import *
>>> x=arange(10)
>>> Numeric.array(x)
[0,1,2,3,4,5,6,7,8,9,]
>>> x[3]=masked
>>> Numeric.array(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/pcmdi/dubois/linux/lib/python2.1/site-packages/MA/MA.py", line 578, in __array__
    raise MAError, \
MA.MA.MAError: Cannot convert masked array to Numeric because data
               is masked in one or more locations.

Merits of this solution:
   a. It reads like what it is -- there is no doubt you are converting to a
Numeric array when you see Numeric.array.

   b. It gives you the full range of options in Numeric.array, such as
messing with the typecode.

   c. It allows Numeric operations for speed on masked arrays that you know
to be masked in name only. No copy of data occurs here unless the typecode
needs to be changed.

   d. It removes the possibility of a 'dishonest' conversion.

   e. No new method or function is required, other than the otherwise-useful
unmask().

   f. Successive conversions are optimized because once the mask is None,
unmask is cheap.

Deficiency: __array__ becomes a query with an internal, albeit safe,
side-effect. Mitigating this is that __array__ is not a "public" method and
would not normally be used in assertions.






