Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

June 30, 2011


      On Jun 30, 2011, at 5:38 PM, Matthew Brett wrote:
...
Hi,
On Thu, Jun 30, 2011 at 2:58 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
...
On Jun 30, 2011, at 3:31 PM, Matthew Brett wrote:
...
################################################
An alternative-NEP on masking and missing values
################################################
I like the idea of two different special values: np.NA for missing values and np.IGNORE for masked values. np.NA values in an array define what was implemented in numpy.ma as a 'hard mask' (where you can't unmask data), while np.IGNOREs correspond to the .mask in numpy.ma. That looks fairly unambiguous.
...
**************
Initialization
**************
First, missing values can be set and be displayed as ``np.NA, NA``::
...
...
...
np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
   array([1., 2., NA, 7.], dtype='NA[<f8]')
As the initialization is not ambiguous, this can be written without the NA
dtype::
...
...
...
np.array([1.0, 2.0, np.NA, 7.0])
   array([1., 2., NA, 7.], dtype='NA[<f8]')
Masked values can be set and be displayed as ``np.MASKED, MASKED``::
...
...
...
np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
   array([1., 2., MASKED, 7.], masked=True)
As the initialization is not ambiguous, this can be written without
``masked=True``::
...
...
...
np.array([1.0, 2.0, np.MASKED, 7.0])
   array([1., 2., MASKED, 7.], masked=True)
I'm not happy with this 'masked' parameter, at all. What's the point? Either you have np.NAs and/or np.IGNOREs or you don't. I'm probably missing something here.
If I put np.MASKED (I agree I prefer np.IGNORE) in the init, then
obviously I mean it should be masked, so the 'masked=True' here is
completely redundant, yes, I agree.  And in fact:
np.array([1.0, 2.0, np.MASKED, 7.0], masked=False)
should raise an error.  On the other hand, if I make a normal array:
arr = np.array([1.0, 2.0, 7.0])
and then do this:
arr.visible[2] = False
then either I should raise an error (it's not a masked array), or,
more magically, construct a mask on the fly.   This somewhat breaks
expectations though, because you might just have made a largish mask
array without having any clue that that had happened.
Well, I'd expect an error to be raised when assigning a NA if the initial array is not NA friendly. The 'magical' creation of a mask would be nice, but is probably too magic and best left alone.
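The concern about a mask appearing without the user noticing can be observed in numpy.ma as it exists today, which allocates its mask lazily. The following is only a sketch using the existing numpy.ma API, not the proposed arr.visible or np.IGNORE (neither exists in NumPy); the array length of 1000 is an arbitrary choice for illustration.

```python
import numpy as np

# A masked array created without a mask carries only the `nomask` sentinel.
m = np.ma.array(np.zeros(1000))
had_mask_before = np.ma.getmask(m) is not np.ma.nomask

# Masking a single element silently allocates a full boolean mask.
m[2] = np.ma.masked
mask = np.ma.getmask(m)

print(had_mask_before)   # whether a mask array existed before the assignment
print(mask.shape)        # one boolean per data element
print(int(mask.sum()))   # number of elements actually masked
```

One assignment is enough to go from no mask at all to a mask as long as the data, which is the kind of hidden allocation discussed above.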
...
...
...
Direct assignment in the masked case is magic and confusing, and so happens only
via the mask::
...
...
...
masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
masked_arr[2] = np.NA
   TypeError('dtype does not support NA')
masked_arr[2] = np.MASKED
   TypeError('float() argument must be a string or a number')
masked_arr.visible[2] = False
masked_arr
   array([1., 2., MASKED], masked=True)
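For comparison, a rough analogue of assigning through the mask can be written with numpy.ma today. This is a sketch under assumptions: ``visible`` below is an ordinary boolean array standing in for the proposed attribute (it is not a NumPy API), and numpy.ma uses the opposite convention, where True in the mask means hidden.

```python
import numpy as np

masked_arr = np.ma.array([1.0, 2.0, 7.0])

# Hypothetical stand-in for the proposed `visible` attribute:
# True means the element takes part in computations.
visible = np.array([True, True, True])
visible[2] = False

# numpy.ma stores the inverse convention (True means hidden),
# so the mask is the negation of `visible`.
masked_arr.mask = ~visible

print(masked_arr)
print(masked_arr.sum())  # the hidden element is skipped
```

The underlying data value (7.0) is still stored; only its visibility changes.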
What about the reverse case, when you assign a regular value to a np.NA/np.IGNORE item?
Well, for the np.NA case, this is straightforward:
na_arr[2] = 3
It's just assignment. For ``masked_arr[2] = 3`` - I don't know, I
guess whatever we are used to.  What do you think?
Ahah, that depends.
With a = np.array([1., np.NA, 3.]), a[1] = 2. should raise an error, as Mark suggests: you can't "unmask" a missing value; you need to create a view of the initial array and then "unmask" that. It's the equivalent of a hard mask.
With a = np.array([1., np.IGNORE, 3.]), a[1] = 2. should give np.array([1., 2., 3.]) and a.mask = [False, False, False]. That's a soft mask.
At least, that's how I see it...
P.
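The hard/soft distinction described above already exists in numpy.ma and can be tried directly. The sketch below uses only the current numpy.ma API; np.NA and np.IGNORE are proposals and do not exist. One caveat: numpy.ma's hard mask silently ignores the assignment rather than raising an error, so it only approximates the np.NA behaviour suggested here.

```python
import numpy as np

# Hard mask: assigning to a masked element does not unmask it.
hard = np.ma.array([1.0, 0.0, 3.0], mask=[False, True, False], hard_mask=True)
hard[1] = 2.0
hard_mask_after = np.ma.getmaskarray(hard).tolist()

# Soft mask (the default): assignment stores the value and clears the mask.
soft = np.ma.array([1.0, 0.0, 3.0], mask=[False, True, False])
soft[1] = 2.0
soft_mask_after = np.ma.getmaskarray(soft).tolist()

print(hard_mask_after)                 # element 1 is still masked
print(soft_mask_after, soft.tolist())  # element 1 is now visible
```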