On Sat, Jan 21, 2012 at 12:49 PM, Benjamin Root <ben.root@ou.edu> wrote:

On Fri, Jan 20, 2012 at 10:21 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Hi All,

I'd like some feedback on how mask NA should interact with views. The immediate problem is how to deal with the real and imaginary parts of complex numbers. If the original has a masked value, it should show up as masked in the real and imaginary parts. But what should happen on assignment to one of the masked views? This should probably clear the NA in the real/imag part, but not in the complex original.

That's a very sticky question.  If one were to clear the NA on both the real and imaginary parts, we run the risk of exposing uninitialized data.  Remember, depending on how we finally decide math should be done with NA, creating a new array from operations on masked inputs may not compute any value for the masked elements.  So, if we assign to the real part and therefore clear that mask, the imaginary part may just be random bits.

Conversely, if we were to keep the imaginary part masked, does that still make sense for mathematical operations?  Say, perhaps, magnitudes or Fourier transforms?  Would it make sense to instead clear the mask on both real and imaginary parts and merely assume that assigning to the real part implicitly means a zero assignment to the imaginary part (and vice-versa)?  Mathematically, this makes sense to me since it would be equivalent, but as a programmer, this thought makes me cringe.  Consider making an assignment first to the real part, and then to the imaginary part: the second assignment would wipe out the first (if we want to be consistent).

Are there use cases for separately making assignments to the real and imaginary parts? Would we want the zero assignment to happen *only* if there was a mask, but not if there wasn't a mask?  This gets very icky, indeed.
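
To make the two options above concrete, here is a minimal sketch in plain NumPy. It does not use the maskna API under discussion; the separate boolean `mask` array and the `assign_part_zeroing` helper are hypothetical stand-ins for illustration only.

```python
import numpy as np

# Hypothetical stand-in: a complex array plus a separate boolean mask.
# This is NOT the maskna API, only an illustration of the data layout.
a = np.array([1 + 2j, 3 + 4j, 5 + 6j])
mask = np.array([False, True, False])  # element 1 plays the role of NA

# a.real is a view into a's memory, not a copy.
real_view = a.real
real_view[1] = 10.0

# Option 1: only the real part was written, so the imaginary part is
# whatever happened to be stored under the mask (here 4.0).
print(a[1])  # (10+4j)


def assign_part_zeroing(arr, part, index, value):
    """Hypothetical policy: writing one part zeroes the other part."""
    if part == "real":
        arr[index] = complex(value, 0.0)
    else:
        arr[index] = complex(0.0, value)


# Option 2: implicit zero assignment. Writing the real part and then the
# imaginary part loses the first write.
b = np.array([1 + 2j, 3 + 4j])
assign_part_zeroing(b, "real", 0, 7.0)
assign_part_zeroing(b, "imag", 0, 9.0)
print(b[0])  # 9j
```

Under option 1 the stale 4.0 becomes visible as soon as the mask is cleared; under option 2 the 7.0 written first is gone.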

 
However, that does allow touching things under the mask, so to speak.


Remember, some forms of missingness that we have discussed allow for "unmasking", while other forms do not.  However, currently, the NEP does not allow for touching things under the mask, IIRC.

 
Things get more complicated if the complex original is viewed as reals. In this case the mask needs to be "doubled" up, and there is again the possibility of touching things beneath the mask in the original. Viewing the original as bytes leads to even greater duplication.
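
A rough sketch of what "doubling up" the mask could look like, again in plain NumPy with a separate boolean array standing in for the NA mask. Using `np.repeat` here is an assumption about one possible rule, not something the NEP specifies.

```python
import numpy as np

c = np.array([1 + 2j, 3 + 4j, 5 + 6j], dtype=np.complex128)
c_mask = np.array([False, True, False])  # hypothetical stand-in for NA

# Viewing as float64 interleaves the parts: re0, im0, re1, im1, ...
r = c.view(np.float64)
r_mask = np.repeat(c_mask, 2)  # one flag per float64 element

# Viewing as bytes needs 16 flags per complex128 element.
raw = c.view(np.uint8)
raw_mask = np.repeat(c_mask, c.dtype.itemsize)

print(r.shape, r_mask)            # (6,) and the doubled mask
print(raw.shape, raw_mask.sum())  # (48,) and 16 masked bytes
```

Each widened mask is a copy, so clearing a flag in `r_mask` does not touch `c_mask`; that is the part that would need a real design decision.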


Let's also think of it in the other direction. Let's say I have an array of 32-bit ints and I view them as 64-bit ints.  This is what currently happens:

>>> a = np.array([1, 2, 3, np.NA, 5, 6, 7, 8, 9, 10], dtype='i4')
>>> a.view('i8')
array([8589934593,           3, 25769803781, NA, 42949672969], dtype=int64)
>>> a = np.array([1, 2, np.NA, 4, 5, 6, 7, 8, 9, 10], dtype='i4')
>>> a.view('i8')
array([8589934593, 17179869206, NA, 34359738375, 42949672969], dtype=int64)

Depending on the position of the NA, the view may or may not get the NA.  I would imagine that this is also endian-dependent. I am not entirely certain of what the correct behavior should be, but I think the answer to this is also related to the answer to the real/imaginary case.
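
For reference, the endian dependence can be checked without any NA support by fixing the byte order explicitly. The sketch below also shows one possible rule for the mask (a widened element is NA if any of its constituents is NA); that rule is an assumption for illustration, not the current behavior shown above.

```python
import numpy as np

values = [1, 2, 3, 4]

# Same logical values, different byte order, different 64-bit results.
little = np.array(values, dtype='<i4').view('<i8')
big = np.array(values, dtype='>i4').view('>i8')
print(little)  # [1 + 2*2**32, 3 + 4*2**32]
print(big)     # [1*2**32 + 2, 3*2**32 + 4]

# Hypothetical stand-in mask: the third 32-bit element is "NA".
mask = np.array([False, False, True, False])

# One possible rule: a 64-bit element is NA if either half is NA.
wide_mask = mask.reshape(-1, 2).any(axis=1)
print(wide_mask)  # [False  True]
```

With this rule the position of the NA within a pair no longer matters, at the cost of hiding a valid 32-bit neighbor.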
 
My thought is that touching the underlying data needs to be allowed in these cases, but the original mask can only be cleared by assignment to the original. Thoughts?


Such a restriction would likely prove problematic.  When we create functions and other libraries, we are not aware of whether we are dealing with a view of an array or the original.  Heck, most of the time, I am not paying attention to whether I am using a view or not in my own programs.  The transparency of views has been a major selling point to me for numpy.  My understanding is that views will eventually become completely indistinguishable from the original numpy array in all of the remaining corner cases (boolean assignments and such).

If we decide to make NA-related assignments different for views than originals, then it only increases the contrast between numpy arrays and views.  In a language like Python, this would likely be a bad thing.

Unfortunately, I am not sure what the solution should be.  But I hope this spurs further discussion.


Note that in normal views the mask is also a view:

In [1]: a = ones(5, maskna=1)

In [2]: a[1] = NA

In [3]: a
Out[3]: array([ 1.,  NA,  1.,  1.,  1.])

In [4]: b = a[1::2]

In [5]: b
Out[5]: array([ NA,  1.])

In [6]: b[0] = 1

In [7]: b
Out[7]: array([ 1.,  1.], maskna=True)

In [8]: a
Out[8]: array([ 1.,  1.,  1.,  1.,  1.], maskna=True)

In [10]: a[1] = NA

In [11]: b = a.view(int64)

In [12]: b
Out[12]:
array([4607182418800017408, NA, 4607182418800017408, 4607182418800017408,
       4607182418800017408])

In [13]: b[1] = 0

In [14]: a
Out[14]: array([ 1.,  0.,  1.,  1.,  1.], maskna=True)

The problems arise when the item sizes don't match.

Chuck