Mailman 3 views and mask NA - SciPy-Dev


views and mask NA

Charles R Harris

Jan. 21, 2012

4:21 a.m.

Hi All,

I'd like some feedback on how mask NA should interact with views. The immediate problem is how to deal with the real and imaginary parts of complex numbers. If the original has a masked value, it should show up as masked in the real and imaginary parts. But what should happen on assignment to one of the masked views? This should probably clear the NA in the real/imag part, but not in the complex original. However, that does allow touching things under the mask, so to speak.

Things get more complicated if the complex original is viewed as reals. In this case the mask needs to be "doubled" up, and there is again the possibility of touching things beneath the mask in the original. Viewing the original as bytes leads to even greater duplication.

My thought is that touching the underlying data needs to be allowed in these cases, but the original mask can only be cleared by assignment to the original. Thoughts?

Chuck
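To make the view geometry described above concrete, here is a minimal sketch using plain NumPy arrays. It assumes a current NumPy release, where the experimental `maskna`/`np.NA` feature is not available, so a separate boolean array (`mask`, a hypothetical stand-in) plays the role of the NA mask.

```python
import numpy as np

c = np.array([1 + 2j, 3 + 4j, 5 + 6j], dtype=np.complex128)
mask = np.array([False, True, False])  # stand-in for an NA mask: element 1 is "NA"

# .real and .imag are views into the complex buffer, not copies.
re, im = c.real, c.imag
assert np.shares_memory(re, c) and np.shares_memory(im, c)

# Writing through the real view changes the complex original,
# including the element that the stand-in mask marks as NA.
re[1] = 30.0
assert c[1] == 30 + 4j

# Viewing the complex data as reals doubles the element count,
# so each mask entry has to cover two float64 slots.
as_reals = c.view(np.float64)
doubled_mask = np.repeat(mask, 2)
assert as_reals.shape == (6,) and doubled_mask.shape == (6,)

# Viewing as bytes duplicates each mask entry once per byte (16 here).
as_bytes = c.view(np.uint8)
byte_mask = np.repeat(mask, c.itemsize)
assert as_bytes.shape == (48,) and byte_mask.sum() == 16
```

The sketch only shows how the element counts line up; which of these writes should clear which mask is exactly the open question of the thread.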


Charles R Harris

January 2012

5:07 p.m.

Oops, wrong list...

Chuck

On Fri, Jan 20, 2012 at 9:21 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:

...

Benjamin Root

7:49 p.m.

On Fri, Jan 20, 2012 at 10:21 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:

...

Hi All,

I'd like some feedback on how mask NA should interact with views. The immediate problem is how to deal with the real and imaginary parts of complex numbers. If the original has a masked value, it should show up as masked in the real and imaginary parts. But what should happen on assignment to one of the masked views? This should probably clear the NA in the real/imag part, but not in the complex original.

That's a very sticky question. If one were to clear the NA on both the real and imaginary parts, we run the risk of possibly exposing uninitialized data. Remember, depending on how we finally decide how math is done with NA, creating a new array from some operations that had masks may not compute any value for those masked elements. So, if we assign to the real part and, therefore, clear that mask, the imaginary part may just be random bits.

Conversely, if we were to keep the imaginary part masked, does that still make sense for mathematical operations? Say, perhaps, magnitudes or Fourier transforms? Would it make sense to instead clear the mask on both real and imaginary parts and merely assume that assigning to the real part implicitly means a zero assignment to the imaginary part (and vice-versa)? Mathematically, this makes sense to me since it would be equivalent, but as a programmer, this thought makes me cringe. Consider making an assignment first to the real part, and then to the imaginary part: the second assignment would wipe out the first (if we want to be consistent).

Are there use cases for separately making assignments to the real and imaginary parts? Would we want the zero assignment to happen *only* if there was a mask, but not if there wasn't a mask? This gets very icky, indeed.

...

However, that does allow touching things under the mask, so to speak.

Remember, some forms of missingness that we have discussed allow for "unmasking", while other forms do not. However, currently, the NEP does not allow for touching things under the mask, IIRC.

...

Things get more complicated if the complex original is viewed as reals. In this case the mask needs to be "doubled" up, and there is again the possibility of touching things beneath the mask in the original. Viewing the original as bytes leads to even greater duplication.

Let's also think of it in the other direction. Let's say I have an array of 32-bit ints and I view them as 64-bit ints. This is what currently happens:

...

...
...
>>> a = np.array([1, 2, 3, np.NA, 5, 6, 7, 8, 9, 10], dtype='i4')
>>> a.view('i8')
array([8589934593, 3, 25769803781, NA, 42949672969], dtype=int64)
>>> a = np.array([1, 2, np.NA, 4, 5, 6, 7, 8, 9, 10], dtype='i4')
>>> a.view('i8')
array([8589934593, 17179869206, NA, 34359738375, 42949672969], dtype=int64)

Depending on the position of the NA, the view may or may not get the NA. I would imagine that this is also endian-dependent. I am not entirely certain of what the correct behavior should be, but I think the answer to this is also related to the answer to the real/imaginary case.
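The endian dependence can be checked with ordinary integer arrays. This is a minimal sketch that assumes a current NumPy release without `np.NA`; the boolean `mask` and the "NA if either half is NA" combination rule are assumptions for illustration, not behavior from the NEP.

```python
import numpy as np

# Explicit little-endian dtypes so the result does not depend on the host.
a = np.array([1, 2, 3, 4], dtype='<i4')
wide = a.view('<i8')

# Each int64 packs two neighbouring int32 values, low word first.
assert wide[0] == 1 + (2 << 32)   # 8589934593, the first value in the output above
assert wide[1] == 3 + (4 << 32)

# With big-endian data the first int32 becomes the high word instead.
b = np.array([1, 2], dtype='>i4')
wide_be = b.view('>i8')
assert wide_be[0] == (1 << 32) + 2

# With a per-element mask, one wide element covers two mask entries.
# Combining them with "any" is only one possible (assumed) rule.
mask = np.array([False, False, False, True])
wide_mask = mask.reshape(-1, 2).any(axis=1)
print(wide_mask.tolist())
```

Under this assumed rule the position of the NA within a pair would no longer matter, unlike in the output quoted above.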

...

My thought is that touching the underlying data needs to be allowed in these cases, but the original mask can only be cleared by assignment to the original. Thoughts?

Such a restriction would likely prove problematic. When we create functions and other libraries, we are not aware of whether we are dealing with a view of an array or the original. Heck, most of the time, I am not paying attention to whether I am using a view or not in my own programs. The transparency of views has been a major selling point to me for numpy. Eventually, (my understanding is that) views will become completely indistinguishable from the original numpy array in all of the remaining corner cases (boolean assignments and such).

If we decide to make NA-related assignments different for views than originals, then it only increases the contrast between numpy arrays and views. In a language like Python, this would likely be a bad thing.

Unfortunately, I am not sure what the solution should be. But I hope this spurs further discussion.

Cheers,
Ben Root
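The transparency argument can be illustrated with a short sketch. The function name `fill_first` is hypothetical; the point is only that library code receiving an array normally has no reason to check whether it is a view.

```python
import numpy as np

def fill_first(arr, value):
    # A library function cannot tell from its signature whether `arr`
    # is an original array or a view; it just writes to it.
    arr[0] = value

a = np.arange(6.0)
v = a[::2]            # basic slicing returns a view
fill_first(v, -1.0)   # the write lands in `a` as well
assert a[0] == -1.0

# The only hint is the `base` attribute, which ordinary code rarely checks.
assert v.base is a and a.base is None
```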

Charles R Harris

8:16 p.m.

On Sat, Jan 21, 2012 at 12:49 PM, Benjamin Root <ben.root@ou.edu> wrote:

...

On Fri, Jan 20, 2012 at 10:21 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:

...
Hi All,

I'd like some feedback on how mask NA should interact with views. The immediate problem is how to deal with the real and imaginary parts of complex numbers. If the original has a masked value, it should show up as masked in the real and imaginary parts. But what should happen on assignment to one of the masked views? This should probably clear the NA in the real/imag part, but not in the complex original.

That's a very sticky question. If one were to clear the NA on both the real and imaginary parts, we run the risk of possibly exposing uninitialized data. Remember, depending on how we finally decide how math is done with NA, creating a new array from some operations that had masks may not compute any value for those masked elements. So, if we assign to the real part and, therefore, clear that mask, the imaginary part may just be random bits.

Conversely, if we were to keep the imaginary part masked, does that still make sense for mathematical operations? Say, perhaps, magnitudes or Fourier transforms? Would it make sense to instead clear the mask on both real and imaginary parts and merely assume that assigning to the real part implicitly means a zero assignment to the imaginary part (and vice-versa)? Mathematically, this makes sense to me since it would be equivalent, but as a programmer, this thought makes me cringe. Consider making an assignment first to the real part, and then to the imaginary part: the second assignment would wipe out the first (if we want to be consistent).

Are there use cases for separately making assignments to the real and imaginary parts? Would we want the zero assignment to happen *only* if there was a mask, but not if there wasn't a mask? This gets very icky, indeed.

...
However, that does allow touching things under the mask, so to speak.

Remember, some forms of missingness that we have discussed allow for "unmasking", while other forms do not. However, currently, the NEP does not allow for touching things under the mask, IIRC.

...
Things get more complicated if the complex original is viewed as reals. In this case the mask needs to be "doubled" up, and there is again the possibility of touching things beneath the mask in the original. Viewing the original as bytes leads to even greater duplication.

Let's also think of it in the other direction. Let's say I have an array of 32-bit ints and I view them as 64-bit ints. This is what currently happens:

...
...
...
>>> a = np.array([1, 2, 3, np.NA, 5, 6, 7, 8, 9, 10], dtype='i4')
>>> a.view('i8')
array([8589934593, 3, 25769803781, NA, 42949672969], dtype=int64)
>>> a = np.array([1, 2, np.NA, 4, 5, 6, 7, 8, 9, 10], dtype='i4')
>>> a.view('i8')
array([8589934593, 17179869206, NA, 34359738375, 42949672969], dtype=int64)

Depending on the position of the NA, the view may or may not get the NA. I would imagine that this is also endian-dependent. I am not entirely certain of what the correct behavior should be, but I think the answer to this is also related to the answer to the real/imaginary case.

...
My thought is that touching the underlying data needs to be allowed in these cases, but the original mask can only be cleared by assignment to the original. Thoughts?

Such a restriction would likely prove problematic. When we create functions and other libraries, we are not aware of whether we are dealing with a view of an array or the original. Heck, most of the time, I am not paying attention to whether I am using a view or not in my own programs. The transparency of views has been a major selling point to me for numpy. Eventually, (my understanding is that) views will become completely indistinguishable from the original numpy array in all of the remaining corner cases (boolean assignments and such).

If we decide to make NA-related assignments different for views than originals, then it only increases the contrast between numpy arrays and views. In a language like Python, this would likely be a bad thing.

Unfortunately, I am not sure what the solution should be. But I hope this spurs further discussion.

Note that in normal views the mask is also a view:

In [1]: a = ones(5, maskna=1)

In [2]: a[1] = NA

In [3]: a
Out[3]: array([ 1., NA, 1., 1., 1.])

In [4]: b = a[1::2]

In [5]: b
Out[5]: array([ NA, 1.])

In [6]: b[0] = 1

In [7]: b
Out[7]: array([ 1., 1.], maskna=True)

In [8]: a
Out[8]: array([ 1., 1., 1., 1., 1.], maskna=True)

In [10]: a[1] = NA

In [11]: b = a.view(int64)

In [12]: b
Out[12]: array([4607182418800017408, NA, 4607182418800017408, 4607182418800017408, 4607182418800017408])

In [13]: b[1] = 0

In [14]: a
Out[14]: array([ 1., 0., 1., 1., 1.], maskna=True)

The problems arise when the item sizes don't match.

Chuck
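The same-size case can be mimicked on a current NumPy release, where `maskna` is not available. In this sketch the boolean array `mask` is a hypothetical stand-in for the NA mask, sliced in step with the data so that both are views.

```python
import numpy as np

a = np.ones(5)
mask = np.zeros(5, dtype=bool)   # hypothetical stand-in for the maskna mask
mask[1] = True                   # corresponds to a[1] = NA

# Slice data and mask the same way; both results are views.
b, b_mask = a[1::2], mask[1::2]
assert b_mask.tolist() == [True, False]

# Corresponds to b[0] = 1: assign and clear the flag through the view.
b[0] = 1.0
b_mask[0] = False
assert not mask.any()            # the original is unmasked too

# A same-itemsize view keeps a one-to-one element mapping, so the mask
# can be shared unchanged. 1.0 as float64 has this int64 bit pattern:
bits = a.view(np.int64)
assert bits[0] == 4607182418800017408
bits[1] = 0                      # all-zero bits are 0.0
assert a[1] == 0.0
```

Because element `i` of the view always corresponds to element `i` of the original, no mask entries need to be merged or duplicated here.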

Charles R Harris

8:25 p.m.

Benjamin, Offtopic, but I was going to look at your gradient function pull request today. Do you have the time to work on it at the moment? Otherwise I'll need to add the tests myself ;) Chuck

Benjamin Root

10:08 p.m.

On Sat, Jan 21, 2012 at 2:25 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:

...

I had completely forgotten about that. I can take a look and make some test data for you, but I have no clue where it goes. Ben Root

Charles R Harris

10:42 p.m.

On Sat, Jan 21, 2012 at 3:08 PM, Benjamin Root <ben.root@ou.edu> wrote:

...

No need for big data sets, just test that it does what you say it does. Tests should be in numpy/lib/tests/test_function_base.py. I'm not quite sure what this does, a bit more explanation in the commit message would help. I'm guessing that datetime differences are now timedelta with inherited units. Chuck
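As a rough illustration of the kind of small test meant here, the sketch below checks that differences of datetimes are timedeltas with the same unit. It uses `np.diff` rather than the gradient function, because the details of the pull request are not given in this thread; the test name is hypothetical.

```python
import numpy as np

def test_diff_of_datetimes_is_timedelta():
    # Hypothetical test: subtracting datetime64 values yields timedelta64
    # with the same unit as the inputs.
    days = np.array(['2012-01-20', '2012-01-21', '2012-01-26'],
                    dtype='datetime64[D]')
    steps = np.diff(days)
    expected = np.array([1, 5], dtype='timedelta64[D]')
    assert steps.dtype == expected.dtype
    assert (steps == expected).all()

test_diff_of_datetimes_is_timedelta()
```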

Pierre Haessig

1:44 p.m.

Le 21/01/2012 20:49, Benjamin Root a écrit :

...

Indeed, considering that we start with C = NA (complex):

1) When assigning the real part of C to some value, the mask should indeed be cleared (if we assume this operation zeroes the imaginary part, which would make sense).
2) When assigning the imaginary part to some value, C is no longer masked, and there should indeed be no need to clear the real part.

I'm assuming here that it is easy to access and set the real and imaginary parts of a complex number separately. However, I am pretty much unaware of how complex numbers are represented in memory... If this separate access is not easy, then I would question the ability to have a real/imaginary part view on complex data.

Pierre
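On the memory-representation question: in NumPy a complex128 element is stored as two adjacent float64 values, real part first, so the parts can be accessed separately as strided views. A minimal sketch:

```python
import numpy as np

c = np.zeros(3, dtype=np.complex128)

# Each complex128 element is two adjacent float64 values: real, then imag.
assert c.itemsize == 16
assert c.real.strides == (16,) and c.imag.strides == (16,)

# The parts can be set independently; neither assignment touches the other.
c.real[0] = 1.5
c.imag[0] = -2.0
assert c[0] == 1.5 - 2.0j

# The flat float64 view shows the interleaving directly.
assert c.view(np.float64)[:2].tolist() == [1.5, -2.0]
```

So assigning to one part does not by itself zero the other; any such zeroing would have to be added as an explicit rule.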

Charles R Harris

4:35 p.m.

On Thu, Jan 26, 2012 at 6:44 AM, Pierre Haessig <pierre.haessig@crans.org> wrote:

...

My feeling is that the real/imag parts should each have their own mask, initially copied from the complex array, so that those parts could be separately manipulated but the mask on the original would not be affected. I don't think an assignment to, say, the imaginary part should have any effect on the real part, and trying to mix the two would be too complicated.

In the more general case of views that change the array size, Mark thinks we should raise an exception, and I think that is probably the easiest way to go. Since it is possible to make an unmasked copy of an array, I don't think that limits what can be done, but some uncommon manipulations will be a bit more complicated.

Chuck
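The two rules proposed here can be sketched with plain arrays and a separate boolean mask. The helper names `masked_real_imag` and `masked_view` are hypothetical, and this is only one possible reading of the proposal, not an implementation from NumPy.

```python
import numpy as np

def masked_view(data, mask, dtype):
    """Hypothetical helper: view `data` as `dtype`, carrying the mask along."""
    dtype = np.dtype(dtype)
    if dtype.itemsize != data.dtype.itemsize:
        # Size-changing views are rejected rather than guessing a mask.
        raise ValueError("view would change the item size of a masked array")
    return data.view(dtype), mask  # same element count: share the mask

def masked_real_imag(data, mask):
    """Hypothetical helper: real/imag views, each with its own mask copy."""
    return (data.real, mask.copy()), (data.imag, mask.copy())

c = np.array([1 + 2j, 3 + 4j])
c_mask = np.array([False, True])

(re, re_mask), (im, im_mask) = masked_real_imag(c, c_mask)
re[1] = 0.0
re_mask[1] = False          # clears NA on the real part only
assert c_mask[1] and im_mask[1]

try:
    masked_view(c, c_mask, np.float64)
except ValueError as exc:
    print("rejected:", exc)
```

Note that the write to `re[1]` still changes the data underneath the original's mask, which is the trade-off accepted earlier in the thread.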
