[Numpy-discussion] NA masks in the next numpy release?

Fri Oct 28 17:01:52 EDT 2011

Hi,

On Fri, Oct 28, 2011 at 1:52 PM, Benjamin Root <ben.root at ou.edu> wrote:
>
>
> On Fri, Oct 28, 2011 at 3:22 PM, Matthew Brett <matthew.brett at gmail.com>
> wrote:
>>
>> Hi,
>>
>> On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root <ben.root at ou.edu> wrote:
>> >
>> >
>> > On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett <matthew.brett at gmail.com>
>> > wrote:
>> >>
>> >> You and I know that I've got an array with values [99, 100, 3] and a
>> >> mask with values [False, False, True].  So maybe I'd like to see what
>> >> happens if I take off the mask from the second value.   I know that's
>> >> what I want to do, but I don't know how to do it, because you won't
>> >> let me manipulate the mask, because I'm not allowed to know that the
>> >> NA values come from the mask.
>> >>
>> >> The alterNEP is just saying - please - be straight with me.   If
>> >> you're doing masking, show me the mask, and don't try and hide that
>> >> there are stored values underneath.
>> >>
>> >
>> > Considering that you have admitted before to not regularly using masked
>> > arrays, I seriously doubt that you would be able to judge whether this
>> > is a significant detriment or not.  My entire point that I have been
>> > making is that Mark's implementation is not the same as the current
>> > masked arrays.  Instead, it is a cleaner, more mature implementation
>> > that gets rid of extraneous "features".
>>
>> This may explain why we don't seem to be getting anywhere.  I am sure
>> that Mark's implementation of masking is great.   We're not talking
>> about that.  We're talking about whether it's a good idea to make
>> masking look as though it is implementing the ABSENT idea.   That's
>> what I think is confusing, and that's the conversation I have been
>> trying to pursue.
>>
>> Best,
>>
>> Matthew
>
> Sorry if I came across too strongly there. No disrespect was intended.

I wasn't worried about the disrespect.  It's just that I feel the
discussion has not been to the point.

> Personally, I think we are getting somewhere.  We have been whittling away
> what it is that we do agree upon, and have begun to specify *exactly* what
> it is that we disagree on.  I understand your concern, and -- like I
> said in my previous email -- it makes sense from the perspective that
> numpy.ma users have had up to now.

But I'm not a numpy.ma user, I'm just someone who knows that what you
are doing is masking out values.  The fact that I do not use numpy.ma
points out that it's possible to find this highly counter-intuitive
without prior bias.
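
To make the example quoted above concrete, here is a minimal sketch using
the existing numpy.ma module as a stand-in.  That is an assumption made for
illustration only: the NA-mask implementation under discussion is a
different interface and may not expose the mask this way.

```python
import numpy as np

# Values [99, 100, 3] with the last value masked out (numpy.ma stand-in).
arr = np.ma.array([99, 100, 3], mask=[False, False, True])

visible_sum = arr.sum()        # the masked value is skipped
stored = arr.data.tolist()     # the stored values are all still underneath

# Taking the mask off the masked value is an explicit mask manipulation.
arr.mask[2] = False
full_sum = arr.sum()           # now all three values take part
```

The point is only that the mask and the stored values are both visible and
both open to manipulation.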

> But, I re-raise my point that I have been making
> about the need to re-think masked arrays.  If we consider masks as advanced
> slicing or boolean indexing, then being unable to access the underlying
> values actually makes a lot of sense.
>
> Consider it a contract when I pass a set of data with only certain values
> exposed.  Because I passed the data with only those values exposed, then it
> must have been entirely my intention to let the function know of only those
> values.  It would be a violation of that contract if the function obtained
> those masked values.  If I want to communicate both the original values and
> a particular mask, then I pass the array and a view with a particular mask.

This is the old discussion about what Python users expect.  I think
they expect to be treated as adults.  That is, breaking the contract
should not be easy to do by accident, but it should be allowed.
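
As a rough sketch of "not easy by accident, but allowed", again using
numpy.ma as a stand-in (the function names here are hypothetical, made up
for illustration): ordinary operations honor the mask, and a function has
to ask for the underlying data explicitly in order to ignore it.

```python
import numpy as np


def mean_of_exposed(values):
    """Honor the contract: use only the values the caller exposed."""
    return float(values.mean())


def mean_ignoring_mask(values):
    """Break the contract on purpose by asking for the stored data."""
    return float(np.ma.getdata(values).mean())


data = np.array([99.0, 100.0, 3.0])
# A masked wrapper around the same values, with the last one hidden.
masked = np.ma.array(data, mask=[False, False, True])

exposed = mean_of_exposed(masked)         # mean of 99 and 100 only
everything = mean_ignoring_mask(masked)   # mean of all three stored values
```

The default path respects the mask; getting at the hidden value takes a
deliberate call.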

> Maybe it would be helpful that an array can never have its own mask, but
> rather, only views can carry masks?
>
> In conclusion, I submit that this is largely a problem that can be solved
> with the proper documentation.  New users who never used numpy.ma before do
> not have to concern themselves with the old way of thinking and are just
> simply taught what masked arrays "are".  Meanwhile, a special section of the
> documentation should be made that teaches numpy.ma users how masked arrays
> "should be".

I don't think documentation will solve it.  In a way, the ideal user
is someone who doesn't know what's going on, because, for a while,
they may not realize that when they thought they were doing
assignment, in fact they are doing masking.  Unfortunately, I suspect
almost everyone using these things will start to realize that, and
then they will start getting confused.  I find it confusing, and I
believe I understand the issues pretty well and have comprehension
powers in the usual numpy-user range.
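
A small sketch of the "assignment that is really masking" situation, once
more with numpy.ma standing in for the proposed interface (an assumption;
the NA syntax being debated is not what is shown here):

```python
import numpy as np

arr = np.ma.array([99, 100, 3])

# This reads like assigning a value to the second element...
arr[1] = np.ma.masked

# ...but only the mask changed; the stored value is untouched.
mask_after = arr.mask.tolist()
data_after = arr.data.tolist()
```

After the "assignment", the 100 is still stored; it is just hidden behind
the mask.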

See you,

Matthew