[Numpy-discussion] Missing Values Discussion

Bruce Southey bsouthey at gmail.com
Fri Jul 8 09:22:24 EDT 2011


On 07/08/2011 07:15 AM, Matthew Brett wrote:
> Hi Travis,
>
> On Fri, Jul 8, 2011 at 5:03 AM, Travis Oliphant <oliphant at enthought.com> wrote:
>> Hi all,
>>
>> I want to first apologize for stepping into this discussion a bit late and for not being able to participate adequately.   However, I want to offer a couple of perspectives, and my opinion about what we should do as well as clarify what I have instructed Mark to do as part of his summer work.
>>
>> First, the discussion has reminded me how valuable it is to get feedback from all points of view.  While it does lengthen the process, it significantly enhances the result.  I strongly hope we can continue the tradition of respectful discussion on this mailing list where people's views are treated with respect --- even if we don't always have the time to understand them in depth.
>>
>> I also really appreciate people taking the time to visit on the phone call with me as it gave me a chance to understand many opinions quickly and at least start to form a possibly useful opinion.
>>
>> Basically, because there is not consensus and in fact a strong and reasonable opposition to specific points, Mark's NEP as proposed cannot be accepted in its entirety right now.   However,  I believe an implementation of his NEP is useful and will be instructive in resolving the issues and so I have instructed him to spend Enthought time on the implementation.   Any changes that need to be made to the API before it is accepted into a released form of NumPy can still be made even after most of the implementation is completed as far as I understand it.   This is because most of the disagreement is about the specific ability to manipulate the masks independently of assigning missing data and the creation of an additional np.HIDE (np.IGNORE) concept at the Python level.
>>
>> Despite some powerful arguments on both sides of the discussion, I am confident that we can figure out an elegant solution that will work long term.
>>
>> My current opinion is that I am very favorable to making easy the use-case that has been repeatedly described of having "missing data" that is *always* missing and then having "hidden data" that you don't want to think about for a particular set of calculations (but you also don't want to throw away by over-writing).   I think it is important to make it easy to keep that data around without over-writing but also have the "idea" of that kind of missing data different than the idea of data you can't care about because it just isn't there.
>>
>> I also think it is important for the calculation infrastructure to have just one notion of "missing data" which Mark's NEP handles beautifully.
>>
>> It seems to me that some of the disagreement is one of perspective in that Mark articulates very well the position of "generic programming, make-opaque-the-implementation" perspective with a focus on the implications of missing data for calculations.    Nathaniel and Matthew articulate well the perspective of "focusing" on the data object itself and the desire to keep separate the different ideas behind missing data that have been described --- as well as a powerful description of the NumPy tradition of exposing the raw data to the Python side without hiding too much of the implementation from the user.
>>
>> I think it's a healthy discussion.   But, I would like to see Mark's code get completed so that we can start talking about code examples.   Please don't interpret my instructing Mark to finish the code as "it's been decided".  I simply think it's the best path forward to ultimately resolving the concerns.   I would like to see an API worked out before summer's end --- and I'm hopeful everyone will be excited about what the resulting design is.
>>
>> I do think there is room for agreement in the present debate if we all remember to keep listening to each other.  It takes a lot of effort to understand somebody else's point of view.  I have been grateful to see evidence of that behavior multiple times (in Mark's revamping of the NEP, in Matthew Brett's re-statement of his interpretation of Mark's views, in Nathaniel's working hard to engage the dialogue even in the throes of finishing his PhD, and many other examples).
>>
>> It makes me very happy to be a part of this community.  I look forward to times when I can send more thoughtful and technical emails than this one.
> Thanks for this email - it is very helpful.
>
> Personally I was worrying that:
>
> A) Mark had not fully grasped our concern
> B) Disagreement was not welcome
>
> and this gave me an uncomfortable feeling about A) the resulting API
> and B) the discussion.  You've dealt with both here, and thank you for
> that.
>
> Can I ask - what do you recommend that we do now, for the discussion?
> Should we be quiet and wait until there is code to test, or, as
> Nathaniel has tried to do, work at reaching some compromise that makes
> sense to some or all parties?
>
> Thanks again,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
I agree that this has been a very interesting discussion, especially the
great interaction between everyone.

The one thing that we do need now is the code that implements the small 
set of core ideas (array creation and simple numerical operations). 
Hopefully that will provide a better grasp of the concepts and the 
performance differences to determine the acceptability of the approach(es).

Bruce


