On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
In the interest of making the discussion as concrete as possible, here is my draft of an alternative proposal for NAs and masking, based on Nathaniel's comments. Writing it, it seemed to me that Nathaniel is right, that the ideas become much clearer when the NA idea and the MASK idea are separate. Please do pitch in for things I may have missed or misunderstood: [...]
Thanks for writing this up! I stuck it up as a gist so we can edit it more easily: https://gist.github.com/1056379/ This is your initial version:
https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
And I made a few changes:
https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
Specifically, I added a rationale section, changed np.MASKED to np.IGNORE (as per comments in this thread), and added a vowel to "propmsk".
It might be helpful to make a small toy class in python so that people can play around with NA and IGNORE from the alterNEP.
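A minimal sketch of what such a toy might look like, in plain Python rather than NumPy. It assumes the semantics discussed in this thread, namely that NA propagates through a reduction while IGNORE is skipped. The names `NA`, `IGNORE`, and `toy_sum` are hypothetical stand-ins for illustration, not part of NumPy or of either proposal's actual API.

```python
class _Sentinel:
    """A named singleton marker (hypothetical helper, not a NumPy object)."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


# Hypothetical stand-ins for the alterNEP's np.NA and np.IGNORE.
NA = _Sentinel("NA")
IGNORE = _Sentinel("IGNORE")


def toy_sum(values):
    """Sum values, assuming NA propagates and IGNORE is skipped."""
    total = 0
    saw_na = False
    for v in values:
        if v is IGNORE:
            continue        # masked-out element: leave it out of the result
        if v is NA:
            saw_na = True   # missing element: the whole result is unknown
            continue
        total += v
    return NA if saw_na else total


print(toy_sum([1, IGNORE, 2]))
print(toy_sum([1, NA, 2]))
```

Under these assumptions the first call skips the ignored element and the second returns the NA marker, which is the distinction the two names are meant to make explicit.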
Thanks for doing this.
I don't know about you, but I don't know where to work on the discussion or draft implementation, because I am not sure where the disagreement is. Lluis has helpfully pointed out a specific case of interest. Pierre has fed back with some points of clarification. However, other than that, I'm not sure what we should be discussing.
@Mark @Chuck @anyone
Do you see problems with the alterNEP proposal?
Yes, I really like my design as it stands now, and the alterNEP removes a lot of the abstraction and interoperability that are in my opinion the best parts. I've made more updates to the NEP based on continuing feedback, which are part of the pull request I want reviews for.
Ah - I think what you are saying is - too late I've started writing it.
Do you want me to spend my whole summer designing something before starting the implementation? I made a pull request implementing a non-controversial part of the NEP to get started, and I've not seen any feedback on it except from Chuck and Derek. (Many thanks to Chuck and Derek!) Implementation and design are tied together in a feedback loop, and separate designs that aren't informed by the implementation details, for example information gained by going through the proposed code changes and reviewing them, are counterproductive. I appreciate the effort you're putting in, and I've been trying to guide you towards a more holistic path of contribution by pointing out the pull request.
Mainly: Reduced interoperability
Meaning?
You can't switch between the two approaches without big changes in your code.
more complex implementation (leading to more bugs),
OK - but the discussion did not seem to be about the complexity of the implementation, but about the API.
The implementation always plays a role in the design of anything. Making an API design abstractly, then testing it against implementation constraints, is good; making an API completely divorced from considerations of implementation is really, really bad.
and an unclear theoretical model for the masked part of it.
What's unclear? Or even different?
After thinking about the missing data model some more, I've come up with more rationale for why the R approach is good, and adopting both the R default and skipna option is appropriate. It's in the pull request up for code review.
Do you agree that the alterNEP proposal is easier to understand?
No.
Do you agree that there are several people on the list who do think that the alterNEP proposal is easier to understand?
Feedback on the clarity of my writing in the NEP is welcome. If something is unclear to someone, please point out the specific part so I can continue to improve it. I don't think the clarity of the writing is a good reason for choosing one design or another; the quality of the design is what should decide that.
If not, can you explain why?
My answers to that are already scattered in the emails in various places, and in the various rationales and justifications provided in the NEP.
I can't see any reference to the alterNEP or the idea of the separate API in the NEP. Can you point me to it?
I'm referring to positive arguments for why the design decisions are as they are. I don't see the alterNEP referencing specific things that are wrong with the NEP either, it just assumes sharing the API is a bad idea without making clearly stated arguments for or against it.
What do you see as the important points of difference between the NEP and the alterNEP?
The biggest thing is the NEP supports more use cases in a clean way by composition of different simpler components. It defines one clear missing data abstraction, and proposes two implementations that are interchangeable and can interoperate. The alterNEP proposes two independent APIs, reducing interoperability and so significantly increasing the amount of learning required to work with both of them. This also precludes switching between the two approaches without a lot of work.
Lluis gave a particular somewhat obscure case where it is convenient that the NA and IGNORE are the same. Are there any others? It seems to me the API you propose is a classic example of implicit rather than explicit, and that it would be very easy, at this stage, to fix that.
And I came up with a nice way to deal with this situation through a subclass of ndarray changing the default 'skipna=' parameter value. The "implicit vs explicit" quote is overused, but even so I've applied the idea very carefully. In the NEP, you never get missing value support unless you explicitly request it.
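A rough sketch of the subclass idea, under loud assumptions: the NEP's `skipna=` parameter is not assumed to exist in any released NumPy, so this example emulates it, uses NaN as a stand-in for NA, and relies on `np.nansum` to do the skipping. The class name `SkipNAArray` is hypothetical and is not taken from the NEP or the pull request.

```python
import numpy as np


class SkipNAArray(np.ndarray):
    """ndarray subclass whose sum() skips missing values by default.

    Illustration only: NaN stands in for NA, and skipna is emulated here
    rather than taken from NumPy itself.
    """

    def __new__(cls, data):
        # View a float array as this subclass.
        return np.asarray(data, dtype=float).view(cls)

    def sum(self, axis=None, skipna=True, **kwargs):
        # Work on a plain ndarray view so this method is not re-entered.
        base = self.view(np.ndarray)
        reducer = np.nansum if skipna else np.sum
        return reducer(base, axis=axis, **kwargs)


a = SkipNAArray([1.0, np.nan, 2.0])
print(a.sum())              # missing value skipped by default
print(a.sum(skipna=False))  # missing value propagates, as in a plain ndarray
```

The design point being illustrated is that only the default changes: code written against the propagating behaviour can still request it explicitly with `skipna=False`.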
The current pull request that's sitting there waiting for review does not have an impact on which approach goes ahead, but the code I'm doing now does. This is a fairly large project, and I don't have a great length of time to do it in, so I'm not going to participate extensively in the alterNEP discussion. If you want to help me, please review my code and provide specific feedback on my NEP (the code review system in github is great for this too, I've received some excellent feedback on the NEP that way). If you want to change my mind about things, please address the specific design decisions you think are problematic by specifically responding to lines in the NEP, as part of code-reviewing my pull request in github.
OK - unless you tell me differently I'll take that as 'the discussion of the separate API for NA and IGNORE is over as far as I am concerned'.
Yes, because I'm not seeing arguments responding with specific examples or use cases showing why a separate API is better, in particular which deal with the arguments I've given indicating why sharing the API is useful.
I would say, for future reference, that if there is a substantial and reasonable discussion of the API, that is not well resolved, then it does harm to go ahead and implement regardless. Specifically, it demoralizes those of us who put energy into trying to have a substantial reasoned discussion. I think that's bad for the list and bad for the community.
You might have consideration for morale of those who are putting substantial effort into designing and implementing it as well. The ecosystem is not just this mailing list, it also is the code and documentation review process on github, and when people who only participate on the mailing list are tearing apart carefully constructed designs based in part on some mischaracterizations of those designs, then expecting to be corrected each time instead of studying the proposed design to understand and compare it to their competing ideas, it's harder and harder to keep responding with corrections. I appreciate your feedback, the design for the NA bit pattern approach that is in the NEP is inspired by your feedback for wanting that style of NA functionality. Thanks, Mark
See you,
Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion