<br><br><div class="gmail_quote">On Sat, Oct 29, 2011 at 3:32 AM, Charles R Harris <span dir="ltr"><<a href="mailto:charlesr.harris@gmail.com">charlesr.harris@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br><br><div class="gmail_quote"><div><div></div><div class="h5">On Fri, Oct 28, 2011 at 6:45 PM, Wes McKinney <span dir="ltr"><<a href="mailto:wesmckinn@gmail.com" target="_blank">wesmckinn@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div><div></div><div>On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root <<a href="mailto:ben.root@ou.edu" target="_blank">ben.root@ou.edu</a>> wrote:<br>

><br>

><br>

> On Friday, October 28, 2011, Matthew Brett <<a href="mailto:matthew.brett@gmail.com" target="_blank">matthew.brett@gmail.com</a>> wrote:<br>

>> Hi,<br>

>><br>

>> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers<br>

>> <<a href="mailto:ralf.gommers@googlemail.com" target="_blank">ralf.gommers@googlemail.com</a>> wrote:<br>

>>><br>

>>><br>

>>> On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <<a href="mailto:matthew.brett@gmail.com" target="_blank">matthew.brett@gmail.com</a>><br>

>>> wrote:<br>

>>>><br>

>>>> Hi,<br>

>>>><br>

>>>> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris<br>

>>>> <<a href="mailto:charlesr.harris@gmail.com" target="_blank">charlesr.harris@gmail.com</a>> wrote:<br>

>>>> ><br>

>>>> ><br>

>>>> > On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett<br>

>>>> > <<a href="mailto:matthew.brett@gmail.com" target="_blank">matthew.brett@gmail.com</a>><br>

>>>> > wrote:<br>

>>>> >><br>

>>>> >> Hi,<br>

>>>> >><br>

>>>> >> On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett<br>

>>>> >> <<a href="mailto:matthew.brett@gmail.com" target="_blank">matthew.brett@gmail.com</a>><br>

>>>> >> wrote:<br>

>>>> >> > Hi,<br>

>>>> >> ><br>

>>>> >> > On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris<br>

>>>> >> > <<a href="mailto:charlesr.harris@gmail.com" target="_blank">charlesr.harris@gmail.com</a>> wrote:<br>

>>>> >> >><br>

>>>> >> >><br>

>>>> >> >> On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <<a href="mailto:njs@pobox.com" target="_blank">njs@pobox.com</a>><br>

>>>> >> >> wrote:<br>

>>>> >> >>><br>

>>>> >> >>> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant<br>

>>>> >> >>> <<a href="mailto:oliphant@enthought.com" target="_blank">oliphant@enthought.com</a>><br>

>>>> >> >>> wrote:<br>

>>>> >> >>> > I think Nathaniel and Matthew provided very specific feedback<br>

>>>> >> >>> > that was helpful in understanding other perspectives of a<br>

>>>> >> >>> > difficult problem. In particular, I really wanted bit-patterns<br>

>>>> >> >>> > implemented. However, I also understand that Mark did quite a<br>

>>>> >> >>> > bit of work and altered his original designs quite a bit in<br>

>>>> >> >>> > response to community feedback. I wasn't a major part of the<br>

>>>> >> >>> > pull request discussion, nor did I merge the changes, but I<br>

>>>> >> >>> > support Charles if he reviewed the code and felt like it was<br>

>>>> >> >>> > the right thing to do. I likely would have done the same thing<br>

>>>> >> >>> > rather than let Mark Wiebe's work languish.<br>

>>>> >> >>><br>

>>>> >> >>> My connectivity is spotty this week, so I'll stay out of the<br>

>>>> >> >>> technical discussion for now, but I want to share a story.<br>

>>>> >> >>><br>

>>>> >> >>> Maybe a year ago now, Jonathan Taylor and I were debating what<br>

>>>> >> >>> the best API for describing statistical models would be --<br>

>>>> >> >>> whether we wanted something like R's "formulas" (which I<br>

>>>> >> >>> supported), or another approach based on sympy (his idea). To<br>

>>>> >> >>> summarize, I thought his API was confusing, pointlessly<br>

>>>> >> >>> complicated, and didn't actually solve the problem; he thought<br>

>>>> >> >>> R-style formulas were superficially simpler but hopelessly<br>

>>>> >> >>> confused and inconsistent underneath. Now, obviously, I was<br>

>>>> >> >>> right and he was wrong. Well, obvious to me, anyway... ;-) But<br>

>>>> >> >>> it wasn't like I could just wave a wand and make his arguments<br>

>>>> >> >>> go away,<br>

>>>> >> >>> no I should point out that the implementation hasn't - as far as<br>

>>>> >> >>> I can<br>

>> see - changed the discussion.  The discussion was about the API.<br>

>> Implementations are useful for agreed APIs because they can point out<br>

>> where the API does not make sense or cannot be implemented.  In this<br>

>> case, the API Mark said he was going to implement - he did implement -<br>

>> at least as far as I can see.  Again, I'm happy to be corrected.<br>

>><br>

>>>> In saying that we are insisting on our way, you are saying,<br>

>>>> implicitly, 'I am not going to negotiate'.<br>

>>><br>

>>> That is only your interpretation. The observation that Mark compromised<br>

>>> quite a bit while you didn't seems largely correct to me.<br>

>><br>

>> The problem here stems from our inability to work towards agreement,<br>

>> rather than standing on set positions.  I set out what changes I think<br>

>> would make the current implementation OK.  Can we please, please have<br>

>> a discussion about those points instead of trying to argue about who<br>

>> has given more ground.<br>

>><br>

>>> That commitment would of course be good. However, even if that<br>

>>> were possible before writing code and everyone agreed that the<br>

>>> ideas of you and Nathaniel should be implemented in full, it's<br>

>>> still not clear that either of you would be willing to write<br>

>>> any code. Agreement without code still doesn't help us very<br>

>>> much.<br>

>><br>

>> I'm going to return to Nathaniel's point - it is a highly valuable<br>

>> thing to set ourselves the target of resolving substantial discussions<br>

>> by consensus.   The route you are endorsing here is 'implementor<br>

>> wins'.   We don't need to do it that way.  We're a mature sensible<br>

>> bunch of adults who can talk out the issues until we agree they are<br>

>> ready for implementation, and then implement.  That's all Nathaniel is<br>

>> saying.  I think he's obviously right, and I'm sad that it isn't as<br>

>> clear to y'all as it is to me.<br>

>><br>

>> Best,<br>

>><br>

>> Matthew<br>

>><br>

><br>

> Everyone, can we please not do this?! I had enough of adults doing finger<br>

> pointing back over the summer during the whole debt ceiling debate.  I think<br>

> we can all agree that we are better than the US congress?<br>

><br>

> Forget about rudeness or decision processes.<br>

><br>

> I will start by saying that I am willing to separate ignore and absent, but<br>

> only on the write side of things.  On read, I want a single way to identify<br>

> the missing values.  I also want only a single way to perform calculations<br>

> (either skip or propagate).<br>

><br>

> An indicator of success would be that people stop using NaNs and magic<br>

> numbers (-9999, anyone?) and we could even deprecate nansum(), or at least<br>

> strongly suggest in its docs to use NA.<br>

<br>

</div></div>Well, I haven't completely made up my mind yet, will have to do some<br>

more prototyping and playing (and potentially have some of my users<br>

eat the differently-flavored dogfood), but I'm really not very<br>

satisfied with the API at the moment. I'm mainly worried about the<br>

abstraction leaking through to pandas users (this is a pretty large<br>

group of people judging by # of downloads).<br>

<br>

The basic position I'm in is that I'm trying to push Python into a new<br>

space, namely mainstream data analysis and statistical computing, one<br>

that is solidly occupied by R and other such well-known players. My<br>

target users are not computer scientists. They are not going to invest<br>

in understanding dtypes very deeply or the internals of ndarray. In<br>

fact I've spent a great deal of effort making it so that pandas users<br>

can be productive and successful while having very little<br>

understanding of NumPy. Yes, I essentially "protect" my users from<br>

NumPy because using it well requires a certain level of sophistication<br>

that I think is unfair to demand of people. This might seem totally<br>

bizarre to some of you but it is simply the state of affairs. So far I<br>

have been successful because more people are using Python and pandas<br>

to do things that they used to do in R. The NA concept in R is dead<br>

simple and I don't see why we are incapable of also implementing<br>

something that is just as dead simple. To us, the scipy elite, let's<br>

call us, it seems simple: "oh, just pass an extra flag to all my array<br>

constructors!" But this along with the masked array concept is going<br>

to have two likely outcomes:<br>

<br>

1) Create a great deal more complication in my already very large codebase<br>

<br>

and/or<br>

<br>

2) force pandas users to understand the new masked arrays after I've<br>

carefully made it so they can be largely ignorant of NumPy<br>

<br>

The mostly-NaN-based solution I've cobbled together and tweaked over<br>

the last 42 months actually *works really well*, amazingly, with<br>

relatively little cost in code complexity. Having found a reasonably<br>

stable equilibrium, I'm extremely resistant to upsetting the balance.<br>

<br>

So I don't know. After watching these threads bounce back and forth<br>

I'm frankly not all that hopeful about a solution arising that<br>

actually addresses my needs.<br></blockquote></div></div><div><br>But Wes, what *are* your needs? You keep saying this, but we need examples of how you want to operate and how numpy fails. As to dtypes, internals, and all that, I don't see any of that in the current implementation, unless you mean the maskna and skipna keywords. I believe someone on the previous thread mentioned a way to deal with that.<br>

</div></div></blockquote><div><br>From the release notes I just learned that skipna is basically the same as in R:<br>"R's parameter na.rm=T is spelled skipna=True in NumPy."<br><br>

They provide a good summary of the current status in master:<br><a href="https://github.com/numpy/numpy/blob/master/doc/release/2.0.0-notes.rst">https://github.com/numpy/numpy/blob/master/doc/release/2.0.0-notes.rst</a><br>
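<br>For anyone following along, here is a rough pure-Python sketch of the two reduction behaviours being discussed (skip vs. propagate), in the spirit of R's na.rm=TRUE. It assumes NaN stands in for the missing value, and the function name total and its skipna argument are made up for illustration; this is not the NumPy API from master.<br>

```python
import math

def total(values, skipna=False):
    """Sum floats, treating NaN as the missing-value marker (an assumption).

    skipna=False propagates: any missing value makes the result missing.
    skipna=True skips: missing values are dropped before summing.
    """
    if skipna:
        return math.fsum(v for v in values if not math.isnan(v))
    if any(math.isnan(v) for v in values):
        return math.nan
    return math.fsum(values)

data = [1.0, math.nan, 3.0]
propagated = total(data)             # the missing value propagates to the result
skipped = total(data, skipna=True)   # the missing value is ignored
```

With a magic number such as -9999 in place of NaN, neither branch would notice the missing entry and the sum would simply come out wrong, which is the hazard mentioned earlier in the thread.<br>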

<br>Ralf<br><br></div></div><br>