consensus (was: NA masks in the next numpy release?)
On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliphant@enthought.com> wrote:
I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish.
My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story.
Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's "formulas" (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... I could write all the code I wanted but no-one would use it unless I could convince them it's actually the right solution, so I had to engage with him, and dig deep into his arguments.
What I discovered was that (as I thought) R-style formulas *do* have a solid theoretical basis -- but (as he thought) all the existing implementations *are* broken and inconsistent! I'm still not sure I can actually convince Jonathan to go my way, but, because of his stubbornness, I had to invent a better way of handling these formulas, and so my library[1] is actually the first implementation of these things that has a rigorous theory behind it, and in the process it avoids two fundamental, decades-old bugs in R. (And I'm not sure the R folks can fix either of them at this point without breaking a ton of code, since they both have API consequences.)
--
It's extremely common for healthy FOSS projects to insist on consensus for almost all decisions, where consensus means something like "every interested party has a veto"[2]. This seems counterintuitive, because if everyone's vetoing all the time, how does anything get done? The trick is that if anyone *can* veto, then vetoes turn out to actually be very rare. Everyone knows that they can't just ignore alternative points of view -- they have to engage with them if they want to get anything done. So you get buy-in on features early, and no vetoes are necessary. And by forcing people to engage with each other, like me with Jonathan, you get better designs.
But what about the cost of all that code that doesn't get merged, or written, because everyone's spending all this time debating instead? Better designs are nice and all, but how does that justify letting working code languish?
The greatest risk for a FOSS project is that people will ignore you. Projects and features live and die by community buy-in. Consider the "NA mask" feature right now. It works (at least the parts of it that are implemented). It's in mainline. But IIRC, Pierre said last time that he doesn't think the current design will help him improve or replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring this feature in favor of his library pandas' current hacky NA support. Members of the neuroimaging crowd are saying that the memory overhead is too high and the benefits too marginal, so they'll stick with NaNs. Together these folks make up a huge proportion of this feature's target audience. So what have we actually accomplished by merging this to mainline? Are we going to be stuck supporting a feature that only a fraction of the target audience actually uses? (Maybe they're being dumb, but if people are ignoring your code for dumb reasons... they're still ignoring your code.)
The consensus rule forces everyone to do the hardest and riskiest part -- building buy-in -- up front. Because you *have* to do it sooner or later, and doing it sooner doesn't just generate better designs. It drastically reduces the risk of ending up in a huge trainwreck.
--
In my story at the beginning, I wished I had a magic wand to skip this annoying debate and political stuff. But giving it to me would have been a bad idea. I think that's what went wrong with the NA discussion in the first place. Mark's an excellent programmer, and he tried his best to act for the good of everyone in the project -- but in the end, he did have a wand like that. He didn't have that sense that he *had* to get everyone on board (even the people who were saying dumb things), or he'd just be wasting his time. He didn't ask Pierre if the NA design would actually work for numpy.ma's purposes -- I did.
You may have noticed that I do have some ideas about how NA support should work. But my ideas aren't really the important thing. The alter-NEP was my attempt to find common ground between the different needs people were bringing up, so we could discuss whether it would work for people or not. I'm not wedded to anything in it. But this is a complicated issue with a lot of conflicting interests, and we need to find something that actually does work for everyone (or as large a subset as is practical).
So here's what I think we should do:
1) I will submit a pull request backing Mark's NA work out of mainline, for now. (This is more or less done, I just need to get it onto github, see above re: connectivity)
2) I will also put together a new branch containing that work, rebased against current mainline, so it doesn't get lost. (Ditto.)
3) And we'll decide what to do with it *after* we hammer out a design that the various NA-supporting groups all find convincing. Or at least a design for some of the less controversial pieces (like the 'where=' ufunc argument?), get those merged, and then iterate incrementally.
What do you all think?
And in any case, thanks for reading,
-- Nathaniel
[1] https://github.com/charlton/charlton
[2] For example, this is written into the Apache voting procedure: https://www.apache.org/foundation/voting.html (it's the "code modification" rules that are relevant). And as usual, Karl Fogel has more useful discussion: http://producingoss.com/en/consensus-democracy.html (see esp. the "When to vote" section, which is entirely about how to avoid voting)
Hi,
On Fri, Oct 28, 2011 at 2:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
What do you all think?
Nice post - thank you.
I agree that we may have a problem with - process. I mean, maybe there is not much agreement on what the process for these kinds of discussions should be - and therefore - we can't point to some constitution or similar to say - hey - wait - we're not doing it right.
It seems to me - from my technical reply to Travis - that it would be reasonable to keep Mark's implementation of masked arrays, but with some minor modifications to keep IGNORED (implemented) separable conceptually from ABSENT (not implemented). Maybe the discussion could be about those modifications? Specifically, where do you feel the points of disagreement are, after the masking idea has become clearly an implementation of IGNORED? I guess you also don't much care if the IGNORED default behavior is PROPAGATE or SKIP.
I had thought about what would happen to numpy.ma - and I would really like to know what Pierre would need for this implementation to allow him to replace numpy.ma.
See you,
Matthew
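For anyone following along without the earlier technical posts, here is a rough sketch of the PROPAGATE versus SKIP distinction and of the IGNORED idea (a value that is hidden but still stored). It uses only long-standing NumPy features, plain NaNs and numpy.ma, as stand-ins; it is not the NA-mask API under discussion, and the variable names are just illustrative.

```python
import numpy as np

# Stand-in for "missing" data: a float array containing a NaN.
data = np.array([1.0, np.nan, 3.0])

# PROPAGATE-style behaviour: the missing value contaminates the result.
propagated = np.sum(data)

# SKIP-style behaviour: the missing value is left out of the reduction.
skipped = np.nansum(data)

# IGNORED-style behaviour via numpy.ma: the middle value is hidden by the
# mask, but the underlying number is still stored and can be recovered.
masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
masked_sum = masked.sum()
hidden_value = masked.data[1]

print(propagated, skipped, masked_sum, hidden_value)
```

With NaNs the original value is gone once it is marked missing, which is closer to the ABSENT idea; with a mask it can be unhidden later, which is what makes the two concepts separable.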
On Fri, Oct 28, 2011 at 2:32 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
What do you all think?
Nice post - thank you.
I agree that we may have a problem with - process. I mean, maybe there is not much agreement on what the process for these kinds of discussions should be - and therefore - we can't point to some constitution or similar to say - hey - wait - we're not doing it right.
Your post reminded me of this:
http://en.wikipedia.org/wiki/Rough_consensus
It does depend on having something like a committee and a chairperson though.
See you,
Matthew
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
What do you all think?
Why don't you and Matthew work up an alternative implementation so we can compare the two?
Chuck
Hi,
On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliphant@enthought.com> wrote:
I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish.
My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story.
Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's "formulas" (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... I could write all the code I wanted but no-one would use it unless I could convince them it's actually the right solution, so I had to engage with him, and dig deep into his arguments.
What I discovered was that (as I thought) R-style formulas *do* have a solid theoretical basis -- but (as he thought) all the existing implementations *are* broken and inconsistent! I'm still not sure I can actually convince Jonathan to go my way, but, because of his stubbornness, I had to invent a better way of handling these formulas, and so my library[1] is actually the first implementation of these things that has a rigorous theory behind it, and in the process it avoids two fundamental, decades-old bugs in R. (And I'm not sure the R folks can fix either of them at this point without breaking a ton of code, since they both have API consequences.)
--
It's extremely common for healthy FOSS projects to insist on consensus for almost all decisions, where consensus means something like "every interested party has a veto"[2]. This seems counterintuitive, because if everyone's vetoing all the time, how does anything get done? The trick is that if anyone *can* veto, then vetoes turn out to actually be very rare. Everyone knows that they can't just ignore alternative points of view -- they have to engage with them if they want to get anything done. So you get buy-in on features early, and no vetoes are necessary. And by forcing people to engage with each other, like me with Jonathan, you get better designs.
But what about the cost of all that code that doesn't get merged, or written, because everyone's spending all this time debating instead? Better designs are nice and all, but how does that justify letting working code languish?
The greatest risk for a FOSS project is that people will ignore you. Projects and features live and die by community buy-in. Consider the "NA mask" feature right now. It works (at least the parts of it that are implemented). It's in mainline. But IIRC, Pierre said last time that he doesn't think the current design will help him improve or replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring this feature in favor of his library pandas' current hacky NA support. Members of the neuroimaging crowd are saying that the memory overhead is too high and the benefits too marginal, so they'll stick with NaNs. Together these folk a huge proportion of the this feature's target audience. So what have we actually accomplished by merging this to mainline? Are we going to be stuck supporting a feature that only a fraction of the target audience actually uses? (Maybe they're being dumb, but if people are ignoring your code for dumb reasons... they're still ignoring your code.)
The consensus rule forces everyone to do the hardest and riskiest part -- building buy-in -- up front. Because you *have* to do it sooner or later, and doing it sooner doesn't just generate better designs. It drastically reduces the risk of ending up in a huge trainwreck.
--
In my story at the beginning, I wished I had a magic wand to skip this annoying debate and political stuff. But giving it to me would have been a bad idea. I think that's went wrong with the NA discussion in the first place. Mark's an excellent programmer, and he tried his best to act in the good of everyone in the project -- but in the end, he did have a wand like that. He didn't have that sense that he *had* to get everyone on board (even the people who were saying dumb things), or he'd just be wasting his time. He didn't ask Pierre if the NA design would actually work for numpy.ma's purposes -- I did.
You may have noticed that I do have some ideas about how NA support should work. But my ideas aren't really the important thing. The alter-NEP was my attempt to find common ground between the different needs people were bringing up, so we could discuss whether it would work for people or not. I'm not wedded to anything in it. But this is a complicated issue with a lot of conflicting interests, and we need to find something that actually does work for everyone (or as large a subset as is practical).
So here's what I think we should do:
1) I will submit a pull request backing Mark's NA work out of mainline, for now. (This is more or less done, I just need to get it onto github, see above re: connectivity)
2) I will also put together a new branch containing that work, rebased against current mainline, so it doesn't get lost. (Ditto.)
3) And we'll decide what to do with it *after* we hammer out a design that the various NA-supporting groups all find convincing. Or at least a design for some of the less controversial pieces (like the 'where=' ufunc argument?), get those merged, and then iterate incrementally.
What do you all think?
Why don't you and Matthew work up an alternative implementation so we can compare the two?
Do you have comments on the changes I suggested? Best, Matthew
Hi, On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
Sorry - this was too short and a little rude. I'm sorry. I was reacting to what I perceived as intolerance for discussing the issues, and I may be wrong in that perception. I think what Nathaniel is saying, is that it is not in the best interests of numpy to push through code where there is not good agreement. In reverting the change, he is, I think, appealing for a commitment to that process, for the good of numpy. I have in the past taken some of your remarks to imply that if someone is prepared to write code then that overrides most potential disagreement. The reason I think Nathaniel is the more right, is because most of us, I believe, do honestly have the interests of numpy at heart, and, want to fully understand the problem, and are prepared to be proven wrong. In that situation, in my experience of writing code at least, by far the most fruitful way to proceed is by letting all voices be heard. On the other hand, if the rule becomes 'unless I see an implementation I'm not listening to you' - then we lose the great benefits, to the code, of having what is fundamentally a good and strong community. Best, Matthew
On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Matthew, the problem I have is that it seems that you and Nathaniel won't be satisfied unless things are done *your* way. To use your terminology, that comes across as a lack of respect for the rest of us. In order to reach consensus, some folks are going to have to give. I think Mark gave a lot, I don't see that from the two of you. Wanting reversion at this point, even when Nathaniel doesn't seem to have used the current implementation much -- if any -- might be considered arrogant by some. Asking that you put some skin in the game by devoting substantial time to an alternate implementation doesn't strike me as out of line. Chuck
Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Matthew, the problem I have is that it seems that you and Nathaniel won't be satisfied unless things are done *your* way. To use your terminology, that comes across as a lack of respect for the rest of us. In order to reach consensus, some folks are going to have to give.
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead. In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'. What Nathaniel is asking for (I agree) - is a commitment to negotiate to agreement when there is substantial disagreement. Best, Matthew
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett <matthew.brett@gmail.com
wrote:
Hi,
On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com>
On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliphant@enthought.com> wrote: > I think Nathaniel and Matthew provided very > specific feedback that was helpful in understanding other > perspectives > of a > difficult problem. In particular, I really wanted bit-patterns > implemented. However, I also understand that Mark did quite a
bit
> of > work > and altered his original designs quite a bit in response to > community > feedback. I wasn't a major part of the pull request discussion, > nor > did I > merge the changes, but I support Charles if he reviewed the code and > felt > like it was the right thing to do. I likely would have done the > same > thing > rather than let Mark Wiebe's work languish.
My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story.
Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's "formulas" (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve
problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... I could write all the code I wanted but no-one would use it unless I could convince them it's actually the right solution, so I had to engage with him, and dig deep into his arguments.
What I discovered was that (as I thought) R-style formulas *do* have a solid theoretical basis -- but (as he thought) all the existing implementations *are* broken and inconsistent! I'm still not sure I can actually convince Jonathan to go my way, but, because of his stubbornness, I had to invent a better way of handling these
and so my library[1] is actually the first implementation of these things that has a rigorous theory behind it, and in the process it avoids two fundamental, decades-old bugs in R. (And I'm not sure the R folks can fix either of them at this point without breaking a ton of code, since they both have API consequences.)
--
It's extremely common for healthy FOSS projects to insist on consensus for almost all decisions, where consensus means something like "every interested party has a veto"[2]. This seems counterintuitive, because if everyone's vetoing all the time, how does anything get done? The trick is that if anyone *can* veto, then vetoes turn out to actually be very rare. Everyone knows that they can't just ignore alternative points of view -- they have to engage with them if they want to get anything done. So you get buy-in on features early, and no vetoes are necessary. And by forcing people to engage with each other, like me with Jonathan, you get better designs.
But what about the cost of all that code that doesn't get merged, or written, because everyone's spending all this time debating instead? Better designs are nice and all, but how does that justify letting working code languish?
The greatest risk for a FOSS project is that people will ignore you. Projects and features live and die by community buy-in. Consider the "NA mask" feature right now. It works (at least the parts of it that are implemented). It's in mainline. But IIRC, Pierre said last time that he doesn't think the current design will help him improve or replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring this feature in favor of his library pandas' current hacky NA support. Members of the neuroimaging crowd are saying that the memory overhead is too high and the benefits too marginal, so they'll stick with NaNs. Together these folk a huge proportion of the this feature's target audience. So what have we actually accomplished by merging this to mainline? Are we going to be stuck supporting a feature that only a fraction of the target audience actually uses? (Maybe they're being dumb, but if people are ignoring your code for dumb reasons...
still ignoring your code.)
The consensus rule forces everyone to do the hardest and riskiest
-- building buy-in -- up front. Because you *have* to do it sooner or later, and doing it sooner doesn't just generate better designs. It drastically reduces the risk of ending up in a huge trainwreck.
--
In my story at the beginning, I wished I had a magic wand to skip
annoying debate and political stuff. But giving it to me would have been a bad idea. I think that's went wrong with the NA discussion in the first place. Mark's an excellent programmer, and he tried his best to act in the good of everyone in the project -- but in the end, he did have a wand like that. He didn't have that sense that he *had* to get everyone on board (even the people who were saying dumb things), or he'd just be wasting his time. He didn't ask Pierre if the NA design would actually work for numpy.ma's purposes -- I did.
You may have noticed that I do have some ideas for about how NA support should work. But my ideas aren't really the important thing. The alter-NEP was my attempt to find common ground between the different needs people were bringing up, so we could discuss whether it would work for people or not. I'm not wedded to anything in it. But this is a complicated issue with a lot of conflicting interests, and we need to find something that actually does work for everyone (or as large a subset as is practical).
So here's what I think we should do:
1) I will submit a pull request backing Mark's NA work out of mainline, for now. (This is more or less done, I just need to get it onto github, see above re: connectivity)
2) I will also put together a new branch containing that work, rebased against current mainline, so it doesn't get lost. (Ditto.)
3) And we'll decide what to do with it *after* we hammer out a design that the various NA-supporting groups all find convincing. Or at least a design for some of the less controversial pieces (like the 'where=' ufunc argument?), get those merged, and then iterate incrementally.
What do you all think?
Why don't you and Matthew work up an alternative implementation so we can compare the two?
Do you have comments on the changes I suggested?
Sorry - this was too short and a little rude. I'm sorry.
I was reacting to what I perceived as intolerance for discussing the issues, and I may be wrong in that perception.
I think what Nathaniel is saying, is that it is not in the best interests of numpy to push through code where there is not good agreement. In reverting the change, he is, I think, appealing for a commitment to that process, for the good of numpy.
I have in the past taken some of your remarks to imply that if someone is prepared to write code then that overrides most potential disagreement.
The reason I think Nathaniel is the more right, is because most of us, I believe, do honestly have the interests of numpy at heart, and, want to fully understand the problem, and are prepared to be proven wrong. In that situation, in my experience of writing code at least, by far the most fruitful way to proceed is by letting all voices be heard. On the other hand, if the rule becomes 'unless I see an implementation I'm not listening to you' - then we lose the great benefits, to the code, of having what is fundamentally a good and strong community.
Matthew, the problem I have is that it seems that you and Nathaniel won't be satisfied unless things are done *your* way. To use your terminology, that comes across as a lack of respect for the rest of us. In order to reach consensus, some folks are going to have to give.
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
What Nathaniel is asking for (I agree) - is a commitment to negotiate to agreement when there is substantial disagreement.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
Ralf
Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
Why don't you and Matthew work up an alternative implementation so we can compare the two?
Do you have comments on the changes I suggested?
Sorry - this was too short and a little rude. I'm sorry.
I was reacting to what I perceived as intolerance for discussing the issues, and I may be wrong in that perception.
I think what Nathaniel is saying, is that it is not in the best interests of numpy to push through code where there is not good agreement. In reverting the change, he is, I think, appealing for a commitment to that process, for the good of numpy.
I have in the past taken some of your remarks to imply that if someone is prepared to write code then that overrides most potential disagreement.
The reason I think Nathaniel is the more right, is because most of us, I believe, do honestly have the interests of numpy at heart, and, want to fully understand the problem, and are prepared to be proven wrong. In that situation, in my experience of writing code at least, by far the most fruitful way to proceed is by letting all voices be heard. On the other hand, if the rule becomes 'unless I see an implementation I'm not listening to you' - then we lose the great benefits, to the code, of having what is fundamentally a good and strong community.
Matthew, the problem I have is that it seems that you and Nathaniel won't be satisfied unless things are done *your* way. To use your terminology, that comes across as a lack of respect for the rest of us. In order to reach consensus, some folks are going to have to give.
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'. We don't need to do it that way. We're a mature sensible bunch of adults who can talk out the issues until we agree they are ready for implementation, and then implement. That's all Nathaniel is saying. I think he's obviously right, and I'm sad that it isn't as clear to y'all as it is to me. Best, Matthew
Everyone, can we please not do this?! I had enough of adults doing finger pointing back over the summer during the whole debt ceiling debate. I think we can all agree that we are better than the US congress?
Forget about rudeness or decision processes.
I will start by saying that I am willing to separate ignore and absent, but only on the write side of things. On read, I want a single way to identify the missing values. I also want only a single way to perform calculations (either skip or propagate).
An indicator of success would be that people stop using NaNs and magic numbers (-9999, anyone?) and we could even deprecate nansum(), or at least strongly suggest in its docs to use NA.
Cheers!
Ben Root
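For concreteness, here is a minimal sketch of the three missing-value conventions mentioned above: a magic number, NaN with nansum(), and numpy.ma. It is not code from the thread. The readings and the -9999 sentinel are made up for illustration, and it uses only long-standing NumPy calls rather than the NA-mask API under discussion.

```python
import numpy as np

# Hypothetical readings; the second one is missing.
SENTINEL = -9999.0  # the "magic number" convention
raw = np.array([1.5, SENTINEL, 2.5])

# 1) Magic number: a plain sum silently includes the sentinel.
naive_total = raw.sum()

# 2) NaN convention: NaN propagates through sum(), nansum() skips it.
as_nan = np.where(raw == SENTINEL, np.nan, raw)
propagated = as_nan.sum()
skipped = np.nansum(as_nan)

# 3) numpy.ma: a separate mask records which entries are missing,
#    and reductions skip the masked entries.
masked = np.ma.masked_values(raw, SENTINEL)
masked_total = masked.sum()
```

Each convention needs its own way to identify and skip the missing entry, which is the duplication that a single NA mechanism would be meant to remove.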
On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root <ben.root@ou.edu> wrote:
Well, I haven't completely made up my mind yet, will have to do some more prototyping and playing (and potentially have some of my users eat the differently-flavored dogfood), but I'm really not very satisfied with the API at the moment. I'm mainly worried about the abstraction leaking through to pandas users (this is a pretty large group of people judging by # of downloads).
The basic position I'm in is that I'm trying to push Python into a new space, namely mainstream data analysis and statistical computing, one that is solidly occupied by R and other such well-known players. My target users are not computer scientists. They are not going to invest in understanding dtypes very deeply or the internals of ndarray. In fact I've spent a great deal of effort making it so that pandas users can be productive and successful while having very little understanding of NumPy. Yes, I essentially "protect" my users from NumPy because using it well requires a certain level of sophistication that I think is unfair to demand of people. This might seem totally bizarre to some of you but it is simply the state of affairs. So far I have been successful because more people are using Python and pandas to do things that they used to do in R. The NA concept in R is dead simple and I don't see why we are incapable of also implementing something that is just as dead simple. To we, the scipy elite let's call us, it seems simple: "oh, just pass an extra flag to all my array constructors!" But this along with the masked array concept is going to have two likely outcomes:
1) Create a great deal more complication in my already very large codebase
and/or
2) force pandas users to understand the new masked arrays after I've carefully made it so they can be largely ignorant of NumPy
The mostly-NaN-based solution I've cobbled together and tweaked over the last 42 months actually *works really well*, amazingly, with relatively little cost in code complexity. Having found a reasonably stable equilibrium I'm extremely resistant to upset the balance.
So I don't know. After watching these threads bounce back and forth I'm frankly not all that hopeful about a solution arising that actually addresses my needs.
best,
Wes
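As a rough illustration of what a NaN-based convention looks like to a caller, the sketch below wraps the skip-or-propagate choice in a single helper. This is not pandas code: the helper name `na_mean` and its `skipna` parameter are hypothetical, chosen only to show that users can work with missing values without touching masks or dtypes.

```python
import numpy as np

def na_mean(values, skipna=True):
    """Mean of a float array that uses NaN as its missing-value marker."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    if not skipna:
        # Propagate: any missing entry makes the result missing.
        return np.nan if missing.any() else values.mean()
    present = values[~missing]
    # An all-missing (or empty) input has no defined mean.
    return present.mean() if present.size else np.nan

data = [1.0, np.nan, 3.0]
skip_result = na_mean(data)                # missing entry is skipped
prop_result = na_mean(data, skipna=False)  # missing entry propagates
```

The trade-off is that NaN is only available for floating-point data, which is presumably part of why the solution is described as "mostly" NaN-based.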
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Fri, Oct 28, 2011 at 6:45 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
But Wes, what *are* your needs? You keep saying this, but we need examples of how you want to operate and how numpy fails. As to dtypes, internals, and all that, I don't see any of that in the current implementation, unless you mean the maskna and skipna keywords. I believe someone on the previous thread mentioned a way to deal with that.
Chuck
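To make the NaN-versus-magic-number point above concrete, here is a minimal sketch. It uses only functions that exist in released NumPy (`np.where`, `np.isnan`, `np.nansum`) and does not touch the NA API under discussion. The `-9999` sentinel and the sample readings are invented for illustration.

```python
import numpy as np

# Hypothetical sensor readings where -9999 marks a missing value.
SENTINEL = -9999.0
raw = np.array([1.5, SENTINEL, 2.5, 4.0])

# Treating the sentinel as ordinary data silently corrupts the result.
naive_total = raw.sum()

# Replacing the sentinel with NaN makes the gap explicit.
clean = np.where(raw == SENTINEL, np.nan, raw)

propagated_total = clean.sum()        # the missing value propagates as NaN
skipped_total = np.nansum(clean)      # the missing value is skipped
n_missing = int(np.isnan(clean).sum())
```

The magic number gives a plausible-looking but wrong total, while NaN at least forces a choice between propagating and skipping.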
On Sat, Oct 29, 2011 at 3:32 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 6:45 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Friday, October 28, 2011, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliphant@enthought.com> wrote:
I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish.
My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story.
Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's "formulas" (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no [...]
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'. We don't need to do it that way. We're a mature sensible bunch of adults who can talk out the issues until we agree they are ready for implementation, and then implement. That's all Nathaniel is saying. I think he's obviously right, and I'm sad that it isn't as clear to y'all as it is to me.
Best,
Matthew
Everyone, can we please not do this?! I had enough of adults doing finger pointing back over the summer during the whole debt ceiling debate. I think we can all agree that we are better than the US congress?
Forget about rudeness or decision processes.
But Wes, what *are* your needs? You keep saying this, but we need examples of how you want to operate and how numpy fails. As to dtypes, internals, and all that, I don't see any of that in the current implementation, unless you mean the maskna and skipna keywords. I believe someone on the previous thread mentioned a way to deal with that.
From the release notes I just learned that skipna is basically the same as in R: "R's parameter rm.na=T is spelled skipna=True in NumPy."
It provides a good summary of the current status in master: https://github.com/numpy/numpy/blob/master/doc/release/2.0.0-notes.rst
Ralf
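The skip-or-propagate choice can be sketched with tools that exist in released NumPy. The `skipna` keyword mentioned in the release notes belongs to the development branch, so the sketch below does not call it. Instead it emulates the two behaviours with NaN as the missing marker. The helper `total` and its `skipna` flag are hypothetical names chosen to mirror the keyword. For comparison, R itself writes the argument as `sum(x, na.rm = TRUE)`.

```python
import numpy as np

def total(values, skipna=False):
    """Sum a float array, treating NaN as the missing-value marker.

    `total` and its `skipna` flag are hypothetical names used only for
    this illustration; they are not part of NumPy.
    """
    values = np.asarray(values, dtype=float)
    if skipna:
        return np.nansum(values)   # ignore missing entries
    return values.sum()            # any missing entry makes the result NaN

data = [1.0, np.nan, 3.0]
propagated = total(data)               # missing value propagates
skipped = total(data, skipna=True)     # missing value is skipped
```

Propagation is the default here, so a caller has to opt in to dropping data, which is the same default R uses.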
On Fri, Oct 28, 2011 at 9:32 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
But Wes, what *are* your needs? You keep saying this, but we need examples of how you want to operate and how numpy fails. As to dtypes, internals, and all that, I don't see any of that in the current implementation, unless you mean the maskna and skipna keywords. I believe someone on the previous thread mentioned a way to deal with that.
Chuck
Here are my needs:
1) How NAs are implemented cannot be end user visible. Having to pass maskna=True is a problem. I suppose a solution is to set the flag to true on every array inside of pandas so the user never knows (you mentioned someone else had some other solution, I could go back and dig it up?)
2) Performance: I can't accept more than say 2x overhead in floating point array operations (binary ops or reductions). Last time I checked we were a long way away from that.
3) Implementation of NA-aware algorithms in Cython. A lot of pandas is about moving data around. Bit patterns would make life a lot easier because the code wouldn't have to change (much). But with masked arrays I'll have to move both data and mask values. Not the end of the world but is just the price you pay, I guess.
Things in R are a bit simpler re: bit patterns because there's only double, integer, string (character), and boolean dtypes, whereas NumPy has the whole C type hierarchy. So I can appreciate that doing bit patterns across all the dtypes would be really hard.
In any case, I recognize that the current implementation will be useful to a lot of people, but it may not meet my performance and usability requirements. As I said, the solution I've cooked up has worked well so far, and since it isn't a major pain point I may just adopt the "ain't broke, don't fix" attitude and focus my efforts on building new features. "Practicality beats purity", I suppose
- W
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Sat, Oct 29, 2011 at 12:14 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
Here are my needs:
1) How NAs are implemented cannot be end user visible. Having to pass maskna=True is a problem. I suppose a solution is to set the flag to true on every array inside of pandas so the user never knows (you mentioned someone else had some other solution, I could go back and dig it up?)
I believe it was Eric Firing who mentioned that he raised this question during development and Mark offered a potential solution. Whatever that solution was, we should take a look at implementing it.
2) Performance: I can't accept more than say 2x overhead in floating point array operations (binary ops or reductions). Last time I checked we were a long way away from that.
Known problem, and probably fixable by pushing things down into the inner ufunc loops. What we have at the moment is a prototype for testing the API and that is what we need feedback on.
3) Implementation of NA-aware algorithms in Cython. A lot of pandas is about moving data around. Bit patterns would make life a lot easier because the code wouldn't have to change (much). But with masked arrays I'll have to move both data and mask values. Not the end of the world but is just the price you pay, I guess.
Agree that this is a problem, along with memory usage. One solution is to have a way to translate to bit patterns for export/import. Note that in the wild some data sets come with separate masks, sometimes several for different conditions, so the current implementation would work better for those. We need to support several options here.
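As a rough illustration of translating between a separate mask and a bit-pattern marker for export/import, the sketch below uses NaN as the marker and `numpy.ma` as the mask container, both available in released NumPy. The helper names `mask_to_nan` and `nan_to_mask` are hypothetical. It assumes float data with no genuine NaN or infinity, since `np.ma.masked_invalid` masks those too.

```python
import numpy as np

def mask_to_nan(data, mask):
    """Export: fold a separate boolean mask into NaN markers (float only)."""
    out = np.array(data, dtype=float)           # copy, so the input is untouched
    out[np.asarray(mask, dtype=bool)] = np.nan
    return out

def nan_to_mask(values):
    """Import: turn NaN-marked floats into a numpy.ma masked array."""
    return np.ma.masked_invalid(values)

data = np.array([10.0, 20.0, 30.0])
mask = np.array([False, True, False])

exported = mask_to_nan(data, mask)    # single array, middle entry is NaN
imported = nan_to_mask(exported)      # data plus mask, middle entry masked
```

The export step is lossy: the value hidden under the mask (20.0 here) is gone afterwards, which is one difference between the "ignore" and "absent" notions raised earlier in the thread.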
Things in R are a bit simpler re: bit patterns because there's only double, integer, string (character), and boolean dtypes, whereas NumPy has the whole C type hierarchy. So I can appreciate that doing bit patterns across all the dtypes would be really hard.
We could maybe limit it to float types, strings, and booleans, maybe dates also. I think integers are problematic; for instance, a uint8 255 turns up in 8-bit images and means saturated, not missing.
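The uint8 case can be sketched as follows. Every one of the 256 bit patterns is legitimate data, so none can be reserved for "missing", and a separate mask is needed instead. The pixel values and mask are invented for illustration, and `numpy.ma` stands in for whatever mask mechanism is finally adopted.

```python
import numpy as np

# Hypothetical 8-bit image row: 255 means "saturated", which is real data.
pixels = np.array([0, 128, 255, 255], dtype=np.uint8)

# A separate mask records which pixels are actually missing.
missing = np.array([False, False, False, True])
image = np.ma.masked_array(pixels, mask=missing)

# Only unmasked pixels are counted; the masked 255 is not "saturated".
valid = image.compressed()
saturated = int(np.count_nonzero(valid == 255))
n_valid = int(image.count())
```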
In any case, I recognize that the current implementation will be useful to a lot of people, but it may not meet my performance and usability requirements. As I said, the solution I've cooked up has worked well so far, and since it isn't a major pain point I may just adopt the "ain't broke, don't fix" attitude and focus my efforts on building new features. "Practicality beats purity", I suppose
That's perfectly reasonable. It would still help if you gave examples of use cases where the current API doesn't work for you. I don't see much difference between code using nan's and code using NA at the API level apart from the maskna/skipna keywords.
Chuck
Hi, On Sat, Oct 29, 2011 at 11:14 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
On Fri, Oct 28, 2011 at 9:32 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 6:45 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Friday, October 28, 2011, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliphant@enthought.com> wrote:
I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish.
My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story.
Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's "formulas" (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no [...]
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'. We don't need to do it that way. We're a mature sensible bunch of adults who can talk out the issues until we agree they are ready for implementation, and then implement. That's all Nathaniel is saying. I think he's obviously right, and I'm sad that it isn't as clear to y'all as it is to me.
Best,
Matthew
Everyone, can we please not do this?! I had enough of adults doing finger pointing back over the summer during the whole debt ceiling debate. I think we can all agree that we are better than the US congress?
Forget about rudeness or decision processes.
I will start by saying that I am willing to separate ignore and absent, but only on the write side of things. On read, I want a single way to identify the missing values. I also want only a single way to perform calculations (either skip or propagate).
An indicator of success would be that people stop using NaNs and magic numbers (-9999, anyone?) and we could even deprecate nansum(), or at least strongly suggest in its docs to use NA.
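To make the benchmark above concrete, here is a minimal sketch of the three habits it contrasts: a magic-number sentinel, a plain NaN, and nansum(). The readings and the SENTINEL name are made up for illustration, and NaN stands in for a real NA since no NA object is assumed to be available.

```python
import numpy as np

# Hypothetical readings; the third one was never recorded.
SENTINEL = -9999.0
with_sentinel = np.array([21.5, 22.0, SENTINEL, 23.5])
with_nan = np.array([21.5, 22.0, np.nan, 23.5])

# The magic number silently poisons the result: nothing signals a problem.
sentinel_mean = with_sentinel.mean()

# NaN propagates, so the missing value is at least visible in the result.
propagated_sum = with_nan.sum()

# nansum() skips NaN entries; this is the helper the thread hopes to retire.
skipped_sum = np.nansum(with_nan)
```

The sentinel case is the dangerous one because the mean is a perfectly ordinary float; the NaN cases at least force a choice between propagating and skipping.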
Well, I haven't completely made up my mind yet, will have to do some more prototyping and playing (and potentially have some of my users eat the differently-flavored dogfood), but I'm really not very satisfied with the API at the moment. I'm mainly worried about the abstraction leaking through to pandas users (this is a pretty large group of people judging by # of downloads).
The basic position I'm in is that I'm trying to push Python into a new space, namely mainstream data analysis and statistical computing, one that is solidly occupied by R and other such well-known players. My target users are not computer scientists. They are not going to invest in understanding dtypes very deeply or the internals of ndarray. In fact I've spent a great deal of effort making it so that pandas users can be productive and successful while having very little understanding of NumPy. Yes, I essentially "protect" my users from NumPy because using it well requires a certain level of sophistication that I think is unfair to demand of people. This might seem totally bizarre to some of you but it is simply the state of affairs. So far I have been successful because more people are using Python and pandas to do things that they used to do in R. The NA concept in R is dead simple and I don't see why we are incapable of also implementing something that is just as dead simple. To we, the scipy elite let's call us, it seems simple: "oh, just pass an extra flag to all my array constructors!" But this along with the masked array concept is going to have two likely outcomes:
1) Create a great deal more complication in my already very large codebase
and/or
2) force pandas users to understand the new masked arrays after I've carefully made it so they can be largely ignorant of NumPy
The mostly-NaN-based solution I've cobbled together and tweaked over the last 42 months actually *works really well*, amazingly, with relatively little cost in code complexity. Having found a reasonably stable equilibrium I'm extremely resistant to upset the balance.
So I don't know. After watching these threads bounce back and forth I'm frankly not all that hopeful about a solution arising that actually addresses my needs.
But Wes, what *are* your needs? You keep saying this, but we need examples of how you want to operate and how numpy fails. As to dtypes, internals, and all that, I don't see any of that in the current implementation, unless you mean the maskna and skipna keywords. I believe someone on the previous thread mentioned a way to deal with that.
Chuck
Here are my needs:
1) How NAs are implemented cannot be end user visible. Having to pass maskna=True is a problem. I suppose a solution is to set the flag to true on every array inside of pandas so the user never knows (you mentioned someone else had some other solution, i could go back and dig it up?)
I guess this would be the same with bitpatterns, in that the user would have to specify a custom dtype.
Is it possible to add a bitpattern NA (in the NaN values) to the current floating point types, at least in principle? So that np.float etc would have bitpattern NAs without a custom dtype?
See you,
Matthew
Here are my needs:
1) How NAs are implemented cannot be end user visible. Having to pass maskna=True is a problem. I suppose a solution is to set the flag to true on every array inside of pandas so the user never knows (you mentioned someone else had some other solution, i could go back and dig it up?)
I guess this would be the same with bitpatterns, in that the user would have to specify a custom dtype.
Is it possible to add a bitpattern NA (in the NaN values) to the current floating point types, at least in principle? So that np.float etc would have bitpattern NAs without a custom dtype?
That is an interesting idea. It's essentially what people like Wes McKinney are doing now. However, the issue is going to be whether or not you do something special with the NA values in the low-level C function the dtype dispatches to. This is the reason for the special bit-pattern dtype. I've always thought that requiring NA checks for code that doesn't want to worry about it would slow things down unnecessarily for those use-cases. But, not dealing with missing data well is a missing NumPy feature.
-Travis
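As a rough illustration of what "a bitpattern NA in the NaN values" could mean, the sketch below reserves one NaN payload as NA and tests for it by comparing raw bits. The payload value, NA_BITS, and is_na() are assumptions invented for this example; they are not an existing NumPy API, and nothing here guarantees that arithmetic preserves the payload.

```python
import numpy as np

# Hypothetical NA encoding: a quiet NaN whose payload bits are 1954.
# The payload value is an assumption chosen for illustration only.
NA_BITS = np.uint64(0x7FF8000000000000 | 1954)


def is_na(arr):
    """True where a float64 array holds exactly the hypothetical NA bits."""
    arr = np.asarray(arr, dtype=np.float64)
    return arr.view(np.uint64) == NA_BITS


a = np.array([1.0, np.nan, 3.0])
# Write the NA bit pattern directly through an integer view of the data.
a.view(np.uint64)[2] = NA_BITS

ordinary_nan_or_na = np.isnan(a)  # both NaN and NA look like NaN here
only_na = is_na(a)                # only the reserved payload counts as NA
```

The is_na() comparison is the kind of extra per-element check Travis is worried about: code that wants to tell NA apart from NaN has to pay for it in every low-level loop.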
See you,
Matthew
--- Travis Oliphant Enthought, Inc. oliphant@enthought.com 1-512-536-1057 http://www.enthought.com
Hi, On Sat, Oct 29, 2011 at 10:02 PM, Travis Oliphant <oliphant@enthought.com> wrote:
Here are my needs:
1) How NAs are implemented cannot be end user visible. Having to pass maskna=True is a problem. I suppose a solution is to set the flag to true on every array inside of pandas so the user never knows (you mentioned someone else had some other solution, i could go back and dig it up?)
I guess this would be the same with bitpatterns, in that the user would have to specify a custom dtype.
Is it possible to add a bitpattern NA (in the NaN values) to the current floating point types, at least in principle? So that np.float etc would have bitpattern NAs without a custom dtype?
That is an interesting idea. It's essentially what people like Wes McKinney are doing now. However, the issue is going to be whether or not you do something special with the NA values in the low-level C function the dtype dispatches to. This is the reason for the special bit-pattern dtype.
I've always thought that requiring NA checks for code that doesn't want to worry about it would slow things down unnecessarily for those use-cases.
Right - now that the caffeine has run through my system adequately, I have a few glasses of wine to disrupt my logic and / or social skills but:
Is there any way you could imagine something like this?:
    In [3]: a = np.arange(10, dtype=np.float)
    In [4]: a.flags
    Out[4]:
      C_CONTIGUOUS : True
      F_CONTIGUOUS : True
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False
      MAYBE_NA : False
    In [5]: a[0] = np.NA
    In [6]: a.flags
    Out[6]:
      C_CONTIGUOUS : True
      F_CONTIGUOUS : True
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False
      MAYBE_NA : True
Obviously extension writers would have to keep the flag maintained...
Sorry if that doesn't make sense, I do not claim to be in full possession of my faculties,
See you,
Matthew
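The MAYBE_NA idea can be sketched in pure Python as a wrapper that keeps a conservative flag next to the data. Everything here is hypothetical: FlaggedArray, maybe_na, and total() are names invented for the example, NaN stands in for np.NA, and a real implementation would live in the array flags rather than in a wrapper class.

```python
import numpy as np


class FlaggedArray:
    """Hypothetical wrapper pairing float data with a maybe_na flag."""

    def __init__(self, data):
        self.data = np.array(data, dtype=np.float64)
        self.maybe_na = bool(np.isnan(self.data).any())

    def __setitem__(self, index, value):
        self.data[index] = value
        # Conservative: writes can only turn the flag on, never off.
        if np.any(np.isnan(value)):
            self.maybe_na = True

    def total(self):
        if not self.maybe_na:
            return self.data.sum()      # fast path: no NA checks needed
        return np.nansum(self.data)     # slow path: skip missing entries


a = FlaggedArray(np.arange(10, dtype=np.float64))
flag_before = a.maybe_na
a[0] = np.nan                           # NaN stands in for np.NA here
flag_after = a.maybe_na
```

The point of the flag is that code which never sees a missing value keeps its fast path, which speaks to the performance concern raised earlier; the cost is that every writer, including extension code, has to keep the flag honest.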
Hi, On Fri, Oct 28, 2011 at 4:53 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Friday, October 28, 2011, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliphant@enthought.com> wrote:
I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish.
My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story.
Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's "formulas" (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no [...]
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'. We don't need to do it that way. We're a mature sensible bunch of adults who can talk out the issues until we agree they are ready for implementation, and then implement. That's all Nathaniel is saying. I think he's obviously right, and I'm sad that it isn't as clear to y'all as it is to me.
Best,
Matthew
Everyone, can we please not do this?! I had enough of adults doing finger pointing back over the summer during the whole debt ceiling debate. I think we can all agree that we are better than the US congress?
Yes, please.
Forget about rudeness or decision processes.
No, that's a common mistake, which is to assume that any conversation about things which aren't technical, is not important. Nathaniel's point is important. Rudeness is important. The reason we've got into this mess is because we clearly don't have an agreed way of making decisions. That's why countries and open-source projects have constitutions, so this doesn't happen.
I will start by saying that I am willing to separate ignore and absent, but only on the write side of things. On read, I want a single way to identify the missing values. I also want only a single way to perform calculations (either skip or propagate).
Thank you - that is very helpful. Are you saying that you'd be OK setting missing values like this?
a.mask[0:2] = False
For the read side, do you mean you're OK with this
a.isna()
To identify the missing values, as is currently the case? Or something else? If so, then I think we're very close, it's just a discussion about names.
An indicator of success would be that people stop using NaNs and magic numbers (-9999, anyone?) and we could even deprecate nansum(), or at least strongly suggest in its docs to use NA.
That is an excellent benchmark,
Best,
Matthew
Matt, On Friday, October 28, 2011, Matthew Brett <matthew.brett@gmail.com> wrote:
Forget about rudeness or decision processes.
No, that's a common mistake, which is to assume that any conversation about things which aren't technical, is not important. Nathaniel's point is important. Rudeness is important. The reason we've got into this mess is because we clearly don't have an agreed way of making decisions. That's why countries and open-source projects have constitutions, so this doesn't happen.
Don't get me wrong. In general, you are right. And maybe we all should discuss something to that effect for numpy. But I would rather do that when there isn't such contention and tempers. As for allegations of rudeness, I believe that we are actually so close to consensus that I immediately wanted to squelch any sort of meta-meta-disagreements about who was being rude to whom. As a quick band-aid, anybody who felt slighted by me gets a drink on me at the next scipy conference. From this point on, let's institute a 10 minute rule -- write your email, wait ten minutes, read it again and edit it.
I will start by saying that I am willing to separate ignore and absent, but only on the write side of things. On read, I want a single way to identify the missing values. I also want only a single way to perform calculations (either skip or propagate).
Thank you - that is very helpful.
Are you saying that you'd be OK setting missing values like this?
a.mask[0:2] = False
Probably not that far, because that would be an attribute that may or may not exist. Rather, I might like the idea of a NA to "always" mean absent (and destroys - even through views), and MA (or some other name) which always means ignore (and has the masking behavior with views). This makes specific behaviors tied distinctly to specific objects.
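The absent / ignore distinction can be illustrated with tools that already exist. This is only a sketch under stated assumptions: numpy.ma plays the role of "ignore" (MA) and an overwritten NaN plays the role of "absent" (NA); neither is the maskna API being debated.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# "Ignore": mask the element; the underlying value stays in memory.
ignored = np.ma.masked_array(values.copy(), mask=[False, True, False, False])
sum_while_masked = ignored.sum()     # the masked element is skipped
hidden_value = ignored.data[1]       # the original 2.0 is still there
ignored.mask = False                 # unmask everything again
sum_after_unmask = ignored.sum()

# "Absent": overwrite the element; the original value is destroyed.
absent = values.copy()
view = absent[1:3]
view[0] = np.nan                     # writing through a view changes absent too
```

After the last line there is no way to get the 2.0 back out of absent, which is the "destroys - even through views" behaviour; the masked array, by contrast, returns to its full sum once the mask is cleared.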
For the read side, do you mean you're OK with this
a.isna()
To identify the missing values, as is currently the case? Or something else?
Yes. A missing value is a missing value, regardless of it being absent or marked as ignored. But it is a bit more subtle than that. I should just be able to add two arrays together and the "data should know what to do". When the core ufuncs get this right (like min, max, sum, cumsum, diff, etc), then I don't have to do much to prepare higher level funcs for missing data.
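For the "data should know what to do" point, existing masked arrays give a rough feel for the behaviour being asked for. The sketch below uses numpy.ma only as a stand-in, and contrasts its skipping reductions with NaN-filled arrays, which propagate.

```python
import numpy as np

a = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
b = np.ma.masked_array([10.0, 20.0, 30.0], mask=[False, False, True])

# Adding combines the masks: a result is missing if either input is.
total = a + b
total_mask = np.ma.getmaskarray(total)

# Reductions on the masked result skip the missing entries.
skipped_sum = total.sum()
skipped_max = total.max()

# Filling with NaN first gives the propagating flavour instead.
propagated_sum = (a.filled(np.nan) + b.filled(np.nan)).sum()
```

No higher-level function had to inspect the inputs for missing values here; the element-wise add and the reductions carried that knowledge themselves.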
If so, then I think we're very close, it's just a discussion about names.
And what does ignore + absent equal? ;-)
An indicator of success would be that people stop using NaNs and magic numbers (-9999, anyone?) and we could even deprecate nansum(), or at least strongly suggest in its docs to use NA.
That is an excellent benchmark,
Best,
Matthew
Cheers, Ben Root
On 10/28/11 10:38 PM, Benjamin Root wrote:
I might like the idea of a NA to "always" mean absent (and destroys - even through views), and MA (or some other name) which always means ignore (and has the masking behavior with views).
I should point out that if I'm dictating code to someone (e.g., teaching, or helping someone verbally), it's going to be hard to distinguish between the verbal sounds of "NA" and "MA". And from a lurker (me), thanks for the discussion. I find it very interesting to read. Thanks, Jason Grout
Hi, instead of putting up a pull request that reverts all the 25000 lines of code that have been written to support an NA mask, why won't you set up a pull request that uses the current code base to implement your own ideas on how it should work?
Hi, On Fri, Oct 28, 2011 at 8:38 PM, Benjamin Root <ben.root@ou.edu> wrote:
Matt,
On Friday, October 28, 2011, Matthew Brett <matthew.brett@gmail.com> wrote:
Forget about rudeness or decision processes.
No, that's a common mistake, which is to assume that any conversation about things which aren't technical, is not important. Nathaniel's point is important. Rudeness is important. The reason we've got into this mess is because we clearly don't have an agreed way of making decisions. That's why countries and open-source projects have constitutions, so this doesn't happen.
Don't get me wrong. In general, you are right. And maybe we all should discuss something to that effect for numpy. But I would rather do that when there isn't such contention and tempers.
That's a reasonable point.
As for allegations of rudeness, I believe that we are actually so close to consensus that I immediately wanted to squelch any sort of meta-meta-disagreements about who was being rude to whom. As a quick band-aid, anybody who felt slighted by me gets a drink on me at the next scipy conference. From this point on, let's institute a 10 minute rule -- write your email, wait ten minutes, read it again and edit it.
Good offer. I make the same one.
I will start by saying that I am willing to separate ignore and absent, but only on the write side of things. On read, I want a single way to identify the missing values. I also want only a single way to perform calculations (either skip or propagate).
Thank you - that is very helpful.
Are you saying that you'd be OK setting missing values like this?
a.mask[0:2] = False
Probably not that far, because that would be an attribute that may or may not exist. Rather, I might like the idea of a NA to "always" mean absent (and destroys - even through views), and MA (or some other name) which always means ignore (and has the masking behavior with views). This makes specific behaviors tied distinctly to specific objects.
Ah - yes - thank you. I think you and I at least have somewhere to go for agreement, but, I don't know how to work towards a numpy-wide agreement. Do you have any thoughts?
For the read side, do you mean you're OK with this
a.isna()
To identify the missing values, as is currently the case? Or something else?
Yes. A missing value is a missing value, regardless of it being absent or marked as ignored. But it is a bit more subtle than that. I should just be able to add two arrays together and the "data should know what to do". When the core ufuncs get this right (like min, max, sum, cumsum, diff, etc), then I don't have to do much to prepare higher level funcs for missing data.
If so, then I think we're very close, it's just a discussion about names.
And what does ignore + absent equal? ;-)
ignore + absent == special_value_of_some_sort :)
Just joking,
See you,
Matthew
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. They also allow one to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said:
- here's my dataset
- this is what I want to do with it
- this is the best I can do with the current implementation
- here's how API X would allow me to solve this better or simpler
This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'.
I'm not. All I want to point out is that design and implementation are not completely separated either.
We don't need to do it that way. We're a mature sensible bunch of adults
Agreed:)
who can talk out the issues until we agree they are ready for implementation, and then implement.
The history of this discussion doesn't suggest it is straightforward to get a design right first time. It's a complex subject.
The second part of your statement, "and then implement", sounds so simple. The reality is that there are only a handful of developers who have done a significant amount of work on the numpy core in the last two years. I haven't seen anyone saying they are planning to implement (part of) whatever design the outcome of this discussion will be. I don't think it's strange to keep this in mind to some extent.
Regards,
Ralf
On 10/29/2011 12:26 AM, Ralf Gommers wrote:
The history of this discussion doesn't suggest it is straightforward to get a design right first time. It's a complex subject.
The second part of your statement, "and then implement", sounds so simple. The reality is that there are only a handful of developers who have done a significant amount of work on the numpy core in the last two years. I haven't seen anyone saying they are planning to implement (part of) whatever design the outcome of this discussion will be. I don't think it's strange to keep this in mind to some extent.
...including the fact that last summer, Mark had a brief one-time opportunity to contribute major NA code. I expect that even if some modifications are made to what he contributed, letting him get on with it will turn out to have been the right move.
Apparently Travis hopes to put in a burst of coding in 2012:
http://technicaldiscovery.blogspot.com/2011/10/thoughts-on-porting-numpy-to-...
Go to the section "NumPy will be evolving rapidly over the coming years". Note that "missing data bit-patterns" is on his list, consistent with his most recent messages.
Eric
Hi, On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. They also allow one to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said:
- here's my dataset
- this is what I want to do with it
- this is the best I can do with the current implementation
- here's how API X would allow me to solve this better or simpler
This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'.
I'm not. All I want to point out is that design and implementation are not completely separated either.
No, they often interact. I was trying to explain why, in this case, the implementation hasn't changed the issues substantially, as far as I can see. If you think otherwise, then that is helpful information, because you can feed back about where the initial discussion has been overtaken by the implementation, and so we can strip down the discussion to its essential parts.
We don't need to do it that way. We're a mature sensible bunch of adults
Agreed:)
Ah - if only it was that easy :)
who can talk out the issues until we agree they are ready for implementation, and then implement.
The history of this discussion doesn't suggest it is straightforward to get a design right first time. It's a complex subject.
Right - and it's more complex when only some of the people involved are interested in the discussion coming to a resolution. That's Nathaniel's point - that although it seems inefficient, working towards a good resolution of big issues like this is very valuable in getting the ideas right.
The second part of your statement, "and then implement", sounds so simple. The reality is that there are only a handful of developers who have done a significant amount of work on the numpy core in the last two years. I haven't seen anyone saying they are planning to implement (part of) whatever design the outcome of this discussion will be. I don't think it's strange to keep this in mind to some extent.
No, but consensus building is a little bit all or none. I guess we'd all like consensus, but then sometimes, as Nathaniel points out, it is inconvenient and annoying. If we have no stated commitment to consensus, at some unpredictable point in the discussion, those who are implementing will - obviously - just duck out and do the implementation. I would do that, I guess. Maybe I have done in the projects I'm involved in. The question Nathaniel is raising, and me too, in a less coherent way, is - is that fine? Does it matter that we are short-cutting through substantial discussions? Is that really - in the long term - a more efficient way of building both the code and the community? Best, Matthew
On Sat, Oct 29, 2011 at 1:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said:
- here's my dataset
- this is what I want to do with it
- this is the best I can do with the current implementation
- here's how API X would allow me to solve this better or simpler
This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'.
I'm not. All I want to point out is that design and implementation are not completely separated either.
No, they often interact. I was trying to explain why, in this case, the implementation hasn't changed the issues substantially, as far as I can see. If you think otherwise, then that is helpful information, because you can feed back about where the initial discussion has been overtaken by the implementation, and so we can strip down the discussion to its essential parts.
We don't need to do it that way. We're a mature sensible bunch of adults
Agreed:)
Ah - if only it was that easy :)
who can talk out the issues until we agree they are ready for implementation, and then implement.
The history of this discussion doesn't suggest it is straightforward to get a design right first time. It's a complex subject.
Right - and it's more complex when only some of the people involved are interested in the discussion coming to a resolution. That's Nathaniel's point - that although it seems inefficient, working towards a good resolution of big issues like this is very valuable in getting the ideas right.
The second part of your statement, "and then implement", sounds so simple. The reality is that there are only a handful of developers who have done a significant amount of work on the numpy core in the last two years. I haven't seen anyone saying they are planning to implement (part of) whatever design the outcome of this discussion will be. I don't think it's strange to keep this in mind to some extent.
No, but consensus building is a little bit all or none. I guess we'd all like consensus, but then sometimes, as Nathaniel points out, it is inconvenient and annoying. If we have no stated commitment to consensus, at some unpredictable point in the discussion, those who are implementing will - obviously - just duck out and do the implementation. I would do that, I guess. Maybe I have done in the projects I'm involved in. The question Nathaniel is raising, and me too, in a less coherent way, is - is that fine? Does it matter that we are short-cutting through substantial discussions? Is that really - in the long term - a more efficient way of building both the code and the community?
Who is counted in building a consensus? I tend to pay attention to those who have made consistent contributions over the years, reviewed code, fixed bugs, and have generally been active in numpy development. In any group participation is important, people who just walk in the door and demand things be done their way aren't going to get a lot of respect. I'll happily listen to politely expressed feedback, especially if the feedback comes from someone who shows up to work, but that hasn't been my impression of the disagreements in this case. Heck, Nathaniel wasn't even tracking the Numpy pull requests or Mark's repository. That doesn't spell "participant" in my dictionary.
Chuck
Hi,
On Sat, Oct 29, 2011 at 12:19 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Oct 29, 2011 at 1:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said:
- here's my dataset
- this is what I want to do with it
- this is the best I can do with the current implementation
- here's how API X would allow me to solve this better or simpler
This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'.
I'm not. All I want to point out is that design and implementation are not completely separated either.
No, they often interact. I was trying to explain why, in this case, the implementation hasn't changed the issues substantially, as far as I can see. If you think otherwise, then that is helpful information, because you can feed back about where the initial discussion has been overtaken by the implementation, and so we can strip down the discussion to its essential parts.
We don't need to do it that way. We're a mature sensible bunch of adults
Agreed:)
Ah - if only it was that easy :)
who can talk out the issues until we agree they are ready for implementation, and then implement.
The history of this discussion doesn't suggest it is straightforward to get a design right first time. It's a complex subject.
Right - and it's more complex when only some of the people involved are interested in the discussion coming to a resolution. That's Nathaniel's point - that although it seems inefficient, working towards a good resolution of big issues like this is very valuable in getting the ideas right.
The second part of your statement, "and then implement", sounds so simple. The reality is that there are only a handful of developers who have done a significant amount of work on the numpy core in the last two years. I haven't seen anyone saying they are planning to implement (part of) whatever design the outcome of this discussion will be. I don't think it's strange to keep this in mind to some extent.
No, but consensus building is a little bit all or none. I guess we'd all like consensus, but then sometimes, as Nathaniel points out, it is inconvenient and annoying. If we have no stated commitment to consensus, at some unpredictable point in the discussion, those who are implementing will - obviously - just duck out and do the implementation. I would do that, I guess. Maybe I have done in the projects I'm involved in. The question Nathaniel is raising, and me too, in a less coherent way, is - is that fine? Does it matter that we are short-cutting through substantial discussions? Is that really - in the long term - a more efficient way of building both the code and the community?
Who is counted in building a consensus? I tend to pay attention to those who have made consistent contributions over the years, reviewed code, fixed bugs, and have generally been active in numpy development. In any group participation is important, people who just walk in the door and demand things be done their way aren't going to get a lot of respect. I'll happily listen to politely expressed feedback, especially if the feedback comes from someone who shows up to work, but that hasn't been my impression of the disagreements in this case. Heck, Nathaniel wasn't even tracking the Numpy pull requests or Mark's repository. That doesn't spell "participant" in my dictionary.
I'm sorry, I am not obeying Ben's 10 minute rule.
This is a very important point you are making, which is that those who write the code have the final say.
Is it fair to say that your responses show that you don't think either Nathaniel or I have much of a say?
It's fair to say I haven't contributed much code to numpy.
I could imagine some sort of voting system for which the voting is weighted by lines of code contributed. I suspect you are thinking of an implicit version of such a system, continuously employed.
But Nathaniel's point is that other projects have gone out of their way to avoid voting. To quote from:
http://producingoss.com/en/consensus-democracy.html
"In general, taking a vote should be very rare—a last resort for when all other options have failed. Don't think of voting as a great way to resolve debates. It isn't. It ends discussion, and thereby ends creative thinking about the problem. As long as discussion continues, there is the possibility that someone will come up with a new solution everyone likes."
Best,
Matthew
On Sat, Oct 29, 2011 at 1:26 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 12:19 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Oct 29, 2011 at 1:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said:
- here's my dataset
- this is what I want to do with it
- this is the best I can do with the current implementation
- here's how API X would allow me to solve this better or simpler
This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'.
I'm not. All I want to point out is that design and implementation are not completely separated either.
No, they often interact. I was trying to explain why, in this case, the implementation hasn't changed the issues substantially, as far as I can see. If you think otherwise, then that is helpful information, because you can feed back about where the initial discussion has been overtaken by the implementation, and so we can strip down the discussion to its essential parts.
We don't need to do it that way. We're a mature sensible bunch of adults
Agreed:)
Ah - if only it was that easy :)
who can talk out the issues until we agree they are ready for implementation, and then implement.
The history of this discussion doesn't suggest it is straightforward to get a design right first time. It's a complex subject.
Right - and it's more complex when only some of the people involved are interested in the discussion coming to a resolution. That's Nathaniel's point - that although it seems inefficient, working towards a good resolution of big issues like this is very valuable in getting the ideas right.
The second part of your statement, "and then implement", sounds so simple. The reality is that there are only a handful of developers who have done a significant amount of work on the numpy core in the last two years. I haven't seen anyone saying they are planning to implement (part of) whatever design the outcome of this discussion will be. I don't think it's strange to keep this in mind to some extent.
No, but consensus building is a little bit all or none. I guess we'd all like consensus, but then sometimes, as Nathaniel points out, it is inconvenient and annoying. If we have no stated commitment to consensus, at some unpredictable point in the discussion, those who are implementing will - obviously - just duck out and do the implementation. I would do that, I guess. Maybe I have done in the projects I'm involved in. The question Nathaniel is raising, and me too, in a less coherent way, is - is that fine? Does it matter that we are short-cutting through substantial discussions? Is that really - in the long term - a more efficient way of building both the code and the community?
Who is counted in building a consensus? I tend to pay attention to those who have made consistent contributions over the years, reviewed code, fixed bugs, and have generally been active in numpy development. In any group participation is important, people who just walk in the door and demand things be done their way aren't going to get a lot of respect. I'll happily listen to politely expressed feedback, especially if the feedback comes from someone who shows up to work, but that hasn't been my impression of the disagreements in this case. Heck, Nathaniel wasn't even tracking the Numpy pull requests or Mark's repository. That doesn't spell "participant" in my dictionary.
I'm sorry, I am not obeying Ben's 10 minute rule.
This is a very important point you are making, which is that those who write the code have the final say.
Is it fair to say that your responses show that you don't think either Nathaniel or I have much of a say?
It's fair to say I haven't contributed much code to numpy.
But you have contributed some, which immediately gives you more credibility.
I could imagine some sort of voting system for which the voting is weighted by lines of code contributed.
Mark has been the man over the last year. By comparison, the rest of us have just been diddling around.
I suspect you are thinking of an implicit version of such a system, continuously employed.
But Nathaniel's point is that other projects have gone out of their way to avoid voting. To quote from:
http://producingoss.com/en/consensus-democracy.html
"In general, taking a vote should be very rare—a last resort for when all other options have failed. Don't think of voting as a great way to resolve debates. It isn't. It ends discussion, and thereby ends creative thinking about the problem. As long as discussion continues, there is the possibility that someone will come up with a new solution everyone likes. "
As Ralf pointed out, the core developers are a small handful at the moment. Now in one sense that presents an opportunity: anyone who has the time and inclination to contribute code and review pull requests is going to make an impact and rapidly gain influence. In a sense, leadership in the numpy community is up for grabs. But before you can claim the kingdom, there is the small matter of completing a quest or two.
Chuck
Hi,
On Sat, Oct 29, 2011 at 12:41 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Oct 29, 2011 at 1:26 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 12:19 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Oct 29, 2011 at 1:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said:
- here's my dataset
- this is what I want to do with it
- this is the best I can do with the current implementation
- here's how API X would allow me to solve this better or simpler
This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'.
That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me.
The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground.
That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much.
I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'.
I'm not. All I want to point out is that design and implementation are not completely separated either.
No, they often interact. I was trying to explain why, in this case, the implementation hasn't changed the issues substantially, as far as I can see. If you think otherwise, then that is helpful information, because you can feed back about where the initial discussion has been overtaken by the implementation, and so we can strip down the discussion to its essential parts.
We don't need to do it that way. We're a mature sensible bunch of adults
Agreed:)
Ah - if only it was that easy :)
who can talk out the issues until we agree they are ready for implementation, and then implement.
The history of this discussion doesn't suggest it is straightforward to get a design right first time. It's a complex subject.
Right - and it's more complex when only some of the people involved are interested in the discussion coming to a resolution. That's Nathaniel's point - that although it seems inefficient, working towards a good resolution of big issues like this is very valuable in getting the ideas right.
The second part of your statement, "and then implement", sounds so simple. The reality is that there are only a handful of developers who have done a significant amount of work on the numpy core in the last two years. I haven't seen anyone saying they are planning to implement (part of) whatever design the outcome of this discussion will be. I don't think it's strange to keep this in mind to some extent.
No, but consensus building is a little bit all or none. I guess we'd all like consensus, but then sometimes, as Nathaniel points out, it is inconvenient and annoying. If we have no stated commitment to consensus, at some unpredictable point in the discussion, those who are implementing will - obviously - just duck out and do the implementation. I would do that, I guess. Maybe I have done in the projects I'm involved in. The question Nathaniel is raising, and me too, in a less coherent way, is - is that fine? Does it matter that we are short-cutting through substantial discussions? Is that really - in the long term - a more efficient way of building both the code and the community?
Who is counted in building a consensus? I tend to pay attention to those who have made consistent contributions over the years, reviewed code, fixed bugs, and have generally been active in numpy development. In any group participation is important, people who just walk in the door and demand things be done their way aren't going to get a lot of respect. I'll happily listen to politely expressed feedback, especially if the feedback comes from someone who shows up to work, but that hasn't been my impression of the disagreements in this case. Heck, Nathaniel wasn't even tracking the Numpy pull requests or Mark's repository. That doesn't spell "participant" in my dictionary.
I'm sorry, I am not obeying Ben's 10 minute rule.
This is a very important point you are making, which is that those who write the code have the final say.
Is it fair to say that your responses show that you don't think either Nathaniel or I have much of a say?
It's fair to say I haven't contributed much code to numpy.
But you have contributed some, which immediately gives you more credibility.
I could imagine some sort of voting system for which the voting is weighted by lines of code contributed.
Mark has been the man over the last year. By comparison, the rest of us have just been diddling around.
I suspect you are thinking of an implicit version of such a system, continuously employed.
But Nathaniel's point is that other projects have gone out of their way to avoid voting. To quote from:
http://producingoss.com/en/consensus-democracy.html
"In general, taking a vote should be very rare—a last resort for when all other options have failed. Don't think of voting as a great way to resolve debates. It isn't. It ends discussion, and thereby ends creative thinking about the problem. As long as discussion continues, there is the possibility that someone will come up with a new solution everyone likes. "
As Ralf pointed out, the core developers are a small handful at the moment. Now in one sense that presents an opportunity: anyone who has the time and inclination to contribute code and review pull requests is going to make an impact and rapidly gain influence. In a sense, leadership in the numpy community is up for grabs. But before you can claim the kingdom, there is the small matter of completing a quest or two.
Yes, this is well-put - but I think I am asking for a less feudal model of decision making. The model you are offering is one of power - where power is acquired by code contributions. I suppose this model is attractive if you don't believe that it is generally possible to achieve an agreed solution through general and open discussion. The more effective model is democratic, that is, we have faith in each other to be reasonable and to negotiate in the best interests of the project, and we use measures of influence as an absolute last resort, and even then, this influence should be determined on explicit grounds (such as agreement across the group, number of lines committed or some other thing). Best, Matthew
On Saturday, October 29, 2011, Charles R Harris <charlesr.harris@gmail.com> wrote:
Who is counted in building a consensus? I tend to pay attention to those who have made consistent contributions over the years, reviewed code, fixed bugs, and have generally been active in numpy development. In any group participation is important, people who just walk in the door and demand things be done their way aren't going to get a lot of respect. I'll happily listen to politely expressed feedback, especially if the feedback comes from someone who shows up to work, but that hasn't been my impression of the disagreements in this case. Heck, Nathaniel wasn't even tracking the Numpy pull requests or Mark's repository. That doesn't spell "participant" in my dictionary.
Chuck
This is a very good point, but I would highly caution against alienating anybody here. Frankly, I am surprised how much my opinion has been taken here given the very little numpy code I have submitted (I think maybe two or three patches). The Numpy community is far more than just those who use the core library. There is pandas, bottleneck, mpl, the scikits, and much more. Numpy would be nearly useless without them, and certainly vice versa. We are all indebted to each other for our works. We must never lose that perspective. We all seem to have a different set of assumptions of how development should work. Each project follows its own workflow. Numpy should be free to adopt their own procedures, and we are free to discuss them. I do agree with chuck that he shouldn't have to make a written invitation to each and every person to review each pull. However, maybe some work can be done to bring the pull request and issues discussion down to the mailing list. I would like to do something similar with mpl. As for voting rights, let's make that a separate discussion. Ben Root
On Sat, Oct 29, 2011 at 1:41 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Saturday, October 29, 2011, Charles R Harris <charlesr.harris@gmail.com> wrote:
Who is counted in building a consensus? I tend to pay attention to those who have made consistent contributions over the years, reviewed code, fixed bugs, and have generally been active in numpy development. In any group participation is important, people who just walk in the door and demand things be done their way aren't going to get a lot of respect. I'll happily listen to politely expressed feedback, especially if the feedback comes from someone who shows up to work, but that hasn't been my impression of the disagreements in this case. Heck, Nathaniel wasn't even tracking the Numpy pull requests or Mark's repository. That doesn't spell "participant" in my dictionary.
Chuck
This is a very good point, but I would highly caution against alienating anybody here. Frankly, I am surprised how much my opinion has been taken here given the very little numpy code I have submitted (I think maybe two or three patches). The Numpy community is far more than just those who use the core library. There is pandas, bottleneck, mpl, the scikits, and much more. Numpy would be nearly useless without them, and certainly vice versa.
I was quite impressed by your comments on Mark's work, I thought they were excellent. It doesn't really take much to make an impact in a small community overburdened by work.
We are all indebted to each other for our works. We must never lose that perspective.
We all seem to have a different set of assumptions of how development should work. Each project follows its own workflow. Numpy should be free to adopt their own procedures, and we are free to discuss them.
I do agree with chuck that he shouldn't have to make a written invitation to each and every person to review each pull. However, maybe some work can be done to bring the pull request and issues discussion down to the mailing list. I would like to do something similar with mpl.
As for voting rights, let's make that a separate discussion.
With such a small community, I'd rather avoid the whole voting thing if possible. Chuck
Hi, On Sat, Oct 29, 2011 at 1:05 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Oct 29, 2011 at 1:41 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Saturday, October 29, 2011, Charles R Harris <charlesr.harris@gmail.com> wrote:
Who is counted in building a consensus? I tend to pay attention to those who have made consistent contributions over the years, reviewed code, fixed bugs, and have generally been active in numpy development. In any group participation is important, people who just walk in the door and demand things be done their way aren't going to get a lot of respect. I'll happily listen to politely expressed feedback, especially if the feedback comes from someone who shows up to work, but that hasn't been my impression of the disagreements in this case. Heck, Nathaniel wasn't even tracking the Numpy pull requests or Mark's repository. That doesn't spell "participant" in my dictionary.
Chuck
This is a very good point, but I would highly caution against alienating anybody here. Frankly, I am surprised how much my opinion has been taken here given the very little numpy code I have submitted (I think maybe two or three patches). The Numpy community is far more than just those who use the core library. There is pandas, bottleneck, mpl, the scikits, and much more. Numpy would be nearly useless without them, and certainly vice versa.
I was quite impressed by your comments on Mark's work, I thought they were excellent. It doesn't really take much to make an impact in a small community overburdened by work.
We are all indebted to each other for our works. We must never lose that perspective.
We all seem to have a different set of assumptions of how development should work. Each project follows its own workflow. Numpy should be free to adopt their own procedures, and we are free to discuss them.
I do agree with chuck that he shouldn't have to make a written invitation to each and every person to review each pull. However, maybe some work can be done to bring the pull request and issues discussion down to the mailing list. I would like to do something similar with mpl.
As for voting rights, let's make that a separate discussion.
With such a small community, I'd rather avoid the whole voting thing if possible.
But, if there is one thing worse than voting, it is implicit voting. Implicit voting is where you ignore people who you don't think should have a voice. Unless I'm mistaken, that's what you are suggesting should be the norm. Best, Matthew
On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out. That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read. OK, update: I took Ben's 10 minutes to go back and read the reference doc and your email again, just in case. The current implementation still seems natural to me to explain. It fits my use-cases. Perhaps that's different for you because you and I deal with different kinds of data. I don't have to explicitly treat absent and ignored data differently; those two are actually mixed and indistinguishable already in much of my data. Therefore the current implementation works well for me, having to make a distinction would be a needless complication. Ralf
Hi, On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read.
The snippets are in ipython or doctest format - aren't they?
OK, update: I took Ben's 10 minutes to go back and read the reference doc and your email again, just in case. The current implementation still seems natural to me to explain. It fits my use-cases. Perhaps that's different for you because you and I deal with different kinds of data. I don't have to explicitly treat absent and ignored data differently; those two are actually mixed and indistinguishable already in much of my data. Therefore the current implementation works well for me, having to make a distinction would be a needless complication.
OK - I'm not sure that contributes much to the discussion, because the problem is being able to explain to each other in details why one solution is preferable to another. To follow your own advice, you'd post some code snippets showing how you'd see the two ideas playing out and why one is clearer than the other. Best, Matthew
Hi, On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read.
The snippets are in ipython or doctest format - aren't they?
Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code. That's true, but I am hoping that the difference between - say:
a[0:2] = np.NA
and
a.mask[0:2] = False
would be easy enough to imagine. If it isn't then, let me know, preferably with something like "I can't see exactly how the following [code snippet] would work in your conception of the problem" - and then I can either try and give fake examples, or write a mock up. Best, Matthew
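[Archive note: for readers who want something runnable to play with, here is a rough sketch of the two styles contrasted above. It is only a stand-in: np.NA as discussed in this thread is not available in released NumPy, so the sketch assumes the existing numpy.ma module instead, with np.ma.masked playing the role of np.NA. Note also that numpy.ma uses True to mean "masked out", whereas the example in the message writes False; the variable names are made up for illustration.]

```python
import numpy as np

data = [1.0, 2.0, 3.0, 4.0]

# Style 1: mark elements as missing by assigning a special value
# through ordinary indexing (np.ma.masked stands in for np.NA here).
a = np.ma.array(data, mask=[False, False, False, False])
a[0:2] = np.ma.masked

# Style 2: leave the stored values alone and edit an explicit mask.
# In numpy.ma, True means "hide this element".
b = np.ma.array(data, mask=[False, False, False, False])
new_mask = b.mask.copy()
new_mask[0:2] = True
b.mask = new_mask

# Both arrays now hide the first two elements from computations.
print(a.mask.tolist(), b.mask.tolist())
print(a.sum(), b.sum())
```

[In this stand-in the two styles end up in the same state; the disagreement in the thread is about which spelling makes the underlying concept easier to explain, not about what numpy.ma does today.]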
On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out. That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read.
The snippets are in ipython or doctest format - aren't they?
Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code.
Indeed.
That's true, but I am hoping that the difference between - say:
a[0:2] = np.NA
and
a.mask[0:2] = False
would be easy enough to imagine.
It is in this case. I agree the explicit ``a.mask`` is clearer. This is a quite specific point that could be improved in the current implementation. It doesn't require ripping everything out. Ralf
Hi, On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read.
The snippets are in ipython or doctest format - aren't they?
Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code.
Indeed.
That's true, but I am hoping that the difference between - say:
a[0:2] = np.NA
and
a.mask[0:2] = False
would be easy enough to imagine.
It is in this case. I agree the explicit ``a.mask`` is clearer. This is a quite specific point that could be improved in the current implementation.
Thanks - this is helpful.
It doesn't require ripping everything out.
Nathaniel wasn't proposing 'ripping everything out' - but backing off until consensus has been reached. That's different. If you think we should not do that, and you are interested, please say why. Second - I was proposing that we do indeed keep the code in the codebase but discuss adaptations that could achieve consensus. See you, Matthew
On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point
easier
to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read.
The snippets are in ipython or doctest format - aren't they?
Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code.
Indeed.
That's true, but I am hoping that the difference between - say:
a[0:2] = np.NA
and
a.mask[0:2] = False
would be easy enough to imagine.
It is in this case. I agree the explicit ``a.mask`` is clearer. This is a quite specific point that could be improved in the current implementation.
Thanks - this is helpful.
It doesn't require ripping everything out.
Nathaniel wasn't proposing 'ripping everything out' - but backing off until consensus has been reached. That's different. If you think we should not do that, and you are interested, please say why. Second - I was proposing that we do indeed keep the code in the codebase but discuss adaptations that could achieve consensus.
I'm much opposed to ripping the current code out. It isn't like it is (known to be) buggy, nor has anyone made the case that it isn't a basis on which to build other options. It also smacks of gratuitous violence committed by someone yet to make a positive contribution to the project. Chuck
Hi, On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I'm much opposed to ripping the current code out.
You are repeating the loaded phrase 'ripping the current code out' and thus making the discussion less sensible and more hostile.
It isn't like it is (known to be) buggy, nor has anyone made the case that it isn't a basis on which to build other options. It also smacks of gratuitous violence committed by someone yet to make a positive contribution to the project.
This is cheap, rude, and silly. All I can see from Nathaniel is a reasonable, fair attempt to discuss the code. He proposed backing off the code in good faith. You are emphatically and, in my view, childishly ignoring the substantial points he is making, and asserting over and over that he deserves no hearing because he has not contributed code. This is a terribly destructive way to work. If I were a new developer reading this, I would conclude that I had better be damn careful which side I'm on before I express my opinion; otherwise I'm going to be made to feel like I don't exist by the other people on the project. That is miserable, it is silly, and it's the wrong way to do business. Best, Matthew
On Sat, Oct 29, 2011 at 5:11 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
This is cheap, rude, and silly. All I can see from Nathaniel is a reasonable, fair attempt to discuss the code. He proposed backing off the code in good faith. You are emphatically and, in my view, childishly ignoring the substantial points he is making, and asserting over and over that he deserves no hearing because he has not contributed code. This is a terribly destructive way to work. If I were a new developer reading this, I would conclude that I had better be damn careful which side I'm on before I express my opinion; otherwise I'm going to be made to feel like I don't exist by the other people on the project. That is miserable, it is silly, and it's the wrong way to do business.
Best,
Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
To be honest, you have been slandering a lot, also in previous discussions, to get what you wanted. This is not a healthy way of discussion, nor does it help in any way. There have been many people willing to listen and agree with you on points, and this is exactly what discussion is all about, but where they might agree on some, they might disagree on others. When you start pulling the "people who won't listen to me are evil" card, it might have some effect the first time, but the second and third time they see what's coming... o/
Hi, On Sat, Oct 29, 2011 at 4:28 PM, Han Genuit <hangenuit@gmail.com> wrote:
To be honest, you have been slandering a lot, also in previous discussions, to get what you wanted. This is not a healthy way of discussion, nor does it help in any way.
That's a severe accusation. Please quote something I said that was false, or unfair. See you, Matthew
On Sat, Oct 29, 2011 at 5:11 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
This is cheap, rude, and silly. All I can see from Nathaniel is a reasonable, fair attempt to discuss the code. He proposed backing off the code in good faith. You are emphatically and, in my view, childishly ignoring the substantial points he is making, and asserting over and over that he deserves no hearing because he has not contributed code.
Sorry Matthew, but Nathaniel's interaction comes across to me as arrogant, and your constant use of terms like childish, destructive to the community, etc. comes across as manipulative. I can live with the words, but you aren't doing much to get this developer on your side. Chuck
Hi, On Sat, Oct 29, 2011 at 4:18 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Sorry Matthew, but Nathaniel's interaction comes across to me as arrogant, and your constant use of terms like childish, destructive to the community, etc. come across as manipulative.
I don't know what 'manipulative' means here. Can you explain?
I can live with the words, but you aren't doing much to get this developer on your side.
No, I am not trying to get you on my side because I don't believe in sides, and, unless you tell me otherwise, I think you believe in an implicit model of decision making that is bad for numpy. I will willingly and enthusiastically buy you a drink at scipy - but I believe you are wrong in the way that you have approached this discussion, and I believe the model that you are using, and its opposite, are of central importance to the health of our shared discussions in the future.
Best,
Matthew
On Saturday, October 29, 2011, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
> On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Hi,
>>
>> On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
>> > On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> >> Hi,
>> >>
>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
>> >> > On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
>> >> >>
>> >> >> No, that's not what Nathaniel and I are saying at all. Nathaniel was
>> >> >> pointing to links for projects that care that everyone agrees before
>> >> >> they go ahead.
>> >> >
>> >> > It looked to me like there was a serious intent to come to an agreement,
>> >> > or at least closer together. The discussion in the summer was going
>> >> > around in circles though, and was too abstract and complex to follow.
You are repeating the loaded phrase 'ripping the current code out' and thus making the discussion less sensible and more hostile.
It isn't like it is (known to be) buggy, nor has anyone made the case that it isn't a basis on which build other options. It also smacks of gratuitous violence committed by someone yet to make a positive contribution to the project.
This is cheap, rude, and silly. All I can see from Nathaniel is a reasonable, fair attempt to discuss the code. He proposed backing off the code in good faith. You are emphatically, and, in my view childishly, ignoring the substantial points he is making, and asserting over and over that he deserves no hearing because he has not contributed code. This is a terribly destructive way to work. If I was a new developer reading this, I would conclude, that I had better be damn careful which side I'm on, before I express my opinion, otherwise I'm going to be made to feel like I don't exist by the other people on the project. That is miserable, it is silly, and it's the wrong way to do business.
Best,
Matthew
/me blows whistle. Personal foul against defense! Personal foul against offense! Penalties offset! Repeat first down. 10 minute rule, please.
Ben Root
P.S. - as a bit of evidence against the idea that Chuck doesn't consider opinions from non-contributors, I haven't felt ignored during this whole discussion, yet I don't think that anyone had an expectation of me to produce code. However, to have an expectation to produce code for counter-proposals might be a bit unfair, because the ones offering a counter-proposal may not have the resources available, like we did with Mark.
Hi, On Sat, Oct 29, 2011 at 4:24 PM, Benjamin Root <ben.root@ou.edu> wrote:
/me blows whistle. Personal foul against defense! Personal foul against offense! Penalties offset! Repeat first down.
Is that right? I think I'm calling Charles on giving Nathaniel the silent treatment. Am I wrong to do that? Is that not true? See you, Matthew
Hi, On Sat, Oct 29, 2011 at 4:11 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
> On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Hi,
>>
>> On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
>> > On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> >> Hi,
>> >>
>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
>> >> > On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
>> >> >>
>> >> >> No, that's not what Nathaniel and I are saying at all. Nathaniel was
>> >> >> pointing to links for projects that care that everyone agrees before
>> >> >> they go ahead.
>> >> >
>> >> > It looked to me like there was a serious intent to come to an agreement,
>> >> > or at least closer together. The discussion in the summer was going
>> >> > around in circles though, and was too abstract and complex to follow.
>> >> > Therefore Mark's choice of implementing something and then asking for
>> >> > feedback made sense to me.
>> >>
>> >> I should point out that the implementation hasn't - as far as I can
>> >> see - changed the discussion. The discussion was about the API.
>> >>
>> >> Implementations are useful for agreed APIs because they can point out
>> >> where the API does not make sense or cannot be implemented. In this
>> >> case, the API Mark said he was going to implement - he did implement -
>> >> at least as far as I can see. Again, I'm happy to be corrected.
>> >
>> > Implementations can also help the discussion along, by allowing people to
>> > try out some of the proposed changes. It also allows to construct examples
>> > that show weaknesses, possibly to be solved by an alternative API. Maybe
>> > you can hold the complete history of this topic in your head and
>> > comprehend it, but for me it would be very helpful if someone said:
>> > - here's my dataset
>> > - this is what I want to do with it
>> > - this is the best I can do with the current implementation
>> > - here's how API X would allow me to solve this better or simpler
>> > This can be done much better with actual data and an actual implementation
>> > than with a design proposal. You seem to disagree with this statement.
>> > That's fine. I would hope though that you recognize that concrete examples
>> > help people like me, and construct one or two to help us out.
>>
>> That's what use-cases are for in designing APIs. There are examples
>> of use in the NEP:
>>
>> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>>
>> the alterNEP:
>>
>> https://gist.github.com/1056379
>>
>> and my longer email to Travis:
>>
>> http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
>>
>> Mark has done a nice job of documentation:
>>
>> http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
>>
>> If you want to understand what the alterNEP case is, I'd suggest the
>> email, just because it's the most recent and I think the terminology
>> is slightly clearer.
>>
>> Doing the same examples on a larger array won't make the point easier
>> to understand. The discussion is about what the right concepts are,
>> and you can help by looking at the snippets of code in those
>> documents, and deciding for yourself whether you think the current
>> masking / NA implementation seems natural and easy to explain, or
>> rather forced and difficult to explain, and then email back trying to
>> explain your impression (which is not always easy).
>
> If you seriously believe that looking at a few snippets is as helpful and
> instructive as being able to play around with them in IPython and modify
> them, then I guess we won't make progress in this part of the discussion.
> You're just telling me to go back and re-read things I'd already read.
The snippets are in ipython or doctest format - aren't they?
Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code.
Indeed.
That's true, but I am hoping that the difference between - say:
a[0:2] = np.NA
and
a.mask[0:2] = False
would be easy enough to imagine.
It is in this case. I agree the explicit ``a.mask`` is clearer. This is a quite specific point that could be improved in the current implementation.
Thanks - this is helpful.
It doesn't require ripping everything out.
Nathaniel wasn't proposing 'ripping everything out' - but backing off until consensus has been reached. That's different. If you think we should not do that, and you are interested, please say why. Second - I was proposing that we do indeed keep the code in the codebase but discuss adaptations that could achieve consensus.
I'm much opposed to ripping the current code out.
You are repeating the loaded phrase 'ripping the current code out' and thus making the discussion less sensible and more hostile.
It isn't like it is (known to be) buggy, nor has anyone made the case that it isn't a basis on which build other options. It also smacks of gratuitous violence committed by someone yet to make a positive contribution to the project.
This is cheap, rude, and silly. All I can see from Nathaniel is a reasonable, fair attempt to discuss the code. He proposed backing off the code in good faith. You are emphatically, and, in my view childishly, ignoring the substantial points he is making, and asserting over and over that he deserves no hearing because he has not contributed code. This is a terribly destructive way to work. If I was a new developer reading this, I would conclude, that I had better be damn careful which side I'm on, before I express my opinion, otherwise I'm going to be made to feel like I don't exist by the other people on the project. That is miserable, it is silly, and it's the wrong way to do business.
I conclude that it's bad to drink this much coffee in an afternoon, and that the next time I visit my friend's house, I'll take some decaf. Sorry Chuck - you're right - this was too personal. I do disagree with you, but I was rude here and I am sorry. I owe you an expensive drink, as per Ben's excellent suggestion. See you, Matthew
On Sat, Oct 29, 2011 at 7:47 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
I conclude that it's bad to drink this much coffee in an afternoon, and that the next time I visit my friend's house, I'll take some decaf.
Sorry Chuck - you're right - this was too personal. I do disagree with you, but I was rude here and I am sorry. I owe you an expensive drink, as per Ben's excellent suggestion.
Apology accepted. Let me add an argument for not pulling out the current implementation, which is the underlying reason of the release early, release often open software mantra: if the NA work is off in a branch, no one will use it and we will lack useful feedback. Now, I don't have a problem with adding a comment to the release notes stating that the API is not completely settled and can change due to user feedback. But we do need users, and they need to work with it for at least a few weeks. My own initial reaction to new software often evolves as: "WTF", followed by hours -- days -- weeks -- while I wander around muttering "morons, idiots" to myself. That is not the best period of time for me to make a balanced assessment, that needs to wait until I settle down. Then I adapt and usually things no longer look so bad, maybe they even look good, maybe even great. So it goes. Chuck
Hi, On Sat, Oct 29, 2011 at 7:48 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I conclude that it's bad to drink this much coffee in an afternoon, and that the next time I visit my friend's house, I'll take some decaf.
Sorry Chuck - you're right - this was too personal. I do disagree with you, but I was rude here and I am sorry. I owe you an expensive drink, as per Ben's excellent suggestion.
Apology accepted.
Thank you, that is gracious of you.
Let me add an argument for not pulling out the current implementation, which is the underlying reason of the release early, release often open software mantra: if the NA work is off in a branch, no one will use it and we will lack useful feedback. Now, I don't have a problem with adding a comment to the release notes stating that the API is not completely settled and can change due to user feedback. But we do need users, and they need to work with it for at least a few weeks. My own initial reaction to new software often evolves as: "WTF", followed by hours -- days -- weeks -- while I wander around muttering "morons, idiots" to myself. That is not the best period of time for me to make a balanced assessment, that needs to wait until I settle down. Then I adapt and usually things no longer look so bad, maybe they even look good, maybe even great. So it goes.
Yes, that's very reasonable. It may be that we don't have good hope of resolving the current discussion in the near future, in which case it would not make much sense to pull it out pending agreement. Best (honestly), Matthew
On 10/29/11 2:59 PM, Charles R Harris wrote:
I'm much opposed to ripping the current code out. It isn't like it is (known to be) buggy, nor has anyone made the case that it isn't a basis on which build other options. It also smacks of gratuitous violence committed by someone yet to make a positive contribution to the project.
1) Contributing to the discussion IS a positive contribution to the project.
2) If we use the term "ripping out", it does smack of "gratuitous violence"; if we use the term "roll back", maybe not so much. It's not like the code couldn't be put back in.
That being said, I like the idea of it being easy and accessible for not-very-familiar-with-git folks to test, so I'd like to see it left there for now at least.
On 10/29/11 3:47 PM, Eric Firing wrote:
Similarly, in Mark's implementation, 7 bits are available for a payload to describe what kind of masking is meant. This seems more consistent with True as masked (or NA) than with False as masked.
+1 -- we've got 8 bits, nice to be able to use them
On 10/29/11 3:57 PM, Charles R Harris wrote:
I wouldn't rely on the 7 bits yet. Mark left them available to keep open possible future use, but didn't implement anything using them yet. If memory use turns out to exclude whole sectors of application we will have to go to bit masks.
would there have to be only one type of mask available?
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
Hi, On Sun, Oct 30, 2011 at 11:37 AM, Chris Barker <Chris.Barker@noaa.gov> wrote:
On 10/29/11 2:59 PM, Charles R Harris wrote:
I'm much opposed to ripping the current code out. It isn't like it is (known to be) buggy, nor has anyone made the case that it isn't a basis on which to build other options. It also smacks of gratuitous violence committed by someone yet to make a positive contribution to the project.
1) contributing to the discussion IS a positive contribution to the project.
Yes, but, personally I'd rather the discussion was not about who was saying something, but what they were saying. That is, if someone proposes something, or offers a discussion, we don't first ask 'who are you', but try and engage with the substance of the argument. Best, Matthew
On Sat, Oct 29, 2011 at 11:55 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead.
It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me.
I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected.
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said:
- here's my dataset
- this is what I want to do with it
- this is the best I can do with the current implementation
- here's how API X would allow me to solve this better or simpler
This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out.
That's what use-cases are for in designing APIs. There are examples of use in the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
the alterNEP:
https://gist.github.com/1056379
and my longer email to Travis:
http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignor...
Mark has done a nice job of documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer.
Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy).
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read.
The snippets are in ipython or doctest format - aren't they?
Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code.
Indeed.
That's true, but I am hoping that the difference between - say:
a[0:2] = np.NA
and
a.mask[0:2] = False
would be easy enough to imagine.
It is in this case. I agree the explicit ``a.mask`` is clearer. This is a quite specific point that could be improved in the current implementation.
Thanks - this is helpful.
So was your example.
It doesn't require ripping everything out.
Nathaniel wasn't proposing 'ripping everything out' - but backing off until consensus has been reached. That's different.
I'm worried that in practice it won't be different. If you put such a large amount of code in a branch, with no one lined up to work on changing/improving/re-integrating it, the most likely thing to happen is that it will just sit there in a branch, bitrot and eventually be lost.
If you think we should not do that, and you are interested, please say why. Second - I was proposing that we do indeed keep the code in the codebase but discuss adaptations that could achieve consensus.
Glad to hear it. This is not what I understood from the email you linked to earlier. Quoting: "Honestly, I think that NA should be a synonym for ABSENT, and so should be removed until the dust has settled, and restored as (np.NA == np.ABSENT)".
At this point I care much more about having a good implementation than exactly which one; the similarities are much more important than the differences. My main worry is we end up with nothing.
As for the current situation and way forward, Eric Firing provided a much better summary and list of important points than I managed to communicate so far. I agree with everything he said.
Cheers,
Ralf
Hi, On Sun, Oct 30, 2011 at 12:24 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
Glad to hear it. This is not what I understood from the email you linked to earlier. Quoting: "Honestly, I think that NA should be a synonym for ABSENT, and so should be removed until the dust has settled, and restored as (np.NA == np.ABSENT)".
I was proposing that the name 'np.NA' should be removed, leaving np.IGNORED (with the same meaning as the current np.NA) and np.ABSENT currently not implemented. When it does get implemented, then, in due course, make np.NA a synonym for np.ABSENT. I'm sorry that wasn't obvious.
At this point I care much more about having a good implementation than exactly which one; the similarities are much more important than the differences. My main worry is we end up with nothing.
I don't think any proposed route ended up with nothing. Nathaniel was only suggesting backing off until we had done the work of agreeing. It doesn't look like that has much support; that's fine. Best, Matthew
2011/10/29 Ralf Gommers <ralf.gommers@googlemail.com>
On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett <matthew.brett@gmail.com>wrote:
That's true, but I am hoping that the difference between - say:
a[0:2] = np.NA
and
a.mask[0:2] = False
would be easy enough to imagine.
It is in this case. I agree the explicit ``a.mask`` is clearer. This is a quite specific point that could be improved in the current implementation. It doesn't require ripping everything out.
Ralf
I haven't been following the discussion closely, but wouldn't it be instead: a.mask[0:2] = True?
It's something that I actually find a bit difficult to get right in the current numpy.ma implementation: I would find it more intuitive to have True for "valid" data, and False for invalid / missing / ... I realize how the implementation makes sense (and is appropriate given that the name is "mask"), but I just thought I'd point this out... even if it's just me ;)
-=- Olivier
On Sat, Oct 29, 2011 at 4:02 PM, Olivier Delalleau <shish@keba.be> wrote:
I haven't been following the discussion closely, but wouldn't it be instead: a.mask[0:2] = True?
It's something that I actually find a bit difficult to get right in the current numpy.ma implementation: I would find more intuitive to have True for "valid" data, and False for invalid / missing / ... I realize how the implementation makes sense (and is appropriate given that the name is "mask"), but I just thought I'd point this out... even if it's just me ;)
Well, there is the problem of replacing an unknown value by a known value, and then you would have to clear the mask also. However, I do appreciate this sort of feedback from actual use. We need more in order to see what are real sticking points and to separate the usual frustrations of learning new stuff from the more serious problem of inadequate API. If enough people start giving feedback we might want to set up some way to track it. Chuck
On 10/29/2011 12:02 PM, Olivier Delalleau wrote:
I haven't been following the discussion closely, but wouldn't it be instead: a.mask[0:2] = True?
That would be consistent with numpy.ma and the opposite of Mark's implementation.
I can live with either, but I much prefer the numpy.ma version because it fits with the use of bit-flags for editing data; set bit 1 if it fails check A, set bit 2 if it fails check B, etc. So, if it evaluates as True, there is a problem, and the value is masked *out*.
Similarly, in Mark's implementation, 7 bits are available for a payload to describe what kind of masking is meant. This seems more consistent with True as masked (or NA) than with False as masked.
Eric
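The bit-flag style of data editing described above can be sketched in plain Python. Everything here is hypothetical and only for illustration: the flag names, the two checks, and the thresholds are assumptions, not part of numpy.ma or of Mark's implementation. The point is only that a nonzero flag word evaluates as True, which lines up with "True means masked out".

```python
# Hypothetical quality-control flags; names and checks are illustrative only.
FAILS_RANGE_CHECK = 0b01   # bit 1: value outside the plausible range
FAILS_SPIKE_CHECK = 0b10   # bit 2: value jumps too far from its predecessor


def quality_flags(values, low, high, max_jump):
    """Return one integer flag word per value; 0 means every check passed."""
    flags = []
    previous = None
    for v in values:
        f = 0
        if not (low <= v <= high):
            f |= FAILS_RANGE_CHECK
        if previous is not None and abs(v - previous) > max_jump:
            f |= FAILS_SPIKE_CHECK
        flags.append(f)
        previous = v
    return flags


values = [10.0, 11.0, 99.0, 12.0, -5.0]
flags = quality_flags(values, low=0.0, high=50.0, max_jump=20.0)

# Any set bit evaluates as True, so bool(flag) gives a numpy.ma-style mask:
# True means "there is a problem, mask this value out".
masked_out = [bool(f) for f in flags]
kept = [v for v, m in zip(values, masked_out) if not m]

print(flags)       # flag word per value
print(masked_out)  # mask with True == masked out
print(kept)        # values that passed every check
```

The flag word still records which check failed, so the reason for masking is not lost when the flags are collapsed to a boolean mask.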
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Sat, Oct 29, 2011 at 4:47 PM, Eric Firing <efiring@hawaii.edu> wrote:
On 10/29/2011 12:02 PM, Olivier Delalleau wrote:
I haven't been following the discussion closely, but wouldn't it be instead: a.mask[0:2] = True?
That would be consistent with numpy.ma and the opposite of Mark's implementation.
I can live with either, but I much prefer the numpy.ma version because it fits with the use of bit-flags for editing data; set bit 1 if it fails check A, set bit 2 if it fails check B, etc. So, if it evaluates as True, there is a problem, and the value is masked *out*.
Similarly, in Mark's implementation, 7 bits are available for a payload to describe what kind of masking is meant. This seems more consistent with True as masked (or NA) than with False as masked.
I wouldn't rely on the 7 bits yet. Mark left them available to keep open possible future use, but didn't implement anything using them yet. If memory use turns out to exclude whole sectors of application we will have to go to bit masks. Chuck
On 10/29/2011 12:57 PM, Charles R Harris wrote:
On Sat, Oct 29, 2011 at 4:47 PM, Eric Firing <efiring@hawaii.edu> wrote:
On 10/29/2011 12:02 PM, Olivier Delalleau wrote:
I haven't been following the discussion closely, but wouldn't it be instead: a.mask[0:2] = True?
That would be consistent with numpy.ma and the opposite of Mark's implementation.
I can live with either, but I much prefer the numpy.ma version because it fits with the use of bit-flags for editing data; set bit 1 if it fails check A, set bit 2 if it fails check B, etc. So, if it evaluates as True, there is a problem, and the value is masked *out*.
Similarly, in Mark's implementation, 7 bits are available for a payload to describe what kind of masking is meant. This seems more consistent with True as masked (or NA) than with False as masked.
I wouldn't rely on the 7 bits yet. Mark left them available to keep open possible future use, but didn't implement anything using them yet. If memory use turns out to exclude whole sectors of application we will have to go to bit masks.
Right; I was only commenting on a subjective sense of internal consistency. A minor point.
The larger context of all this is how users end up being able to work with all the different types and specifications of "NA" (in the most general sense) data:
1) nans
2) numpy.ma
3) masks in the core (Mark's new code)
4) bit patterns
Substantial code now in place--including matplotlib--relies on numpy.ma. It has some rough edges, it can be slow, it is a pain having it as a bolted-on module, it may be more complicated than it needs to be, but it fits a lot of use cases pretty well. There are many users. Everyone using matplotlib is using it, whether they know it or not.
The ideal from my numpy.ma-user's standpoint would be an NA-handling implementation in the core that would do two things:
(1) allow a gradual transition away from numpy.ma, so that the latter would become redundant.
(2) allow numpy.ma to be reasonably easily modified to use the in-core facilities for greater efficiency during the long transition. Implicit is the hope that someone (most likely not me, although I might be able to help a bit) would actually perform this modification.
Mark's mission, paid for by Enthought, was not to please numpy.ma users, but to add NA-handling that would be comfortable for R-users. He chose to do so with the idea that two possible implementations (masks and bit patterns) were desirable, each with strengths and weaknesses, and that so as to get *something* done in the very short time he had left, he would start with the mask implementation. We now have the result, incomplete, but not breaking anything. Additional development (coding as well as designing) will be needed.
The main question raised by Matthew and Nathaniel is, I think, whether Mark's code should develop in a direction away from the R-compatibility model, with the idea that the latter would be handled via a bit-pattern implementation, some day, when someone codes it; or whether it should remain as the prototype and first implementation of an API to handle the R-compatible use case, minimizing any divergence from any eventual bit-pattern implementation.
The answer to this depends on several questions, including:
1) Who is available to do how much implementation of any of the possibilities? My reading of Travis's blog and rare posts to this list suggests that he hopes and expects to be able to free up coding time. Perhaps he will clarify that soon.
2) What sorts of changes would actually be needed to make the present implementation good enough for the R use case? Evolutionary, or revolutionary?
3) What sorts of changes would help with the numpy.ma use case? Evolutionary, or revolutionary?
4) Given available resources, how can we maximize progress: making numpy more capable, easier to use, etc.?
Unless the answers to questions 2 *and* 3 are "revolutionary", I don't see the point in pulling Mark's changes out of master. At most, the documentation might be changed to mark the NA API as "experimental" for a release or two.
Overall, I think that the differences between the R use case and the ma use case have been overstated and over-emphasized.
Eric
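Two of the four approaches listed above, nans and numpy.ma, are available in released NumPy and can be compared directly. The snippet below is a small sketch with made-up data, not a statement about how the in-core masks or bit patterns would behave. It shows one practical difference: NaN only exists for floating-point data, while a numpy.ma mask can mark a missing entry in an integer array without changing its dtype.

```python
import numpy as np

# Approach 1: NaN as the missing-value marker (floating-point data only).
readings = np.array([1.0, np.nan, 3.0])
nan_mean = float(np.nanmean(readings))   # NaN-aware reduction skips the NaN

# Approach 2: numpy.ma keeps a separate boolean mask, so integer data
# can have missing entries while staying an integer array.
counts = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
ma_mean = float(counts.mean())           # the masked 2 is ignored

print(nan_mean, ma_mean, counts.dtype)
```

Both reductions ignore the missing middle element, but only the masked array keeps the original value underneath the mask and works for non-float dtypes.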
On Oct 29, 2011, at 7:24 PM, Eric Firing wrote:
On 10/29/2011 12:57 PM, Charles R Harris wrote:
On Sat, Oct 29, 2011 at 4:47 PM, Eric Firing <efiring@hawaii.edu> wrote:
On 10/29/2011 12:02 PM, Olivier Delalleau wrote:
I haven't been following the discussion closely, but wouldn't it
be instead:
a.mask[0:2] = True?
That would be consistent with numpy.ma and the opposite of Mark's implementation.
I can live with either, but I much prefer the numpy.ma version because it fits with the use of bit-flags for editing data; set bit 1 if it fails check A, set bit 2 if it fails check B, etc. So, if it evaluates as True, there is a problem, and the value is masked *out*.
Similarly, in Mark's implementation, 7 bits are available for a payload to describe what kind of masking is meant. This seems more consistent with True as masked (or NA) than with False as masked.
I wouldn't rely on the 7 bits yet. Mark left them available to keep open possible future use, but didn't implement anything using them yet. If memory use turns out to exclude whole sectors of application we will have to go to bit masks.
Right; I was only commenting on a subjective sense of internal consistency. A minor point.
The larger context of all this is how users end up being able to work with all the different types and specifications of "NA" (in the most general sense) data:
1) nans 2) numpy.ma 3) masks in the core (Mark's new code) 4) bit patterns
Substantial code now in place--including matplotlib--relies on numpy.ma. It has some rough edges, it can be slow, it is a pain having it as a bolted-on module, it may be more complicated than it needs to be, but it fits a lot of use cases pretty well. There are many users. Everyone using matplotlib is using it, whether they know it or not.
The ideal from my numpy.ma-user's standpoint would be an NA-handling implementation in the core that would do two things: (1) allow a gradual transition away from numpy.ma, so that the latter would become redundant. (2) allow numpy.ma to be reasonably easily modified to use the in-core facilities for greater efficiency during the long transition. Implicit is the hope that someone (most likely not me, although I might be able to help a bit) would actually perform this modification.
Mark's mission, paid for by Enthought, was not to please numpy.ma users, but to add NA-handling that would be comfortable for R-users. He chose to do so with the idea that two possible implementations (masks and bitpatterns) were desirable, each with strengths and weaknesses, and that so as to get *something* done in the very short time he had left, he would start with the mask implementation. We now have the result, incomplete, but not breaking anything. Additional development (coding as well as designing) will be needed.
The main question raised by Matthew and Nathaniel is, I think, whether Mark's code should develop in a direction away from the R-compatibility model, with the idea that the latter would be handled via a bit-pattern implementation, some day, when someone codes it; or whether it should remain as the prototype and first implementation of an API to handle the R-compatible use case, minimizing any divergence from any eventual bit-pattern implementation.
The answer to this depends on several questions, including:
1) Who is available to do how much implementation of any of the possibilities? My reading of Travis's blog and rare posts to this list suggests that he hopes and expects to be able to free up coding time. Perhaps he will clarify that soon.
2) What sorts of changes would actually be needed to make the present implementation good enough for the R use case? Evolutionary, or revolutionary?
3) What sorts of changes would help with the numpy.ma use case? Evolutionary, or revolutionary?
4) Given available resources, how can we maximize progress: making numpy more capable, easier to use, etc.
Unless the answers to questions 2 *and* 3 are "revolutionary", I don't see the point in pulling Mark's changes out of master. At most, the documentation might be changed to mark the NA API as "experimental" for a release or two.
I appreciate Nathaniel's idea to pull the changes and I can respect his desire to do that. It seemed like there was a lot more heat than light in the discussion this summer. The differences seemed to be enflamed by the discussion instead of illuminated by it. Perhaps, that is why Nathaniel felt like merging Mark's pull request was too strong-armed and not a proper resolution.
However, I did not interpret Matthew or Nathaniel's explanations of their position as manipulative or inappropriate. Nonetheless, I don't think removing Mark's changes is a productive direction to take at this point. I agree, it would have been much better to reach a rough consensus before the code was committed. At least, those who felt like their ideas were not accounted for should have felt like there was some plan to either accommodate them, or some explanation of why that was not a good idea. The only thing I recall being said was that there was nobody to implement their ideas. I wish that weren't the case. I think we can still continue to discuss their concerns and look for ways to reasonably incorporate their use-cases if possible.
I have probably contributed in the past to the idea that "he who writes the code gets the final say". In early-stage efforts, this is approximately right, but success of anything relies on satisfied users and as projects mature the voice of users becomes more relevant than the voice of contributors in my mind. I've certainly had to learn that in terms of ABI changes to NumPy.
Personally, I am very, very interested in users of NumPy and their ideas about how things should be done. I have my own use cases from my experience, but I've always found that the code is better if it incorporates use-cases of others. In the end, I'm much more interested in users of NumPy and their use-cases and experience than even contributors. Historically, contributors to NumPy have been scarce and development slow. I am working to change that right now. I will say more when I have more to say in that direction.
To be clear, in this particular case I know that there are multiple users, and the best I can tell there is some disagreement between those users about the appropriate APIs. But, this disagreement is actually lost in some of the discussion. In fact, it seems to me that the different perspectives are not all that different and there ought to be a way to work it out. Perhaps this is hopeless naivete, but it's my current perspective.
I really appreciate the efforts of people who have been active on NumPy development and maintenance for the past 4 years. I also appreciate the activity of all the users of NumPy: matplotlib, Pandas, scikits, SciPy, statsmodels, and so on. The larger NumPy community is much broader than the discussions that take place on this list (or even on the SciPy list). I have seen NumPy in use in a lot of places over the past 4 years. I have also seen NumPy *not* in use where it really could be (with some adaptations).
I'm still hopeful that we will continue to make this forum a place where even "just users" of NumPy always feel able to raise their voice and say, "Hey, I wish things were done this way." It is rare when all voices can be satisfied, of course, but a priori it is worth a college try. If anything I hope for emerges, the user-base of NumPy will be growing significantly over the coming months and years and I really hope this list continues to be a place where I can be comfortable sending them.
More to come,
-Travis
Hi,
On Sat, Oct 29, 2011 at 11:19 PM, Travis Oliphant <oliphant@enthought.com> wrote:
Thanks again for your email, I'm sure I'm not the only one who breathes a deep sigh of relief when I see your posts.
I appreciate Nathaniel's idea to pull the changes and I can respect his desire to do that. It seemed like there was a lot more heat than light in the discussion this summer. The differences seemed to be enflamed by the discussion instead of illuminated by it. Perhaps, that is why Nathaniel felt like merging Mark's pull request was too strong-armed and not a proper resolution.
However, I did not interpret Matthew or Nathaniel's explanations of their position as manipulative or inappropriate. Nonetheless, I don't think removing Mark's changes is a productive direction to take at this point. I agree, it would have been much better to reach a rough consensus before the code was committed. At least, those who felt like their ideas were not accounted for should have felt like there was some plan to either accommodate them, or some explanation of why that was not a good idea. The only thing I recall being said was that there was nobody to implement their ideas. I wish that weren't the case. I think we can still continue to discuss their concerns and look for ways to reasonably incorporate their use-cases if possible.
I have probably contributed in the past to the idea that "he who writes the code gets the final say". In early-stage efforts, this is approximately right, but success of anything relies on satisfied users and as projects mature the voice of users becomes more relevant than the voice of contributors in my mind. I've certainly had to learn that in terms of ABI changes to NumPy.
I think that's right though - that the person who wrote the code has the final say. But that's the final say. The question I wanted to ask was the one Nathaniel brought up at the beginning of the thread, which is, before the final say, how hard do we try for consensus? Is that - the numpy way?
Here Chuck was saying 'I listen to you in proportion to your code contribution' (I hope I'm not misrepresenting him). I think that's a different way of working than the consensus building that Karl Fogel describes. But maybe that is just the numpy way. I would feel happier to know what that way is.
Then, when we get into this kind of dispute Chuck can say 'Matthew, change the numpy constitution or accept the situation because that's how we've agreed to work'. And I'll say - 'OK - I don't like it, but I agree those are the rules'. And we'll get on with it. But at the moment, it feels as if it isn't clear, and, as Ben pointed out, that means we are having a discussion and a discussion about the discussion at the same time.
See you,
Matthew
On Sun, Oct 30, 2011 at 12:47 AM, Eric Firing <efiring@hawaii.edu> wrote:
On 10/29/2011 12:02 PM, Olivier Delalleau wrote:
I haven't been following the discussion closely, but wouldn't it be instead: a.mask[0:2] = True?
That would be consistent with numpy.ma and the opposite of Mark's implementation.
I can live with either, but I much prefer the numpy.ma version because it fits with the use of bit-flags for editing data; set bit 1 if it fails check A, set bit 2 if it fails check B, etc. So, if it evaluates as True, there is a problem, and the value is masked *out*.
I think in Mark's implementation it works the same:
>>> a = np.arange(3, maskna=True)
>>> a[1] = np.NA
>>> a
array([0, NA, 2])
>>> np.isna(a)
array([False, True, False], dtype=bool)
This is more consistent than using False to represent an NA mask, I agree.
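The "True means masked" convention discussed above can be tried with numpy.ma, which does not depend on the experimental np.NA code. The following is only a sketch under assumptions: the flag names CHECK_A and CHECK_B, the thresholds, and the sample data are made up for illustration and do not come from this thread.

```python
import numpy as np

# Hypothetical quality-control flags, one bit per failed check.
CHECK_A = 0b01  # value failed check A (here: implausibly large)
CHECK_B = 0b10  # value failed check B (here: negative sentinel)

data = np.array([1.0, 250.0, 3.0, -999.0])
flags = np.zeros(data.shape, dtype=np.uint8)

# Set the matching bit wherever a check fails.
flags[data > 100.0] |= CHECK_A
flags[data < 0.0] |= CHECK_B

# Any nonzero flag evaluates as True, so the value is masked *out*.
masked = np.ma.masked_array(data, mask=flags != 0)

# Reductions skip the masked entries.
mean_of_valid = masked.mean()
```

With this convention the flag array can be used as a mask directly, without inverting it first, which seems to be the fit with bit-flag editing that Eric describes.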
On 10/29/11 5:02 PM, Olivier Delalleau wrote:
I haven't been following the discussion closely, but wouldn't it be instead: a.mask[0:2] = True?
It's something that I actually find a bit difficult to get right in the current numpy.ma implementation: I would find it more intuitive to have True for "valid" data, and False for invalid / missing / ... I realize how the implementation makes sense (and is appropriate given that the name is "mask"), but I just thought I'd point this out... even if it's just me ;)
Just a thought: what if this also worked:
a.mask[0:2]=np.NA
as a synonym for a.mask[0:2]=True?
Would that be less confusing, and/or would it be less powerful or extensible in important ways?
Thanks,
Jason Grout
On Saturday, October 29, 2011, Jason Grout <jason-sage@creativetrax.com> wrote:
On 10/29/11 5:02 PM, Olivier Delalleau wrote:
I haven't been following the discussion closely, but wouldn't it be instead: a.mask[0:2] = True?
It's something that I actually find a bit difficult to get right in the current numpy.ma implementation: I would find it more intuitive to have True for "valid" data, and False for invalid / missing / ... I realize how the implementation makes sense (and is appropriate given that the name is "mask"), but I just thought I'd point this out... even if it's just me ;)
Just a thought: what if this also worked:
a.mask[0:2]=np.NA
as a synonym for a.mask[0:2]=True?
Would that be less confusing, and/or would it be less powerful or extensible in important ways?
Thanks,
Jason Grout
Don't know. It is a different way of looking at it. I am also still wary of adding attributes to the array. Ben Root
On 10/29/11 2:48 PM, Ralf Gommers wrote:
That's true, but I am hoping that the difference between - say:
a[0:2] = np.NA
and
a.mask[0:2] = False
would be easy enough to imagine.
It is in this case. I agree the explicit ``a.mask`` is clearer.
Interesting -- I suspect I mirror Pandas' users here: a[0:2] = np.NA is simpler and easier to me -- I'm avoiding the word "clearer" because I'm not sure what it means -- if we think it's important for the user to understand that the NA value is implemented with a mask, then setting the mask explicitly is certainly clearer -- but I don't think that's important.
Indeed, I still like the idea that for "casual" use, NA could be a special value, and could be a mask, and that the user does not need to know the difference.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
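For comparison, numpy.ma already offers both spellings discussed here: assigning a special value, and editing the mask explicitly. This sketch uses numpy.ma rather than the experimental np.NA API, so the mask convention is numpy.ma's (True means masked), the opposite of the a.mask[0:2] = False spelling quoted above. The array contents are made up for illustration.

```python
import numpy as np

# Two identical masked arrays, each with a full (all-False) mask.
a = np.ma.masked_array(np.arange(4), mask=[False, False, False, False])
b = np.ma.masked_array(np.arange(4), mask=[False, False, False, False])

# Spelling 1: assign the special "masked" value to the elements.
a[0:2] = np.ma.masked

# Spelling 2: edit the mask explicitly (True means masked in numpy.ma).
b.mask[0:2] = True

# Both spellings end up with the same mask and the same reductions.
same_mask = np.array_equal(a.mask, b.mask)
total_a = a.sum()  # only the unmasked values 2 and 3 contribute
total_b = b.sum()
```

The first spelling hides the mask from the casual user, while the second exposes it; the underlying data values are left untouched in both cases.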
On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Matthew, the problem I have is that it seems that you and Nathaniel won't be satisfied unless things are done *your* way.
Hi Charles,
I'm sorry if I've given this impression, and I know it's easy to feel this way in a contentious discussion. I've even been tempted to conclude the same about you, based on some of those emails in the last discussion where you told us that we should only give feedback by critiquing specific paragraphs of Mark's docs, even though our issues were with the whole architecture he was suggesting. But I'd like to believe that that isn't true of you, and in return, I can only point out the following things:
1) I've actually made a number of different suggestions and attempts to find ways to compromise (e.g., the "NA concepts" discussion, the alter-NEP that folded in a design for "ignored" values to try and satisfy that constituency even though I wouldn't use them myself, and on the conference call trying to find a subset of features that we could all agree on to implement first). I don't *want* my proposals implemented unless everyone else finds them persuasive.
2) This is why in my message I'm *not* advocating that we implement NAs according to my proposals; I'm advocating that you get just as much of a veto power on my proposals as I do on yours.
Let's be honest: we both know all else being equal, we'd both rather not deal with the other right now, and might prefer not to hang out socially. But if we want this NA stuff to actually work and be used, then we need to find a way to work together despite that. Peace?
-- Nathaniel
On Friday, October 28, 2011, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliphant@enthought.com> wrote:
I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish.
My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story.
Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's "formulas" (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... I could write all the code I wanted but no-one would use it unless I could convince them it's actually the right solution, so I had to engage with him, and dig deep into his arguments.
What I discovered was that (as I thought) R-style formulas *do* have a solid theoretical basis -- but (as he thought) all the existing implementations *are* broken and inconsistent! I'm still not sure I can actually convince Jonathan to go my way, but, because of his stubbornness, I had to invent a better way of handling these formulas, and so my library[1] is actually the first implementation of these things that has a rigorous theory behind it, and in the process it avoids two fundamental, decades-old bugs in R. (And I'm not sure the R folks can fix either of them at this point without breaking a ton of code, since they both have API consequences.)
--
It's extremely common for healthy FOSS projects to insist on consensus for almost all decisions, where consensus means something like "every interested party has a veto"[2]. This seems counterintuitive, because if everyone's vetoing all the time, how does anything get done? The trick is that if anyone *can* veto, then vetoes turn out to actually be very rare. Everyone knows that they can't just ignore alternative points of view -- they have to engage with them if they want to get anything done. So you get buy-in on features early, and no vetoes are necessary. And by forcing people to engage with each other, like me with Jonathan, you get better designs.
But what about the cost of all that code that doesn't get merged, or written, because everyone's spending all this time debating instead? Better design
Sorry - this was too short and a little rude. I'm sorry.
I was reacting to what I perceived as intolerance for discussing the issues, and I may be wrong in that perception.
I think what Nathaniel is saying, is that it is not in the best interests of numpy to push through code where there is not good agreement. In reverting the change, he is, I think, appealing for a commitment to that process, for the good of numpy.
I have in the past taken some of your remarks to imply that if someone is prepared to write code then that overrides most potential disagreement.
The reason I think Nathaniel is the more right, is because most of us, I believe, do honestly have the interests of numpy at heart, and, want to fully understand the problem, and are prepared to be proven wrong. In that situation, in my experience of writing code at least, by far the most fruitful way to proceed is by letting all voices be heard. On the other hand, if the rule becomes 'unless I see an implementation I'm not listening to you' - then we lose the great benefits, to the code, of having what is fundamentally a good and strong community.
Best,
Matthew
Maybe an alternative implementation isn't really needed. It seemed to me that most of the current implementation isn't too far off the mark. There are just key portions that are missing or might need to be modified.
The space issue was never ignored and Mark left room for that to be addressed. Parameterized dtypes can still be added (and aren't all that different from multi-na). Perhaps I could be convinced of having np.MA assignments mean "ignore" and np.NA mean "absent". How far off are we really from consensus?
Although, I still think that ignore + absent = ignore
Cheers!
Ben Root
On Fri, Oct 28, 2011 at 3:21 PM, Benjamin Root <ben.root@ou.edu> wrote:
The space issue was never ignored and Mark left room for that to be addressed. Parameterized dtypes can still be added (and aren't all that different from multi-na). Perhaps I could be convinced of having np.MA assignments mean "ignore" and np.NA mean "absent". How far off are we really from consensus?
Do you know whether Mark is around? I think his feedback would be useful at this point; having written the code, he'll be able to evaluate some of the technical suggestions made. Stéfan
2011/10/28 Stéfan van der Walt <stefan@sun.ac.za>
On Fri, Oct 28, 2011 at 3:21 PM, Benjamin Root <ben.root@ou.edu> wrote:
The space issue was never ignored and Mark left room for that to be addressed. Parameterized dtypes can still be added (and aren't all that different from multi-na). Perhaps I could be convinced of having np.MA assignments mean "ignore" and np.NA mean "absent". How far off are we really from consensus?
Do you know whether Mark is around? I think his feedback would be useful at this point; having written the code, he'll be able to evaluate some of the technical suggestions made.
Yes, Mark is around, but I assume he is interested in his school work at this point. And he might not be inclined to get back into this particular discussion. I don't feel he was treated very well by some last time around. Chuck
On Fri, Oct 28, 2011 at 3:49 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
2011/10/28 Stéfan van der Walt <stefan@sun.ac.za>
On Fri, Oct 28, 2011 at 3:21 PM, Benjamin Root <ben.root@ou.edu> wrote:
The space issue was never ignored and Mark left room for that to be addressed. Parameterized dtypes can still be added (and aren't all that different from multi-na). Perhaps I could be convinced of having np.MA assignments mean "ignore" and np.NA mean "absent". How far off are we really from consensus?
Do you know whether Mark is around? I think his feedback would be useful at this point; having written the code, he'll be able to evaluate some of the technical suggestions made.
Yes, Mark is around, but I assume he is interested in his school work at this point. And he might not be inclined to get back into this particular discussion. I don't feel he was treated very well by some last time around.
We have not always been good at separating the concept of disagreement from that of rudeness.
As I've said before, one form of rudeness (and not disagreement) is ignoring people.
We should all be careful to point out - respectfully, and with reasons - when we find our colleagues' replies (or non-replies) to be rude, because rudeness is very bad for the spirit of open discussion.
Best,
Matthew
On Fri, Oct 28, 2011 at 5:09 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:49 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
2011/10/28 Stéfan van der Walt <stefan@sun.ac.za>
On Fri, Oct 28, 2011 at 3:21 PM, Benjamin Root <ben.root@ou.edu> wrote:
The space issue was never ignored and Mark left room for that to be addressed. Parameterized dtypes can still be added (and aren't all that different from multi-na). Perhaps I could be convinced of having np.MA assignments mean "ignore" and np.NA mean "absent". How far off are we really from consensus?
Do you know whether Mark is around? I think his feedback would be useful at this point; having written the code, he'll be able to evaluate some of the technical suggestions made.
Yes, Mark is around, but I assume he is interested in his school work at this point. And he might not be inclined to get back into this particular discussion. I don't feel he was treated very well by some last time around.
We have not always been good at separating the concept of disagreement from that of rudeness.
As I've said before, one form of rudeness (and not disagreement) is ignoring people.
We should all be careful to point out - respectfully, and with reasons - when we find our colleagues' replies (or non-replies) to be rude, because rudeness is very bad for the spirit of open discussion.
Trying things out in preparation for discussion is also a mark of respect. Have you worked with the current implementation? Chuck
Hi, On Fri, Oct 28, 2011 at 4:21 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 28, 2011 at 5:09 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
On Fri, Oct 28, 2011 at 3:49 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
2011/10/28 Stéfan van der Walt <stefan@sun.ac.za>
On Fri, Oct 28, 2011 at 3:21 PM, Benjamin Root <ben.root@ou.edu> wrote:
The space issue was never ignored and Mark left room for that to be addressed. Parameterized dtypes can still be added (and aren't all that different from multi-na). Perhaps I could be convinced of having np.MA assignments mean "ignore" and np.NA mean "absent". How far off are we really from consensus?
Do you know whether Mark is around? I think his feedback would be useful at this point; having written the code, he'll be able to evaluate some of the technical suggestions made.
Yes, Mark is around, but I assume he is interested in his school work at this point. And he might not be inclined to get back into this particular discussion. I don't feel he was treated very well by some last time around.
We have not always been good at separating the concept of disagreement from that of rudeness.
As I've said before, one form of rudeness (and not disagreement) is ignoring people.
We should all be careful to point out - respectfully, and with reasons - when we find our colleagues' replies (or non-replies) to be rude, because rudeness is very bad for the spirit of open discussion.
Trying things out in preparation for discussion is also a mark of respect. Have you worked with the current implementation?
OK - this seems to me to be rude. Why? Because you have presumably already read what my concerns were, and my discussion of the current implementation in my reply to Travis. You haven't made any effort to point out to me where I may be wrong or failing to understand. I infer that you are merely saying 'go away and come back later'. And that is rude. Best, Matthew
participants (13)
- Benjamin Root
- Charles R Harris
- Chris Barker
- Eric Firing
- Han Genuit
- Jason Grout
- Matthew Brett
- Nathaniel Smith
- Olivier Delalleau
- Ralf Gommers
- Stéfan van der Walt
- Travis Oliphant
- Wes McKinney