On 2013/06/14 7:22 AM, Nathaniel Smith wrote:
On Wed, Jun 12, 2013 at 7:43 PM, Eric Firing <efiring@hawaii.edu> wrote:
On 2013/06/12 2:10 AM, Nathaniel Smith wrote:
Personally I think that overloading np.empty is horribly ugly, will continue confusing newbies and everyone else indefinitely, and I'm 100% convinced that we'll regret implementing such a warty interface for something that should be so idiomatic. (Unfortunately I got busy and didn't actually say this in the previous thread though.) So I think we should just merge the PR as is. The only downside is the np.ma inconsistency, but, np.ma is already inconsistent (cf. masked_array.fill versus masked_array.filled!), somewhat deprecated,
"somewhat deprecated"? Really? Since when? By whom? Replaced by what?
Sorry, not trying to start a fight, just trying to summarize the situation. As far as I can tell:
Despite heroic efforts on the part of its authors, numpy.ma has a number of weird quirks (masked data can still trigger invalid value errors), misfeatures (hard versus soft masks), and just plain old pain points (ongoing issues with whether any given operation will respect or preserve the mask).
It's been in deep maintenance mode for some time; we merge the occasional bug fix that people send in, and that's it. (To be fair, numpy as a whole is fairly slow-moving, but numpy.ma still gets much less attention.)
Even if there were active maintainers, no-one really has any idea how to fix any of the problems above; they're not so much bugs as intrinsic limitations of the design.
Therefore, my impression is that a majority (not all, but a majority) of numpy developers strongly recommend against the use of numpy.ma in new projects.
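To make two of the quirks above concrete (hard versus soft masks, and an operation that silently drops the mask), here is a minimal sketch assuming a reasonably recent NumPy. The variable names are illustrative only and do not come from the thread.

```python
import numpy as np

# Soft mask (the default): assigning to a masked element unmasks it.
soft = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
soft[1] = 20.0
print(soft.mask.tolist())  # [False, False, False]

# Hard mask: assignments to masked elements are silently ignored,
# so both the data and the mask at index 1 stay as they were.
hard = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False], hard_mask=True)
hard[1] = 20.0
print(hard.mask.tolist())  # [False, True, False]

# Whether the mask survives depends on the call: np.asarray returns the
# underlying data as a plain ndarray, masked entries included, mask dropped.
raw = np.asarray(hard)
print(type(raw).__name__, raw.tolist())  # ndarray [1.0, 2.0, 3.0]

# filled() returns a plain ndarray copy with masked entries replaced.
print(hard.filled(-1.0).tolist())  # [1.0, -1.0, 3.0]
```

Neither behavior is a bug in the narrow sense: hard masks are opt-in (hard_mask=True or harden_mask()), and np.asarray always returns a base-class ndarray. The friction is that a user has to know which rule applies to each call.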
I could be wrong! And I know there's nothing to really replace it. I'd like to fix that. But I think "semi-deprecated" is not an unfair shorthand for the above.
(I'll even admit that I'd *like* to actually deprecate it. But what I mean by that is, I don't think it's possible to fix it to the point where it's actually a solid/clean/robust library, so I'd like to reach a point where everyone who's currently using it is happier switching to something else and is happy to sign off on deprecating it.)
Nathaniel,

I've been pondering when to bring this up again, but you did it for me, so here it is with a new title for the thread. Maybe it will be short and sweet, maybe not.

I think we can agree that there is major interest in having good numpy support for one or more styles of missing/masked values. You might not agree, but I will assert that the style of support provided by np.ma is *very* useful; it serves a real purpose in working code.

We do agree that np.ma has problems. It is not at all clear to me, however, that those problems cannot or should not be fixed. Even if they can't, I don't think they are so severe that it is wise to try to kill off np.ma *before* there is a good replacement.

In the NA branch, an attempt was made to lay the groundwork for solid missing/masked support. I did not agree with every design aspect, but I thought it was nevertheless good as groundwork, and could be used to greatly improve np.ma, to provide a different style of support for those who require it, and perhaps to lead over the very long term to a withering away of the need for np.ma. Some of the groundwork from the NA branch survived, but most of it is sitting off to the side. Is there any way to revive this line of development? To satisfy the needs of people coming from the R world *and* of people for whom np.ma is, despite its warts, an important tool?

This seems to me to be the single biggest area where numpy needs development. It looks like this problem needs dedicated resources: a grant, a major corporate effort, or both. Numpy is central to python in science, but it doesn't seem to have a corresponding level of direction and support.

Eric