On Jun 14, 2013, at 20:23, Eric Firing <efiring@hawaii.edu> wrote:
On 2013/06/14 7:22 AM, Nathaniel Smith wrote:
On Wed, Jun 12, 2013 at 7:43 PM, Eric Firing <efiring@hawaii.edu> wrote:
On 2013/06/12 2:10 AM, Nathaniel Smith wrote:
Personally I think that overloading np.empty is horribly ugly, will continue confusing newbies and everyone else indefinitely, and I'm 100% convinced that we'll regret implementing such a warty interface for something that should be so idiomatic. (Unfortunately I got busy and didn't actually say this in the previous thread though.) So I think we should just merge the PR as is. The only downside is the np.ma inconsistency, but, np.ma is already inconsistent (cf. masked_array.fill versus masked_array.filled!), somewhat deprecated,
"somewhat deprecated"? Really? Since when? By whom? Replaced by what?
Sorry, not trying to start a fight, just trying to summarize the situation. As far as I can tell:
Despite heroic efforts on the part of its authors, numpy.ma has a number of weird quirks (masked data can still trigger invalid value errors), misfeatures (hard versus soft masks), and just plain old pain points (ongoing issues with whether any given operation will respect or preserve the mask).
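For readers unfamiliar with the "hard versus soft masks" distinction mentioned above, here is a minimal sketch using the public `np.ma.array` constructor and its `hard_mask` argument. The variable names are illustrative only, and behaviour may vary between NumPy versions.

```python
import numpy as np

# Soft mask (the default): assigning to a masked element unmasks it.
soft = np.ma.array([1, 2, 3], mask=[False, True, False])
soft[1] = 20

# Hard mask: assignment cannot unmask, so the element stays masked.
hard = np.ma.array([1, 2, 3], mask=[False, True, False], hard_mask=True)
hard[1] = 20

print(np.ma.getmaskarray(soft))  # element 1 is no longer masked
print(np.ma.getmaskarray(hard))  # element 1 is still masked
```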
The "invalid value errors" are a side-effect of some design decisions taken 6-7 years ago. It turned out to be more efficient in terms of speed to follow an approach "compute without the mask, put it back afterwards" than the original "mask before, fill the holes with some value, compute, put the mask back": some functions like `pow` that were not part of the very first implementations twisted my arm on this one. It's far from perfect, it's rather disappointing, but I don't see a workaround with the current "let's do it in python" approach. Any other implementation would have to be done directly in C (or maybe in Cython, it's been 5 years since I last touched it).
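The two strategies described above can be sketched roughly as follows. This is an illustration in plain NumPy, not the actual np.ma internals; the function names `fill_then_compute` and `compute_then_mask` and the fill value are assumptions made for the example.

```python
import numpy as np

def fill_then_compute(data, mask, fill=1.0):
    """Original strategy: plug the holes with a safe value, compute, re-mask."""
    safe = np.where(mask, fill, data)
    return np.ma.array(np.sqrt(safe), mask=mask)

def compute_then_mask(data, mask):
    """Faster strategy: compute on the raw data, put the mask back afterwards."""
    raw = np.sqrt(data)  # masked entries are computed too
    return np.ma.array(raw, mask=mask)

data = np.array([4.0, -1.0, 9.0])
mask = np.array([False, True, False])  # the -1.0 is masked

filled_first = fill_then_compute(data, mask)

# With invalid-value errors turned into exceptions, the second strategy
# trips over a value that the mask was supposed to hide.
try:
    with np.errstate(invalid="raise"):
        compute_then_mask(data, mask)
    masked_value_raised = False
except FloatingPointError:
    masked_value_raised = True

print(filled_first, masked_value_raised)
```

The first variant pays for an extra pass over the data (the `np.where` call), which is the speed cost mentioned above; the second avoids it at the price of spurious invalid-value errors.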
It's been in deep maintenance mode for some time; we merge the occasional bug fix that people send in, and that's it. (To be fair, numpy as a whole is fairly slow-moving, but numpy.ma still gets much less attention.)
It never had a lot...
Therefore, my impression is that a majority (not all, but a majority) of numpy developers strongly recommend against the use of numpy.ma in new projects.
And where do you take that from? OK, to be frank, *I* would advise against a very naive use of np.ma: there are plenty of tricks to know to be really efficient with masked arrays. Most of the functions of the module are just for convenience in interactive mode…
I think we can agree that there is major interest in having good numpy support for one or more styles of missing/masked values. You might not agree, but I will assert that the style of support provided by np.ma is *very* useful; it serves a real purpose in working code. We do agree that np.ma has problems. It is not at all clear to me, however, that those problems cannot or should not be fixed. Even if they can't, I don't think they are so severe that it is wise to try to kill off np.ma *before* there is a good replacement.
Quite agreed with that.
In the NA branch, an attempt was made to lay the groundwork for solid missing/masked support. I did not agree with every design aspect,
Speaking of which, was a consensus (or at least a majority) reached about NA w/vs missing data?
but I thought it was nevertheless good as groundwork, and could be used to greatly improve np.ma, to provide a different style of support for those who require it, and perhaps to lead over the very long term to a withering away of the need for np.ma.
When I started rewriting np.ma, Paul Dubois wrote me that 'if he were to do it again, it'd be in C, and that he disagreed with my approach' (I'm paraphrasing, but the gist is there). Of course, like every kid, I thought I knew better. In retrospect, he was quite right. I'm no longer convinced that MaskedArray as a subclass of ndarray is a correct approach. It works, it worked well enough for my needs at the time, it was a very educational journey, but if I were to do it again...
Is there any way to revive this line of development? To satisfy the needs of people coming from the R world *and* of people for whom np.ma is, despite its warts, an important tool? This seems to me to be the single biggest area where numpy needs development.
I'm always surprised by the antagonism some people have towards np.ma… You can't always use NaN to represent the missing information you're doomed to meet in the real world.
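One concrete case where NaN cannot stand in for missing data is an integer array, since NaN exists only for floating-point types. A minimal sketch (the variable names are made up for the example):

```python
import numpy as np

counts = np.array([3, 7, 5])  # integer dtype

# NaN is a floating-point value and cannot be stored in an integer array.
try:
    counts[1] = np.nan
    nan_fits = True
except ValueError:
    nan_fits = False

# A mask marks the entry as missing without changing the dtype.
masked_counts = np.ma.array(counts, mask=[False, True, False])
mean_of_valid = masked_counts.mean()  # average of the unmasked 3 and 5

print(nan_fits, masked_counts.dtype, mean_of_valid)
```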
It looks like this problem needs dedicated resources: a grant, a major corporate effort, or both.
<plug class="shameless">Fund me, I'm yours</plug>
Numpy is central to python in science, but it doesn't seem to have a corresponding level of direction and support.
Ecco. More seriously, I'd be delighted to help. I can no longer work on it full time as I used to (even if I was not supposed to), but I can often explain why things were done the way they are and how we could improve them.