[Numpy-discussion] NA, and replacement or reimplementation of np.ma

Eric Firing efiring at hawaii.edu
Fri Jun 14 14:23:10 EDT 2013

On 2013/06/14 7:22 AM, Nathaniel Smith wrote:
> On Wed, Jun 12, 2013 at 7:43 PM, Eric Firing <efiring at hawaii.edu> wrote:
>> On 2013/06/12 2:10 AM, Nathaniel Smith wrote:
>>> Personally I think that overloading np.empty is horribly ugly, will
>>> continue confusing newbies and everyone else indefinitely, and I'm
>>> 100% convinced that we'll regret implementing such a warty interface
>>> for something that should be so idiomatic. (Unfortunately I got busy
>>> and didn't actually say this in the previous thread though.) So I
>>> think we should just merge the PR as is. The only downside is the
>>> np.ma inconsistency, but, np.ma is already inconsistent (cf.
>>> masked_array.fill versus masked_array.filled!), somewhat deprecated,
>> "somewhat deprecated"?  Really?  Since when?  By whom?  Replaced by what?
> Sorry, not trying to start a fight, just trying to summarize the
> situation. As far as I can tell:
> Despite heroic efforts on the part of its authors, numpy.ma has a
> number of weird quirks (masked data can still trigger invalid value
> errors), misfeatures (hard versus soft masks), and just plain old pain
> points (ongoing issues with whether any given operation will respect
> or preserve the mask).
> It's been in deep maintenance mode for some time; we merge the
> occasional bug fix that people send in, and that's it. (To be fair,
> numpy as a whole is fairly slow-moving, but numpy.ma still gets much
> less attention.)
> Even if there were active maintainers, no-one really has any idea how
> to fix any of the problems above; they're not so much bugs as
> intrinsic limitations of the design.
> Therefore, my impression is that a majority (not all, but a majority)
> of numpy developers strongly recommend against the use of numpy.ma in
> new projects.
> I could be wrong! And I know there's nothing to really replace it. I'd
> like to fix that. But I think "semi-deprecated" is not an unfair
> shorthand for the above.
> (I'll even admit that I'd *like* to actually deprecate it. But what I
> mean by that is, I don't think it's possible to fix it to the point
> where it's actually a solid/clean/robust library, so I'd like to reach
> a point where everyone who's currently using it is happier switching
> to something else and is happy to sign off on deprecating it.)
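Here is a minimal sketch, assuming a recent NumPy release, of two of the quirks mentioned above. The first is the fill/filled naming clash: filled() returns a new plain ndarray with masked slots replaced, while fill() (inherited from ndarray) overwrites every data element in place and leaves the mask alone. The second is the hard versus soft mask distinction, where the same assignment either unmasks a value or is silently ignored. Exact behavior may differ between NumPy versions.

```python
import numpy as np

# Three values, middle one masked (masks are soft by default).
a = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# filled(): returns a NEW plain ndarray; only the masked slot is replaced.
plain = a.filled(-1.0)

# fill(): inherited from ndarray; overwrites EVERY data element in place,
# including unmasked ones, and does not touch the mask.
b = a.copy()
b.fill(0.0)

# Soft mask: assigning to a masked slot stores the value and unmasks it.
soft = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
soft[1] = 10

# Hard mask: the same assignment is silently ignored; data and mask unchanged.
hard = np.ma.masked_array([1, 2, 3], mask=[False, True, False], hard_mask=True)
hard[1] = 10

print(plain, b.data, b.mask, soft, hard)
```

The hard-mask case is the kind of behavior that makes it hard to predict whether a given operation will respect the mask.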


I've been pondering when to bring this up again, but you did it for me, 
so here it is with a new title for the thread.  Maybe it will be short 
and sweet, maybe not.

I think we can agree that there is major interest in having good numpy 
support for one or more styles of missing/masked values.  You might not 
agree, but I will assert that the style of support provided by np.ma is 
*very* useful; it serves a real purpose in working code.  We do agree 
that np.ma has problems.  It is not at all clear to me, however, that 
those problems cannot or should not be fixed.  Even if they can't, I 
don't think they are so severe that it is wise to try to kill off np.ma 
*before* there is a good replacement.

In the NA branch, an attempt was made to lay the groundwork for solid 
missing/masked support.  I did not agree with every design aspect, but I 
thought it was nevertheless good as groundwork, and could be used to 
greatly improve np.ma, to provide a different style of support for those 
who require it, and perhaps, over the very long term, to make np.ma 
unnecessary.

Some of the groundwork from the NA branch survived, but most of it is 
sitting off to the side.

Is there any way to revive this line of development?  To satisfy the 
needs of people coming from the R world *and* of people for whom np.ma 
is, despite its warts, an important tool?  This seems to me to be the 
single biggest area where numpy needs development.

It looks like this problem needs dedicated resources: a grant, a major 
corporate effort, or both.

Numpy is central to python in science, but it doesn't seem to have a 
corresponding level of direction and support.

