[Numpy-discussion] Splitting MaskedArray into a separate package

Stefan van der Walt stefanv at berkeley.edu
Wed May 23 19:38:55 EDT 2018


Hi Eric,

On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote:
> Masked arrays are critical to my numpy usage, and I suspect they are
> critical for many other use cases as well.

That's good to know; and the goal of this NEP should be to improve your
situation, not make it worse.

> In fact, I would prefer that a high priority for major numpy
> development be the more complete integration of masked array capabilities
> into numpy, not their removal to a separate package.
>
> I was unhappy to see
> the effort in that direction a few years ago being killed.  I didn't agree
> with every design decision, but overall I thought it was going in the right
> direction.

I see this and the NEP as orthogonal issues.  MaskedArray, one
particular implementation of the masked-value solution, has never truly
been a first-class citizen.

If we could instead implement masked arrays such that they simply sit on
top of existing NumPy functionality (using, e.g., special dtypes or
bitmasks), re-using all the standard machinery, that would be a natural
fit in the core of NumPy, and would negate the need for MaskedArrays.
But we haven't reached that point yet, and I am not aware of any current
proposal to do so.

> Bad or missing values (and situations where one wants to use a mask to
> operate on a subset of an array) are found in many domains of real life; do
> you really want python users in those domains to have to fall back on
> Matlab-style reliance on nans and/or manual mask manipulations, as the new
> maskedarray package is sidelined?

This is not too far from the current status quo, I would argue.  The
functionality exists, but it is "bolted on" rather than "built in".  And
my guess is that the component will benefit from some extra attention
that it is not getting as part of the current package.
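
For illustration, here is a minimal sketch of the three approaches
mentioned above: NaNs, manual mask manipulation, and numpy.ma.  The
sentinel value -999.0 and all variable names are made up for this
example; it is not meant as a statement about how any of this should be
designed.

```python
import numpy as np

# Hypothetical data in which -999.0 marks a bad/missing measurement.
data = np.array([1.0, 2.0, -999.0, 4.0])

# 1. NaN-based: replace bad values with nan, use nan-aware reductions.
with_nan = np.where(data == -999.0, np.nan, data)
nan_mean = np.nanmean(with_nan)

# 2. Manual mask: carry a boolean "valid" array alongside the data.
valid = data != -999.0
manual_mean = data[valid].mean()

# 3. numpy.ma: the mask travels with the array and reductions honor it.
masked = np.ma.masked_values(data, -999.0)
ma_mean = masked.mean()

print(nan_mean, manual_mean, ma_mean)
```

All three give the same mean here; the difference is in who has to keep
track of the mask, the user or the array object.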

> Or is there any realistic prospect for maintenance and improvement of the
> package after it is separated out?

In order to prevent the package from being "sidelined", we would have to
strengthen this part of the story.

> Side question: does your proposed purification of numpy include elimination
> of linalg and random?  Based on the criteria in the NEP, I would expect it
> does; so maybe you should have a more ambitious NEP, and do the purification
> all in one step as a numpy version 2.0.  (Surely if masked arrays are
> purged, the matrix class should be booted out at the same time.)

That's an interesting question, and one I have wondered about.  Would it
make sense to ship just the core ndarray object?  I don't know.  It
probably depends a lot on whether we can define clear API boundaries,
whether this kind of split is desired from the average user's
perspective, and whether it could benefit the development of the
subcomponents.

W.r.t. matrices, I think you're setting a trap for me here, but I'm
going to step into it anyway ;)

https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html

It is, then, not the first time I have argued in favor of moving certain
components out of NumPy into their own packages.  I would probably have
written that NEP this time around, had it not been for the many strings
attached via SciPy sparse (and therefore sklearn etc.).  Before matrix
deprecation can be discussed further, therefore, we need to implement
sparse *arrays* for SciPy (and some efforts are slowly underway).

See also:

https://mail.python.org/pipermail/numpy-discussion/2017-January/076290.html
http://numpy-discussion.10968.n7.nabble.com/Deprecate-matrices-in-1-15-and-remove-in-1-17-tp44968.html

Best regards,
Stéfan