[Numpy-discussion] Splitting MaskedArray into a separate package

Benjamin Root ben.v.root at gmail.com
Wed May 23 22:52:53 EDT 2018


users of a package does not equate to maintainers of a package. Scikits are
successful because scientists who specialize in a field can contribute
code and support the packages using their domain knowledge. How many people
here are specialists in masked/missing value computation?

Would I like to see better missing value support in numpy? Sure, but until
then, MaskedArrays are what we have, and they are still better than just using
NaNs all over the place.
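
A minimal sketch of that difference, assuming made-up readings in which -999
is a hypothetical missing-value sentinel (neither the data nor the sentinel
comes from this thread). It only uses numpy.ma and the nan-aware reductions as
they exist today:

```python
import numpy as np

# Hypothetical integer readings; -999 is an assumed "missing" sentinel.
raw = np.array([3, -999, 5, 7, -999], dtype=np.int64)

# NaN approach: integer arrays cannot hold NaN, so the data must be cast
# to float first, and every reduction needs its nan-aware variant.
as_float = raw.astype(float)
as_float[raw == -999] = np.nan
plain_mean = as_float.mean()       # plain reductions propagate NaN
nan_mean = np.nanmean(as_float)    # nan-aware reduction skips NaN

# Masked-array approach: the integer dtype is kept and the mask travels
# with the data through reductions and arithmetic.
masked = np.ma.masked_equal(raw, -999)
ma_mean = masked.mean()            # masked entries are skipped
doubled = masked * 2               # the mask propagates to the result
```

Here the masked version never leaves the integer dtype, and ordinary methods
such as mean() already respect the mask.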

Cheers!
Ben Root

On Wed, May 23, 2018 at 7:38 PM, Stefan van der Walt <stefanv at berkeley.edu>
wrote:

> Hi Eric,
>
> On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote:
> > Masked arrays are critical to my numpy usage, and I suspect they are
> > critical for many other use cases as well.
>
> That's good to know; and the goal of this NEP should be to improve your
> situation, not make it worse.
>
> > In fact, I would prefer that a high priority for major numpy
> > development be the more complete integration of masked array capabilities
> > into numpy, not their removal to a separate package.
> >
> > I was unhappy to see the effort in that direction a few years ago being
> > killed.  I didn't agree with every design decision, but overall I thought
> > it was going in the right direction.
>
> I see this and the NEP as orthogonal issues.  MaskedArrays, one
> particular version of the masked value solution, has never truly been a
> first class citizen.
>
> If we could instead implement masked arrays such that they simply sit on
> top of existing NumPy functionality (using, e.g., special dtypes or
> bitmasks), re-using all the standard machinery, that would be a natural
> fit in the core of NumPy, and would negate the need for MaskedArrays.
> But we haven't reached that point yet, and I am not aware of any current
> proposal to do so.
>
> > Bad or missing values (and situations where one wants to use a mask to
> > operate on a subset of an array) are found in many domains of real life; do
> > you really want python users in those domains to have to fall back on
> > Matlab-style reliance on nans and/or manual mask manipulations, as the new
> > maskedarray package is sidelined?
>
> This is not too far from the current status quo, I would argue.  The
> functionality exists, but it is "bolted on" rather than "built in".  And
> my guess is that the component will benefit from some extra attention
> that it is not getting as part of the current package.
>
> > Or is there any realistic prospect for maintenance and improvement of the
> > package after it is separated out?
>
> In order to prevent the package from being "sidelined", we would have to
> strengthen this part of the story.
>
> > Side question: does your proposed purification of numpy include elimination
> > of linalg and random?  Based on the criteria in the NEP, I would expect it
> > does; so maybe you should have a more ambitious NEP, and do the purification
> > all in one step as a numpy version 2.0.  (Surely if masked arrays are
> > purged, the matrix class should be booted out at the same time.)
>
> That's an interesting question, and one I have wondered about.  Would it
> make sense to ship just the core ndarray object?  I don't know.  It
> probably depends a lot on whether we can define clear API boundaries,
> whether this kind of split is desired from the average user's
> perspective, and whether it could benefit the development of the
> subcomponents.
>
> W.r.t. matrices, I think you're setting a trap for me here, but I'm
> going to step into it anyway ;)
>
> https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html
>
> It is, then, not the first time I have argued in favor of moving certain
> components out of NumPy into their own packages.  I would probably have
> written that NEP this time around, had it not been for the many strings
> attached via SciPy sparse (and therefore sklearn etc.).  Before matrix
> deprecation can be discussed further, therefore, we need to implement
> sparse *arrays* for SciPy (and some efforts are slowly underway).
>
> See also:
>
> https://mail.python.org/pipermail/numpy-discussion/2017-January/076290.html
> http://numpy-discussion.10968.n7.nabble.com/Deprecate-matrices-in-1-15-and-remove-in-1-17-tp44968.html
>
> Best regards,
> Stéfan