[Numpy-discussion] Splitting MaskedArray into a separate package
Eric Firing
efiring at hawaii.edu
Wed May 23 16:02:22 EDT 2018
On 2018/05/23 9:06 AM, Matti Picus wrote:
> MaskedArray is a strange but useful creature. This NEP proposes to
> distribute it as a separate package under the NumPy brand.
>
> As I understand the process, a proposed NEP should be first discussed
> here to gauge general acceptance, then after that the details should be
> discussed on the pull request itself
> https://github.com/numpy/numpy/pull/11146.
>
> Here is the motivation section from the NEP:
>
>> MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds
>> masking capabilities, i.e. the ability to ignore or hide certain array
>> values during computation.
>>
>> While historically convenient to distribute this class inside of NumPy,
>> improved packaging has made it possible to distribute it separately
>> without difficulty.
>>
>> Motivations for this move include:
>>
>> * Focus: the NumPy package should strive to only include the
>> `ndarray` object, and the essential utilities needed to manipulate
>> such arrays.
>> * Complexity: the MaskedArray implementation is non-trivial, and imposes
>> a significant maintenance burden.
>> * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
>> often cause complications when being used with other packages.
>> Fixing these issues is outside the scope of NumPy development.
>>
>> This NEP proposes a deprecation pathway through which MaskedArrays
>> would still be accessible to users, but no longer as part of the core
>> package.
>
> Any thoughts?
>
> Matti and Stefan

I understand at least some of the motivation and potential advantages,
but as it stands, I find this NEP highly alarming. Masked arrays are
critical to my numpy usage, and I suspect they are critical for many
other use cases as well. In fact, I would prefer that a high priority
for major numpy development be the more complete integration of masked
array capabilities into numpy, not their removal to a separate package.
I was unhappy to see the effort in that direction a few years ago being
killed. I didn't agree with every design decision, but overall I
thought it was going in the right direction.

Bad or missing values (and situations where one wants to use a mask to
operate on a subset of an array) are found in many domains of real life;
do you really want Python users in those domains to have to fall back on
Matlab-style reliance on NaNs and/or manual mask manipulations, as the
new maskedarray package is sidelined?
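
For illustration only, here is a minimal sketch of the three styles contrasted above: a masked array, NaN substitution, and a hand-carried boolean mask. The sample data and the -999 sentinel are made up for the example and do not come from the NEP or from any real dataset.

```python
import numpy as np

# Hypothetical readings; -999.0 is an assumed "bad value" sentinel.
raw = np.array([1.0, 2.0, -999.0, 4.0])

# Masked-array style: the mask travels with the data.
masked = np.ma.masked_values(raw, -999.0)
masked_mean = masked.mean()

# NaN style: overwrite bad values, then use NaN-aware reductions.
with_nan = np.where(raw == -999.0, np.nan, raw)
nan_mean = np.nanmean(with_nan)

# Manual-mask style: keep a separate boolean array and index with it.
good = raw != -999.0
manual_mean = raw[good].mean()

# Integer data cannot hold NaN, but it can still be masked.
counts = np.ma.masked_equal(np.array([1, 2, -1, 4]), -1)
int_total = counts.sum()

print(masked_mean, nan_mean, manual_mean, int_total)
```

All three means agree here. The difference is in what has to be remembered: the masked array needs no special reduction function and no second array, and it also works for integer dtypes, where the NaN approach would force a conversion to float.
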
Or is there any realistic prospect for maintenance and improvement of
the package after it is separated out? Or of mask/missing value
handling being integrated into numpy? Is the latter option on the table
in any form, or is it DOA?

Side question: does your proposed purification of numpy include
elimination of linalg and random? Based on the criteria in the NEP, I
would expect it does; so maybe you should have a more ambitious NEP, and
do the purification all in one step as a numpy version 2.0. (Surely if
masked arrays are purged, the matrix class should be booted out at the
same time.)

Eric