[Numpy-discussion] Splitting MaskedArray into a separate package

Eric Firing efiring at hawaii.edu
Wed May 23 16:02:22 EDT 2018


On 2018/05/23 9:06 AM, Matti Picus wrote:
> MaskedArray is a strange but useful creature. This NEP proposes to 
> distribute it as a separate package under the NumPy brand.
> 
> As I understand the process, a proposed NEP should be first discussed 
> here to gauge general acceptance, then after that the details should be 
> discussed on the pull request itself 
> https://github.com/numpy/numpy/pull/11146.
> 
> Here is the motivation section from the NEP:
> 
>> MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds
>> masking capabilities, i.e. the ability to ignore or hide certain array
>> values during computation.
>>
>> While historically convenient to distribute this class inside of NumPy,
>> improved packaging has made it possible to distribute it separately
>> without difficulty.
>>
>> Motivations for this move include:
>>
>>  * Focus: the NumPy package should strive to only include the
>>    `ndarray` object, and the essential utilities needed to manipulate
>>    such arrays.
>>  * Complexity: the MaskedArray implementation is non-trivial, and imposes
>>    a significant maintenance burden.
>>  * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
>>    often cause complications when being used with other packages.
>>    Fixing these issues is outside the scope of NumPy development.
>>
>> This NEP proposes a deprecation pathway through which MaskedArrays
>> would still be accessible to users, but no longer as part of the core
>> package.
> 
> Any thoughts?
> 
> Matti and Stefan

I understand at least some of the motivation and potential advantages, 
but as it stands, I find this NEP highly alarming.  Masked arrays are 
critical to my numpy usage, and I suspect they are critical for many 
other use cases as well.  In fact, I would prefer that a high priority 
for major numpy development be the more complete integration of masked 
array capabilities into numpy, not their removal to a separate package. 
I was unhappy to see the effort in that direction a few years ago being 
killed.  I didn't agree with every design decision, but overall I 
thought it was going in the right direction.

Bad or missing values (and situations where one wants to use a mask to 
operate on a subset of an array) are found in many domains of real life; 
do you really want Python users in those domains to have to fall back on
Matlab-style reliance on nans and/or manual mask manipulations, as the 
new maskedarray package is sidelined?
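For readers less familiar with the trade-off, here is a minimal sketch of the three approaches mentioned above. It uses the existing `numpy.ma` module as it ships inside NumPy today. The data and the -999.0 "missing" sentinel are hypothetical and only serve as an illustration:

```python
import numpy as np
import numpy.ma as ma

# Hypothetical readings; -999.0 is an assumed sentinel for "missing".
raw = np.array([1.0, 2.0, -999.0, 4.0, -999.0, 6.0])

# 1) Masked array: the mask travels with the data, and reductions
#    skip masked entries automatically.
masked = ma.masked_equal(raw, -999.0)
masked_mean = masked.mean()

# 2) NaN convention (Matlab-style): replace missing values with NaN,
#    then remember to call the nan-aware function every time.
with_nan = np.where(raw == -999.0, np.nan, raw)
nan_mean = np.nanmean(with_nan)
naive_mean = with_nan.mean()  # forgetting nanmean silently yields nan

# 3) Manual mask bookkeeping: keep a separate boolean array and apply
#    it by hand at each step.
valid = raw != -999.0
manual_mean = raw[valid].mean()

# NaN exists only for floating-point dtypes; masks also work for integers.
counts = ma.masked_array(np.array([1, 2, 3]), mask=[False, True, False])
int_total = counts.sum()
```

All three approaches give the same mean of the valid values (3.25). The difference is in how much bookkeeping the user has to carry. Only the masked array keeps the "missing" information attached to the data, and only the masked array handles integer data.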

Or is there any realistic prospect for maintenance and improvement of 
the package after it is separated out?  Or of mask/missing value 
handling being integrated into numpy?  Is the latter option on the table 
in any form, or is it DOA?

Side question: does your proposed purification of numpy include 
elimination of linalg and random?  Based on the criteria in the NEP, I 
would expect it does; so maybe you should have a more ambitious NEP, and 
do the purification all in one step as a numpy version 2.0.  (Surely if 
masked arrays are purged, the matrix class should be booted out at the 
same time.)

Eric

