[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Robert Kern robert.kern at gmail.com
Fri Jun 24 10:01:15 EDT 2011


On Fri, Jun 24, 2011 at 06:47, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Thu, Jun 23, 2011 at 10:44 PM, Robert Kern <robert.kern at gmail.com> wrote:
>> On Thu, Jun 23, 2011 at 15:53, Mark Wiebe <mwwiebe at gmail.com> wrote:
>>> Enthought has asked me to look into the "missing data" problem and how NumPy
>>> could treat it better. I've considered the different ideas of adding dtype
>>> variants with a special signal value and masked arrays, and concluded that
>>> adding masks to the core ndarray appears to be the best way to deal with the
>>> problem in general.
>>> I've written a NEP that proposes a particular design, viewable here:
>>> https://github.com/m-paradox/numpy/blob/cmaskedarray/doc/neps/c-masked-array.rst
>>> There are some questions at the bottom of the NEP which definitely need
>>> discussion to find the best design choices. Please read, and let me know of
>>> all the errors and gaps you find in the document.
>>
>> One thing that could use more explanation is how your proposal
>> improves on the status quo, i.e. numpy.ma. As far as I can see, you
>> are mostly just shuffling around the functionality that already
>> exists. There has been a continual desire for something like R's NA
>> values by people who are very familiar with both R and numpy's masked
>> arrays. Both have their uses, and as Nathaniel points out, R's
>> approach seems to be very well-liked by a lot of users. In essence,
>> *that's* the "missing data problem" that you were charged with: making
>> happy the users who are currently dissatisfied with masked arrays. It
>> doesn't seem to me that moving the functionality from numpy.ma to
>> numpy.ndarray resolves any of their issues.
>
> Maybe it would help if you could say specifically which issues you
> think are not being addressed?  Or was this more in the way of a
> 'please speak up'?

More the latter. Any proposal that purports to replace numpy.ma ought
to at least *mention* it. I think it's a fine proposal for a system de
novo, but it's not de novo.
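
For readers who have not used the status quo under discussion, here is a minimal sketch of how numpy.ma handles a missing value today, next to a NaN sentinel, which is only loosely analogous to R's NA. The sample data and variable names are made up for illustration, and nothing here reflects the API proposed in the NEP.

```python
import numpy as np

# Hypothetical sample data; the value at index 2 is treated as missing.
data = np.array([1.0, 2.0, -999.0, 4.0])

# Status quo: numpy.ma stores a separate boolean mask alongside the data.
masked = np.ma.masked_array(data, mask=[False, False, True, False])
masked_mean = masked.mean()        # the masked element is skipped

# Sentinel-style alternative (an assumption for comparison, floats only):
with_nan = data.copy()
with_nan[2] = np.nan
nan_mean = with_nan.mean()         # NaN propagates through the reduction
skip_mean = np.nanmean(with_nan)   # explicitly ignore NaN entries

print(masked_mean, nan_mean, skip_mean)
```

The mask leaves the underlying value in place and hides it from reductions, while the sentinel overwrites the value and propagates unless a NaN-aware function is used.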

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco