[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Nathaniel Smith njs at pobox.com
Thu Jun 23 17:19:19 EDT 2011


I'd like to see a statement of what the "missing data problem" is, and
how this proposal solves it. I don't think the answer is entirely
intuitive, or that everyone necessarily has the same idea of it.

> Reduction operations like 'sum', 'prod', 'min', and 'max' will operate as if the values weren't there

For context: My experience with missing data is in statistical
analysis; I find R's NA support to be pretty awesome for those
purposes. The conceptual model it's based on is that an NA value is
some number that we just happen not to know. So from this perspective,
I find it pretty confusing that adding an unknown quantity to 3 should
result in 3, rather than another unknown quantity. (Obviously it
should be possible to compute the sum of the known values, but IME
it's important for the default behavior to be to fail loudly when
things are wonky, not to silently patch them up, possibly
incorrectly!)
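
To make the two behaviors concrete, here is a minimal sketch. It uses
NaN as a stand-in for NA, which is an assumption for illustration only:
NaN is not a true NA, and none of this is the API proposed in the NEP.
The existing numpy.ma module is included to show the "as if the values
weren't there" reduction.

```python
import numpy as np

# NaN stands in for NA here (an assumption for illustration only).
values = np.array([3.0, np.nan])

# Propagating semantics: an unknown plus 3 is still unknown.
propagating = np.sum(values)

# Skipping semantics: sum only the known values, on explicit request.
skipping = np.nansum(values)

# numpy.ma reductions skip masked entries by default.
masked = np.ma.masked_invalid(values)
masked_total = masked.sum()

print(propagating, skipping, masked_total)
```

Under the R-style model described above, the first result would be the
default and the second would be something the user asks for by name.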

Also, what should 'dot' do with missing values?
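
As a sketch of why the answer isn't obvious, here are two candidate
behaviors for a matrix-vector product, again with NaN standing in for
NA (an assumption, not the NEP's design). The "skipping" variant is a
hypothetical emulation built from an elementwise product and nansum; it
is not an existing dot mode.

```python
import numpy as np

# NaN stands in for NA here (an assumption for illustration only).
a = np.array([[1.0, np.nan],
              [2.0, 3.0]])
v = np.array([1.0, 1.0])

# Candidate 1: propagate. Any row that touches a missing value is missing.
propagated = np.dot(a, v)

# Candidate 2 (hypothetical): skip missing terms in each inner product.
skipped = np.nansum(a * v, axis=1)

print(propagated)
print(skipped)
```

The two candidates disagree on the first row, so whichever one 'dot'
picks will silently surprise someone expecting the other.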

-- Nathaniel

On Thu, Jun 23, 2011 at 1:53 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> Enthought has asked me to look into the "missing data" problem and how NumPy
> could treat it better. I've considered the different ideas of adding dtype
> variants with a special signal value and masked arrays, and concluded that
> adding masks to the core ndarray appears to be the best way to deal with the
> problem in general.
> I've written a NEP that proposes a particular design, viewable here:
> https://github.com/m-paradox/numpy/blob/cmaskedarray/doc/neps/c-masked-array.rst
> There are some questions at the bottom of the NEP which definitely need
> discussion to find the best design choices. Please read, and let me know of
> all the errors and gaps you find in the document.
> Thanks,
> Mark


