[Numpy-discussion] new MaskedArray class

Stephan Hoyer shoyer at gmail.com
Mon Jun 24 19:21:17 EDT 2019


On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane <allanhaldane at gmail.com>
wrote:

> I'm not at all set on that behavior and we can do something else. For
> now, I chose this way since it seemed to best match the "IGNORE" mask
> behavior.
>
> The behavior you described further above where the output row/col would
> be masked corresponds better to "NA" (propagating) mask behavior, which
> I am leaving for later implementation.


This does seem like a clean way to *implement* things, but from a user
perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
masks.

I tend to think of "IGNORE" vs "NA" as descriptions of particular
operations rather than the data itself. There is a spectrum of ways to
handle missing data, and the right way to propagate missing values is
often highly context-dependent. The right place to set this is in the
functions where operations are defined, not on classes that may be defined
far away from where the computation happens. For example, pandas has a
"min_count" parameter in functions for intermediate use-cases between
"IGNORE" and "NA" semantics, e.g., "take an average, unless the number of
data points is fewer than min_count."
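
To make the "policy lives on the operation" idea concrete, here is a rough
sketch in plain NumPy. The function name masked_mean and its min_count
argument are hypothetical, chosen only for illustration; this is not the
API of the proposed MaskedArray class, and it is not how pandas implements
its own min_count (which I know is available on reductions such as
Series.sum). It assumes mask is True where a value is missing.

```python
import numpy as np


def masked_mean(values, mask, min_count=1):
    """Mean over unmasked entries along the last axis.

    Hypothetical helper for illustration only (not a NumPy or pandas API).
    `mask` is True where a value is missing. A result is NaN when fewer
    than `min_count` valid entries are available.
    """
    values = np.asarray(values, dtype=float)
    valid = ~np.asarray(mask, dtype=bool)
    n_valid = valid.sum(axis=-1)
    # Masked entries contribute nothing to the sum.
    total = np.where(valid, values, 0.0).sum(axis=-1)
    # Guard against division by zero; such rows are replaced by NaN below
    # as long as min_count >= 1.
    mean = total / np.maximum(n_valid, 1)
    return np.where(n_valid >= min_count, mean, np.nan)


data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 6.0, 0.0],
                 [7.0, 0.0, 0.0]])
mask = np.array([[False, False, False],
                 [False, False, True],
                 [False, True, True]])

# Same data, same mask; only the operation's argument changes.
ignore_like = masked_mean(data, mask, min_count=1)  # skip missing values
in_between = masked_mean(data, mask, min_count=2)   # need at least 2 points
na_like = masked_mean(data, mask, min_count=data.shape[-1])  # any gap propagates
```

With min_count=1 every row with at least one valid point gets a mean
("IGNORE"-like), while setting min_count to the row length makes any
missing value turn the result into NaN ("NA"-like). The values in between
are the intermediate cases mentioned above, and none of them requires a
different array type.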

Are there examples of existing projects that define separate user-facing
types for different styles of masks?