
On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane <allanhaldane@gmail.com> wrote:
> I'm not at all set on that behavior and we can do something else. For now, I chose this way since it seemed to best match the "IGNORE" mask behavior.
> The behavior you described further above where the output row/col would be masked corresponds better to "NA" (propagating) mask behavior, which I am leaving for later implementation.
This does seem like a clean way to *implement* things, but from a user perspective I'm not sure I would want separate classes for "IGNORE" vs "NA" masks. I tend to think of "IGNORE" vs "NA" as descriptions of particular operations rather than the data itself. There is a spectrum of ways to handle missing data, and the right way to propagate missing values is often highly context dependent. The right place to set this is in the functions where operations are defined, not on classes that may be defined far away from where the computation happens. For example, pandas has a "min_count" parameter in functions for intermediate use-cases between "IGNORE" and "NA" semantics, e.g., "take an average, unless the number of data points is fewer than min_count." Are there examples of existing projects that define separate user-facing types for different styles of masks?
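
The per-operation choice described above can be sketched with pandas roughly as follows. This is only an illustration, assuming pandas and NumPy are installed. It uses `Series.sum`, which accepts both `skipna` and `min_count`, rather than an average, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# One dataset with a missing value; the semantics are chosen per call,
# not by the type of the container.
s = pd.Series([1.0, np.nan, 3.0])

# "IGNORE"-style: missing values are skipped.
ignore_sum = s.sum(skipna=True)

# "NA"-style: any missing value propagates to the result.
na_sum = s.sum(skipna=False)

# Intermediate: skip missing values, but return NaN unless at least
# min_count valid values are present.
enough_data = s.sum(min_count=2)    # 2 valid values, threshold met
too_few_data = s.sum(min_count=3)   # only 2 valid values, threshold not met

print(ignore_sum, na_sum, enough_data, too_few_data)
```

Here the same `Series` yields 4.0, NaN, 4.0 and NaN respectively, depending only on the arguments passed at the call site.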