[Numpy-discussion] missing data discussion round 2

Nathaniel Smith njs at pobox.com
Wed Jun 29 18:42:53 EDT 2011


On Wed, Jun 29, 2011 at 2:40 PM, Lluís <xscript at gmx.net> wrote:
> I'm for the option of having a single API when you want to have NA
> elements, regardless of whether it's using masks or bit patterns.

I understand the desire to avoid having two different APIs...

[snip]
> My concern is now about how to set the "skipna" in a "comfortable" way,
> so that I don't have to set it again and again as ufunc arguments:
>
>>>> a
> array([NA, 2, 3])
>>>> b
> array([1, 2, NA])
>>>> a + b
> array([NA, 4, NA])
>>>> a.flags.skipna=True
>>>> b.flags.skipna=True
>>>> a + b
> array([1, 4, 3])

...But... now you're introducing two different kinds of arrays with
different APIs again? Ones where .skipna==True, and ones where
.skipna==False?
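
To make the concern concrete, here is a minimal pure-Python sketch of a
per-array flag. Nothing in it is NumPy API: `NA`, `FlaggedArray` and the
`skipna` attribute are hypothetical stand-ins, and the rule for combining
the flags of two operands is an assumption, since the quoted proposal
does not say what happens when only one operand sets it.

```python
# Hypothetical stand-ins; none of these names are NumPy API.
NA = object()  # sentinel for a missing value


class FlaggedArray:
    """Toy 1-D array whose NA handling depends on a per-array flag."""

    def __init__(self, values, skipna=False):
        self.values = list(values)
        self.skipna = skipna

    def __add__(self, other):
        # Assumption: NA is skipped only when BOTH operands set the flag.
        # The mixed case is not defined in the quoted proposal.
        skip = self.skipna and other.skipna
        out = []
        for x, y in zip(self.values, other.values):
            if x is NA or y is NA:
                if not skip or (x is NA and y is NA):
                    out.append(NA)  # propagate the missing value
                else:
                    out.append(y if x is NA else x)  # keep the present value
            else:
                out.append(x + y)
        return FlaggedArray(out, skipna=skip)


a = FlaggedArray([NA, 2, 3])
b = FlaggedArray([1, 2, NA])

propagated = (a + b).values   # NA at positions 0 and 2
a.skipna = b.skipna = True
skipped = (a + b).values      # same expression, now [1, 4, 3]
```

The expression `a + b` is written identically both times; only state
hidden on the operands changes the result, which is what makes the two
flag settings behave like two different APIs.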

I know that this way it's not keyed on the underlying storage format,
but if we support both bit patterns and mask arrays at the
implementation level, then the only way to make them have identical
APIs is if we completely disallow unmasking, and shared masks, and so
forth. Which doesn't seem like it'd be very popular (and would make
including the mask-based implementation pretty pointless). So I think
we have to assume that they will have APIs that are at least somewhat
different. And with this proposal it seems we'd actually end up with
*4* different APIs that any particular array might follow... (or maybe
more, depending on how arrays that had both a bit-pattern and a mask
ended up working).
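
The count of 4 comes from crossing the two storage formats with the two
flag settings. A small sketch of that enumeration follows; the labels
are illustrative only, not names from any implementation.

```python
from itertools import product

storages = ("bit-pattern", "mask")  # the two implementations under discussion
skipna_flags = (False, True)        # the proposed per-array flag

# Every (storage, flag) pair is a distinct behaviour an array might follow.
variants = list(product(storages, skipna_flags))
for storage, skipna in variants:
    print(f"storage={storage:<11} skipna={skipna}")
```

An array carrying both a bit pattern and a mask would add further rows
to this table.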

That's why I was thinking the best solution might be to just bite the
bullet and make the APIs *totally* different and non-overlapping, so
it was always obvious which you were using and how they'd interact.
But I don't know -- for my work I'd be happy to just pass skipna
everywhere I needed it, and never unmask anything, and so forth, so
maybe there's some reason why it's really important for the
bit-pattern NA API to overlap more with the masked array API?
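
For contrast, passing skipna at every call site might look like the
sketch below. `NA` and `na_sum` are hypothetical names for illustration,
not an existing or proposed NumPy function.

```python
# Hypothetical stand-ins; not NumPy API.
NA = object()  # sentinel for a missing value


def na_sum(values, skipna=False):
    """Sum values; NA propagates unless the caller asks to skip it."""
    total = 0
    for v in values:
        if v is NA:
            if skipna:
                continue  # ignore the missing element
            return NA     # default: a single NA makes the result NA
        total += v
    return total


data = [1, NA, 3]
strict = na_sum(data)                # NA propagates
lenient = na_sum(data, skipna=True)  # NA ignored, result is 4
```

Here the behaviour is visible at the call site, so the same data never
gives different answers depending on state set elsewhere.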

-- Nathaniel


