[Numpy-discussion] missing data discussion round 2

Mark Wiebe mwwiebe at gmail.com
Thu Jun 30 11:15:45 EDT 2011


On Wed, Jun 29, 2011 at 5:42 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Wed, Jun 29, 2011 at 2:40 PM, Lluís <xscript at gmx.net> wrote:
> > I'm for the option of having a single API when you want to have NA
> > elements, regardless of whether it's using masks or bit patterns.
>
> I understand the desire to avoid having two different APIs...
>
> [snip]
> > My concern is now about how to set the "skipna" in a "comfortable" way,
> > so that I don't have to set it again and again as ufunc arguments:
> >
> >>>> a
> > array([NA, 2, 3])
> >>>> b
> > array([1, 2, NA])
> >>>> a + b
> > array([NA, 4, NA])
> >>>> a.flags.skipna=True
> >>>> b.flags.skipna=True
> >>>> a + b
> > array([1, 4, 3])
>
> ...But... now you're introducing two different kinds of arrays with
> different APIs again? Ones where .skipna==True, and ones where
> .skipna==False?
>
> I know that this way it's not keyed on the underlying storage format,
> but if we support both bit patterns and mask arrays at the
> implementation level, then the only way to make them have identical
> APIs is if we completely disallow unmasking, and shared masks, and so
> forth.
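
For reference, the two behaviours in the quoted example (NA propagating
versus NA being skipped) can be approximated with plain NumPy by letting NaN
stand in for NA. This is only a rough sketch: it does not use the NEP's
proposed API, and the "skipna" flag from the quote has no counterpart here.
The variable names are made up for illustration.

```python
import numpy as np

# NaN stands in for NA here; this is an assumption of the sketch,
# not the NA type proposed in the NEP.
a = np.array([np.nan, 2.0, 3.0])
b = np.array([1.0, 2.0, np.nan])

# Default behaviour: a missing value in either operand propagates.
propagated = a + b

# "Skip" behaviour: treat missing values as absent from the sum.
# np.nansum treats NaN as zero when reducing along an axis.
skipped = np.nansum(np.vstack([a, b]), axis=0)

print(propagated)
print(skipped)
```

The point of the sketch is that the two results come from two different
operations, which is the API split under discussion.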


The right set of these conditions has been in the NEP from the beginning.
Unmasking without value assignment is disallowed - the only way to "see
behind the mask" or to share masks is with views. My impression is that more
people are concerned with sharing the same data between different masks,
something also supported through views.
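
As a rough illustration of sharing the same data between different masks,
here is a sketch using the existing numpy.ma module rather than the
interface proposed in the NEP, so the calls below are an analogy only. It
assumes that ma.masked_array does not copy an ndarray passed to it by
default, which is how the two masked arrays end up looking at one buffer.

```python
import numpy as np
import numpy.ma as ma

# One underlying data buffer.
data = np.array([1.0, 2.0, 3.0])

# Two masked arrays over the same buffer, each with its own mask.
# (numpy.ma is used as a stand-in; it is not the NEP's mask design.)
view_a = ma.masked_array(data, mask=[True, False, False])
view_b = ma.masked_array(data, mask=[False, False, True])

# Each reduction ignores only the elements hidden by its own mask.
sum_a = view_a.sum()
sum_b = view_b.sum()

# Writing through the shared buffer is visible from both masked arrays.
data[1] = 10.0
sum_a_after = view_a.sum()
sum_b_after = view_b.sum()

print(sum_a, sum_b, sum_a_after, sum_b_after)
```

Neither mask is changed by the assignment; only the shared values are.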

-Mark


> Which doesn't seem like it'd be very popular (and would make
> including the mask-based implementation pretty pointless). So I think
> we have to assume that they will have APIs that are at least somewhat
> different. And then it seems like with this proposal then we'd
> actually end up with *4* different APIs that any particular array
> might follow... (or maybe more, depending on how arrays that had both
> a bit-pattern and mask ended up working).
>
> That's why I was thinking the best solution might be to just bite the
> bullet and make the APIs *totally* different and non-overlapping, so
> it was always obvious which you were using and how they'd interact.
> But I don't know -- for my work I'd be happy to just pass skipna
> everywhere I needed it, and never unmask anything, and so forth, so
> maybe there's some reason why it's really important for the
> bit-pattern NA API to overlap more with the masked array API?
>
> -- Nathaniel