[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe at gmail.com
Sat Jun 25 15:44:42 EDT 2011


On Fri, Jun 24, 2011 at 10:59 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Fri, Jun 24, 2011 at 6:57 PM, Benjamin Root <ben.root at ou.edu> wrote:
> > On Fri, Jun 24, 2011 at 8:11 PM, Nathaniel Smith <njs at pobox.com> wrote:
> >> This is a situation where I would just... use an array and a mask,
> >> rather than a masked array. Then lots of things -- changing fill
> >> values, temporarily masking/unmasking things, etc. -- come for free,
> >> just from knowing how arrays and boolean indexing work?
> >
> > With a masked array, it is "for free".  Why re-invent the wheel?  It has
> > already been done for me.
>
> But it's not for free at all. It's an additional concept that has to
> be maintained, documented, and learned (with the last cost, which is
> multiplied by the number of users, being by far the greatest). It's
> not reinventing the wheel, it's saying hey, I have wheels and axles,
> but what I really need the library to provide is a wheel+axle
> assembly!
>

It feels like you're suggesting the NA bit pattern vs mask distinction and
the programming interface users of NumPy see are closely tied together. This
isn't the case at all, and I would like more feedback on the interface side
of things irrespective of the implementation details. Please tell me what
your wheel+axle assembly looks like.
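
To make the two styles quoted above concrete, here is a minimal sketch,
assuming a float array in which -999.0 is a made-up sentinel for "missing".
It compares a plain array plus a boolean mask with the existing numpy.ma
masked array. The variable names are illustrative only, and nothing here
uses the interface proposed in the NEP.

```python
import numpy as np

# -999.0 is a hypothetical sentinel chosen for this example only.
data = np.array([1.0, 2.0, -999.0, 4.0])
mask = data == -999.0                      # True marks a missing entry

# Style 1: plain array + boolean mask, using ordinary boolean indexing.
mean_plain = data[~mask].mean()
filled_plain = np.where(mask, 0.0, data)   # pick a fill value on the fly

# Style 2: numpy.ma keeps the same mask attached to the data.
marr = np.ma.masked_array(data, mask=mask)
mean_masked = marr.mean()
filled_masked = marr.filled(0.0)
```

Both styles give the same mean and the same filled array here; the
difference is whether the caller or the container carries the mask around.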

> >> Do we really get much advantage by building all these complex
> >> operations in? I worry that we're trying to anticipate and write code
> >> for every situation that users find themselves in, instead of just
> >> giving them some simple, orthogonal tools.
> >>
> >
> > This is the danger, which is why I advocate retaining the MaskedArray
> > type that would provide the high-level "intelligent" operations, while
> > having in the core the basic data structures for pairing a mask with an
> > array, and recognizing a special np.NA value that would act upon the mask
> > rather than the underlying data.  Users would get very basic functionality,
> > while the MaskedArray would continue to provide the interface that we are
> > used to.
>
> The interface as described is quite different... in particular, all
> aggregate operations would change their behavior.
>

Which operations are changing, and what is the difference in behavior? I
don't recall proposing something like this. My initial proposal had a
difference with R for the aggregate operations, but I've changed the NEP
based on your feedback.
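
As a rough illustration of what "aggregate behavior" means in this exchange,
the sketch below uses NaN as a stand-in for a missing value, since the
proposed np.NA is not available in current NumPy releases. In R,
sum(c(1, NA, 3)) yields NA while sum(c(1, NA, 3), na.rm=TRUE) yields 4;
existing NumPy functions show the same two behaviors. This is only an
analogy, not the semantics defined in the NEP.

```python
import numpy as np

# NaN is used here as a stand-in for "missing"; this is an assumption of
# the example, not the NA object discussed in the proposal.
values = np.array([1.0, np.nan, 3.0])

# Propagating behavior: a missing input makes the result missing.
propagated = np.sum(values)

# Skipping behavior: missing inputs are ignored.
skipped = np.nansum(values)

# numpy.ma aggregates skip masked entries.
masked_sum = np.ma.masked_invalid(values).sum()
```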

> >> As a corollary, I worry that learning and keeping track of how masked
> >> arrays work is more hassle than just ignoring them and writing the
> >> necessary code by hand as needed. Certainly I can imagine that *if the
> >> mask is a property of the data* then it's useful to have tools to keep
> >> it aligned with the data through indexing and such. But some of these
> >> other things are quicker to reimplement than to look up the docs for,
> >> and the reimplementation is easier to read, at least for me...
> >
> > What you are advocating is similar to the "tried-n-true" coding practice
> > of Matlab users of using NaNs.  You will hear from Matlab programmers about
> > how it is the greatest idea since sliced bread (and I was one of them).
> > Then I was introduced to Numpy, and while I do sometimes still use the NaN
> > approach, I realized that the masked array is a "better" way.
>
> Hey, no need to go around calling people Matlab programmers, you might
> hurt someone's feelings.
>
> But seriously, my argument is that every abstraction and new concept
> has a cost, and I'm dubious that the full masked array abstraction
> carries its weight and justifies this cost, because it's highly
> redundant with existing abstractions. That has nothing to do with how
> tried-and-true anything is.
>

The abstraction is R-like missing values, and two implementation mechanisms
are NA bit patterns and masks. There is no "full masked array abstraction"
as a component end users will have to learn.
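
A minimal sketch of that separation, assuming int32 data: the same
"missing value" abstraction is backed once by a reserved bit pattern (the
smallest int32, which is what R uses for NA_integer_) and once by a
parallel boolean mask. The names INT_NA and na_sum are hypothetical helpers
invented for this example; they are not part of NumPy or of the NEP.

```python
import numpy as np

# Hypothetical reserved "NA" bit pattern for int32 data.
INT_NA = np.iinfo(np.int32).min

# Mechanism 1: the missing marker lives inside the data itself.
bitpattern = np.array([10, INT_NA, 30], dtype=np.int32)
missing_from_bits = bitpattern == INT_NA

# Mechanism 2: the data is untouched and a parallel mask marks NA.
payload = np.array([10, 20, 30], dtype=np.int32)
missing_from_mask = np.array([False, True, False])


def na_sum(values, missing, skipna=False):
    """Hypothetical R-like sum; returns None when NA propagates."""
    if missing.any() and not skipna:
        return None
    return int(values[~missing].sum())


# The caller-facing behavior is identical for both mechanisms.
results = [
    na_sum(bitpattern, missing_from_bits),
    na_sum(payload, missing_from_mask),
    na_sum(bitpattern, missing_from_bits, skipna=True),
    na_sum(payload, missing_from_mask, skipna=True),
]
```

One visible difference is that the masked version still holds the hidden
value 20 in its payload, while the bit-pattern version has overwritten it.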

-Mark

> > As for documentation, on hard/soft masks, just look at the docs for the
> > MaskedArray constructor:
> [...snipped...]
>
> Thanks!
>
> -- Nathaniel