[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Mark Wiebe
mwwiebe at gmail.com
Thu Jun 23 18:43:13 EDT 2011
On Thu, Jun 23, 2011 at 5:24 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>
> On Jun 23, 2011, at 11:55 PM, Mark Wiebe wrote:
>
> > On Thu, Jun 23, 2011 at 4:46 PM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
> > On Thu, Jun 23, 2011 at 2:53 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> > Enthought has asked me to look into the "missing data" problem and how
> NumPy could treat it better. I've considered the different ideas of adding
> dtype variants with a special signal value and masked arrays, and concluded
> that adding masks to the core ndarray appears to be the best way to deal with
> the problem in general.
> >
> > I've written a NEP that proposes a particular design, viewable here:
> >
> >
> https://github.com/m-paradox/numpy/blob/cmaskedarray/doc/neps/c-masked-array.rst
>
> Mmh, after timeseries, now masked arrays... Mark, I start to see a pattern
> here ;)
I think it speaks to what's on Enthought's mind, in any case. :)
> There are some questions at the bottom of the NEP which definitely need
> discussion to find the best design choices. Please read, and let me know of
> all the errors and gaps you find in the document.
> >
> >
> > I agree that low level support for masks is the way to go.
>
> The objective was to have numpy.ma in C, yes. Been clear since Numeric,
> but nobody had time to do it. And I still don't speak C, so Python it was.
> Anyhow, yes, there should be some work to address some of numpy.ma's
> shortcomings. I may be a bit conservative, but I don't really see the reason to
> follow a radically different approach. Your idea of switching the current
> convention of mask (a True meaning that the data can be accessed) will lead
> to a lot of fun indeed. And sorry, what "general consensus about masks
> elsewhere" are you referring to?
>
I've used masks for many things in many contexts, and was quite surprised
the first time I saw the convention used in numpy.ma, where True marks a
masked (hidden) element. In image
processing, it's well established that a mask is white (1) for valid pixels
and black (0) for missing/transparent pixels. Applying a mask to an image is
a multiplication, something which works out very nicely. It also corresponds
nicely with array boolean indexing, where a[a != 0] uses the mask "a != 0"
to select nonzero elements.
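
As a rough illustration of the two conventions, here is a small sketch
using only existing NumPy calls (not the API proposed in the NEP). The
array values and variable names are made up for the example.

```python
import numpy as np

data = np.array([3.0, 0.0, 7.0, 5.0])

# Image-processing sense: True (1) marks a valid element.
valid = np.array([True, True, False, True])

# Applying the mask is a multiplication: the invalid element becomes 0.
masked_by_multiply = data * valid

# Boolean indexing uses the same sense: True selects an element.
nonzero = data[data != 0]
valid_values = data[valid]

# numpy.ma uses the opposite sense: True in `mask` hides an element,
# so the validity mask has to be inverted before it is passed in.
ma_array = np.ma.masked_array(data, mask=~valid)
ma_sum = ma_array.sum()  # the hidden element is left out of the sum
```

Here data[valid].sum() and ma_array.sum() give the same value; the only
difference is which sense of True the caller has to keep in mind.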
For at least a couple of releases, arrays with masks and instances of
numpy.ma will have to coexist, and I'm hoping the new implementation will be
intuitive enough that the transition won't be too crazy.
> > There is some consternation about the conventional True/False
> > interpretation of the mask, centered around the name "mask".
>
> Don't call it "mask" at all then. "accessible"? "access"? Avoid "valid",
> it carries too many connotations.
>
Coming up with a good name will take some thinking. Of the names so far,
"validity" is my favorite.
-Mark