[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Fri Jun 24 21:10:34 EDT 2011

On Fri, Jun 24, 2011 at 7:02 PM, Matthew Brett <matthew.brett at gmail.com> wrote:

> Hi,
>
> On Sat, Jun 25, 2011 at 12:22 AM, Wes McKinney <wesmckinn at gmail.com>
> wrote:
> ...
> > Perhaps we should make a wiki page someplace summarizing pros and cons
> > of the various implementation approaches?
>
> But we should do this only if it really is an open question which one we
> go for. If not, then we're just slowing Mark down in getting to the
> implementation.
>
> Assuming the question is still open, here's a starter for the pros and
> cons:
>
> array.mask
> 1) It's easier / neater to implement
>

Yes

> 2) It can generalize across dtypes
>

Yes

> 3) You can still get the masked data underneath the mask (allowing you
> to unmask etc)
>

By setting up views appropriately, yes. If you don't have another view to
the underlying data, you can't get at it.
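
A minimal sketch of that arrangement in plain NumPy. The class name `MaskedArraySketch`, its methods, and the convention that `True` means "masked" are all hypothetical, not the proposed API. Marking an element NA only flips a mask byte, so the value stays in memory, but this object exposes no way to read it; you need a separate view of the same buffer.

```python
import numpy as np


class MaskedArraySketch:
    """Hypothetical sketch: an array view paired with a same-shape boolean mask.

    Convention assumed here: mask[i] == True means element i is masked (NA).
    """

    def __init__(self, data):
        self._data = data                                  # shares memory with the caller's array
        self._mask = np.zeros(data.shape, dtype=np.bool_)  # nothing masked initially

    def set_na(self, index):
        # Marking elements NA only touches the mask; the data bytes are untouched.
        self._mask[index] = True

    def visible(self):
        # Only unmasked values are exposed through this object.
        return self._data[~self._mask]


raw = np.arange(6.0)                     # underlying buffer
masked = MaskedArraySketch(raw.view())   # the masked object wraps a view of it
masked.set_na([1, 4])

print(masked.visible())  # [0. 2. 3. 5.]
print(raw[[1, 4]])       # [1. 4.]  still reachable, but only via the other view
```

Since the mask is a separate `bool` array, nothing in the sketch depends on the data's dtype, which is what lets a mask generalize across integer, float, or string arrays.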

> nafloat64:
> 1) No memory overhead
>

Yes
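
The zero-overhead property comes from reserving one bit pattern inside the dtype itself. A minimal sketch for float64, assuming NA is encoded as a quiet NaN with an arbitrary payload; the constant `NA_BITS` and the helpers `with_na` and `isna` are hypothetical illustrations, not the proposed design:

```python
import numpy as np

# Hypothetical NA bit pattern: a quiet NaN with an arbitrary, reserved payload.
# Any NaN whose bits differ from this pattern is an ordinary NaN, not NA.
NA_BITS = np.uint64(0x7FF8_0000_0000_07A2)


def with_na(values, na_positions):
    """Build a float64 array and mark some positions NA by writing the reserved bits."""
    a = np.array(values, dtype=np.float64)
    a.view(np.uint64)[na_positions] = NA_BITS  # reinterpret the same 8 bytes per element
    return a


def isna(a):
    """True where an element holds exactly the reserved NA bit pattern."""
    return a.view(np.uint64) == NA_BITS


a = with_na([1.0, 2.0, np.nan, 4.0], [1])
print(isna(a))      # [False  True False False]
print(np.isnan(a))  # [False  True  True False]
print(a.nbytes)     # 32: four float64 values, no separate mask
```

This only covers storage and detection. Whether arithmetic preserves the payload (so that NA + 1 stays NA rather than becoming a plain NaN) is not guaranteed across platforms, and integer dtypes have no NaN to borrow, so each NA dtype needs its own handling.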

> 2) Battle-tested implementation already done in R
>

We can't really use that, though: R is GPL-licensed and NumPy is
BSD-licensed. The low-level implementation details are also likely
different enough that a reimplementation would be needed anyway.

> I guess we'd have to test directly whether the non-contiguous memory
> of the mask and data would cause enough cache-miss problems to
> outweigh the potential cycle savings from single-byte comparisons in
> array.mask.
>

The different memory buffers are each contiguous, so the access patterns
still have a lot of coherency. I intend to give the masks memory layouts
that match those of their arrays.
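
With one mask byte per element, "matching layouts" means the mask's strides are the data's strides divided by the item size, so both buffers are walked in the same order. A small sketch using existing NumPy allocation (the helper name `mask_like` is hypothetical):

```python
import numpy as np


def mask_like(data):
    """Allocate an all-False bool mask whose memory layout follows `data`.

    np.zeros_like defaults to order="K", which matches the layout of `data`
    as closely as possible, so mask and data can be traversed in lockstep.
    """
    return np.zeros_like(data, dtype=np.bool_)


data = np.zeros((3, 4), dtype=np.float64, order="F")  # Fortran-ordered data
mask = mask_like(data)

print(data.strides)             # (8, 24)
print(mask.strides)             # (1, 3)
print(mask.flags.f_contiguous)  # True
```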

> I guess that one and only one of these will get written. I guess that
> one of these choices may be a lot more satisfying to the current and
> future masked array itch than the other.
>

I'm only going to implement one solution, yes.

> I'm personally worried that the memory overhead of array.mask will
> make many of us tend to avoid it. I work with images that can
> easily get large enough that I would not want a one-byte-per-item
> mask array added to my storage.
>

May I ask what kind of dtypes and sizes you're working with?
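
The dtype matters because a one-byte-per-element mask costs a fixed fraction of the data size that depends only on the item size. A quick back-of-the-envelope sketch (the helper `mask_overhead` and the image size are made-up examples):

```python
import numpy as np


def mask_overhead(dtype):
    """Extra memory of a one-byte-per-element mask, as a fraction of the data size."""
    return np.dtype(np.bool_).itemsize / np.dtype(dtype).itemsize


for dt in (np.uint8, np.uint16, np.float32, np.float64):
    print(np.dtype(dt).name, f"{mask_overhead(dt):.1%}")
# uint8 100.0%
# uint16 50.0%
# float32 25.0%
# float64 12.5%

# Hypothetical 4096 x 4096 uint16 image: data vs. mask size in MiB.
n = 4096 * 4096
print(n * np.dtype(np.uint16).itemsize // 2**20, n * np.dtype(np.bool_).itemsize // 2**20)  # 32 16
```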

> The reason I'm asking for more details about the implementation is
> because that is most of the argument for array.mask at the moment (1
> and 2 above).
>

I'm first trying to nail down more of the higher-level requirements before
digging really deep into the implementation details. They greatly affect how
those details have to turn out.

-Mark

>
> See you,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>