[Numpy-discussion] NA masks in the next numpy release?

Charles R Harris charlesr.harris at gmail.com
Fri Oct 28 19:19:23 EDT 2011


On Fri, Oct 28, 2011 at 5:05 PM, Chris.Barker <Chris.Barker at noaa.gov> wrote:

> On 10/28/11 11:37 AM, Matthew Brett wrote:
> > The main motivation for the alterNEP was our strong feeling that
> > separating ABSENT and IGNORE was easier to comprehend and cleaner.
>
> I don't know about easier to comprehend, or cleaner, but it is more
> feature-full.
>
> I see two issues here:
>
> 1) being able to distinguish between "ignore" and "not valid"
>   -- and being able to stop ignoring an ignored value.
>
> This could quite easily be accomplished with a mask approach -- indeed
> with 8 bits, you could have 8 different possible masked states (not that
> I'm suggesting that, at least not in core numpy.)
>
> However, with a bit-pattern approach, you simply can't implement
> "ignore". Once it's been set, the previous value is lost.
>
>
> 2) data size: A full mask takes extra space, sometimes a substantial
> amount -- so a bit-pattern approach would be nice.
>
>
> I like the idea (that I think Mark attempted to implement) that the
> implementation should be hidden from the user - not necessarily entirely
> hidden, but subtle enough that the casual user wouldn't need to care
> about it.
>
>
I believe the main reason it is hidden from the user is so that the
implementation can be changed without impacting existing applications.

What I would like to see at this point is folks trying out the software and
asking questions on the list like: "I want to do A and tried B, which didn't
work. Any suggestions?" In short, I want people to actually use the software
to see what issues arise so that we can fix things up.

Memory use is a known problem. One way to start addressing it might be to
implement a "bit" arraytype. It might even be possible to prototype that on
top of the existing types. Views make bit arrays a bit more interesting ;)
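
As a rough illustration of the prototyping idea, here is a minimal sketch
that stores a boolean mask as packed bits in a plain uint8 array. It is not
how the current NA-mask code works; the variable names are made up for the
example, and only np.packbits / np.unpackbits are existing NumPy calls.

```python
import numpy as np

# Hypothetical prototype: keep the mask as packed bits in a uint8 array
# instead of spending one byte per element.
data = np.array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5])
mask = np.zeros(data.shape, dtype=bool)   # True marks a masked element
mask[[2, 7]] = True

packed = np.packbits(mask)                # 10 flags fit in 2 bytes
# unpackbits pads to a multiple of 8, so trim back to the data length.
unpacked = np.unpackbits(packed)[:data.size].astype(bool)

total = data[~unpacked].sum()             # sum over the unmasked elements
```

Note that a slice such as data[3:] does not start on a byte boundary of
the packed mask, which is presumably part of what makes views awkward here.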

> In that case, I think if we could decide that we want both "ignore" and
> "not valid" (and it seems there is a fair bit of interest in that), then
> we can proceed with a mask-based approach, and develop an API that makes
> as little reference to the mask as possible.
>
>
> Then a bit-pattern approach could be developed that uses the same API --
> it would not have the "ignore" option at all, but would be the same for
> the "not valid" option.
>
> When I write this it seems entirely too complicated for both the
> developers and users, but maybe it's not -- it could be analogous to
> what we have now: arrays can be Fortran or C ordered, contiguous or not,
> be views on other arrays or not. To really make numpy dance, you need to
> understand all that, but you can also do a whole lot, and write a lot of
> generic code, without digging into that.
>
> If we do all that, maybe there could be a sparse mask implementation,
> etc. as well.
>
>
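For anyone who wants to try the "ignore" versus "not valid" distinction
quoted above, here is a minimal sketch. It uses the existing numpy.ma module
as a stand-in for a mask-based approach and NaN as a stand-in for a bit
pattern; neither is the new NA-mask API, and the variable names are only
illustrative.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0])

# Mask approach: the payload stays in memory next to a separate mask.
m = np.ma.masked_array(values.copy(), mask=[False, True, False])
masked_sum = float(m.sum())      # the masked element is skipped
m.mask = False                   # stop ignoring it
restored = float(m[1])           # the original value is still there

# Bit-pattern approach: the marker overwrites the payload itself.
b = values.copy()
b[1] = np.nan                    # the previous value 2.0 is gone for good
bitpattern_sum = float(np.nansum(b))
```

Both sums skip the middle element, but only the masked array can get the
2.0 back afterwards.
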
Chuck