[Numpy-discussion] NA masks in the next numpy release?

Chris.Barker Chris.Barker at noaa.gov
Fri Oct 28 19:05:43 EDT 2011


On 10/28/11 11:37 AM, Matthew Brett wrote:
> The main motivation for the alterNEP was our strong feeling that
> separating ABSENT and IGNORE was easier to comprehend and cleaner.

I don't know about easier to comprehend, or cleaner, but it is more 
feature-full.

I see two issues here:

1) being able to distinguish between "ignore" and "not valid"
   -- and being able to stop ignoring an ignored value.

This could quite easily be accomplished with a mask approach -- indeed 
with 8 bits per element, you could have 8 independent mask flags, or up 
to 255 distinct masked states (not that I'm suggesting that, at least 
not in core numpy.)

However, with a bit-pattern approach, you simply can't implement 
"ignore". Once it's been set, the previous value is lost.
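
To make the difference concrete, here is a minimal sketch. It uses the 
existing numpy.ma module as a stand-in for a mask approach (not the API 
proposed in either NEP), and NaN as an assumed stand-in for an NA bit 
pattern:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])

# Mask approach: the mask is stored beside the data, so "ignore" is reversible.
masked = np.ma.masked_array(data.copy(), mask=[False, True, False])
masked_sum = masked.sum()               # element 1 is skipped
masked.mask = [False, False, False]     # stop ignoring it
restored = masked[1]                    # the original value is still there

# Bit-pattern approach (NaN standing in for the NA pattern): the marker
# overwrites the payload, so the previous value cannot be recovered.
pattern = data.copy()
pattern[1] = np.nan
```

After unmasking, element 1 of the masked array is 2.0 again, while the 
NaN-marked copy has no record of what was there before.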


2) data size: A full mask takes extra space, sometimes a substantial 
amount -- so a bit-pattern approach would be nice.
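
As a rough illustration of the size cost, assuming a mask that uses one 
byte per element (as numpy.ma's boolean masks do today):

```python
import numpy as np

n = 1_000_000
overhead = {}
for dtype in (np.float64, np.int8):
    data = np.zeros(n, dtype=dtype)
    mask = np.zeros(n, dtype=bool)      # one byte per element
    # Extra memory for the mask, as a fraction of the data itself.
    overhead[dtype.__name__] = mask.nbytes / data.nbytes

print(overhead)
```

For float64 data the mask adds 12.5%; for int8 data it doubles the 
memory use.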


I like the idea (that I think Mark attempted to implement) that the 
implementation should be hidden from the user - not necessarily entirely 
hidden, but subtle enough that the casual user wouldn't need to care 
about it.

In that case, I think if we could decide that we want both "ignore" and 
"not valid" (and it seems there is a fair bit of interest in that), then 
we can proceed with a mask-based approach, and develop an API that makes 
as little reference to the mask as possible.

Then a bit-pattern approach could be developed that uses the same API -- 
it would not have the "ignore" option at all, but would be the same for 
the "not valid" option.
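
A toy sketch of what "same API, two implementations" might look like. 
Every name here (MaskBacked, PatternBacked, mark_invalid, ignore, 
unignore, total) is hypothetical and made up for illustration; none of 
it is numpy API, and NaN is again only an assumed stand-in for an NA 
bit pattern:

```python
import numpy as np

class MaskBacked:
    """Hypothetical: one state byte per element (0=ok, 1=ignored, 2=not valid)."""
    OK, IGNORED, NOT_VALID = 0, 1, 2

    def __init__(self, values):
        self.values = np.array(values, dtype=float)
        self.state = np.zeros(self.values.shape, dtype=np.uint8)

    def mark_invalid(self, i):
        self.state[i] = self.NOT_VALID

    def ignore(self, i):
        self.state[i] = self.IGNORED

    def unignore(self, i):
        # Only "ignored" is reversible; "not valid" stays not valid.
        if self.state[i] == self.IGNORED:
            self.state[i] = self.OK

    def total(self):
        return self.values[self.state == self.OK].sum()


class PatternBacked:
    """Hypothetical: NaN is the "not valid" pattern; there is no ignore()."""

    def __init__(self, values):
        self.values = np.array(values, dtype=float)

    def mark_invalid(self, i):
        self.values[i] = np.nan      # the old value is overwritten

    def total(self):
        return self.values[~np.isnan(self.values)].sum()


def total_without(container, bad_index):
    # Generic code: relies only on the part of the API both classes share.
    container.mark_invalid(bad_index)
    return container.total()
```

Code like total_without() works with either class; only code that 
wants ignore()/unignore() has to know it is holding the mask-backed 
one.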

When I write this it seems entirely too complicated for both the 
developers and users, but maybe it's not -- it could be analogous to 
what we have now: arrays can be Fortran or C ordered, contiguous or not, 
be views on other arrays or not. To really make numpy dance, you need to 
understand all that, but you can also do a whole lot, and write a lot of 
generic code, without digging into that.
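
For example, a small sketch of that existing situation: the function 
below (column_means, a name made up here) never asks about memory 
order, contiguity, or whether its argument is a view:

```python
import numpy as np

c_ordered = np.arange(6.0).reshape(2, 3)     # C-ordered, contiguous
f_ordered = np.asfortranarray(c_ordered)     # same values, Fortran order
view = c_ordered[:, ::2]                     # non-contiguous view of c_ordered

def column_means(a):
    # Generic code: works the same whatever the layout of `a` is.
    return a.mean(axis=0)

for arr in (c_ordered, f_ordered, view):
    print(arr.flags['C_CONTIGUOUS'], arr.flags['F_CONTIGUOUS'], column_means(arr))
```

The flags differ between the three arrays, but the calling code does 
not have to care.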

If we do all that, maybe there could be a sparse mask implementation, 
etc. as well.
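
One way a sparse mask could work, purely as an assumption on my part: 
store only the positions that are masked, and expand to a dense mask 
when a computation needs one. The masked_indices set is hypothetical, 
not an existing numpy feature:

```python
import numpy as np

values = np.arange(10, dtype=float)
masked_indices = {3, 7}          # hypothetical sparse mask: masked positions only

# Expand to a dense boolean mask only when it is actually needed.
dense = np.zeros(values.shape, dtype=bool)
dense[list(masked_indices)] = True
total = values[~dense].sum()
```

With only a few masked elements, the stored mask is a handful of 
indices rather than one byte per element.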

Maybe I'm dreaming, though...

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


