On Fri, Oct 28, 2011 at 5:05 PM, Chris.Barker
<Chris.Barker@noaa.gov> wrote:
On 10/28/11 11:37 AM, Matthew Brett wrote:
> The main motivation for the alterNEP was our strong feeling that
> separating ABSENT and IGNORE was easier to comprehend and cleaner.
I don't know about easier to comprehend, or cleaner, but it is more
feature-full.
I see two issues here:
1) being able to distinguish between "ignore" and "not valid"
-- and being able to stop ignoring an ignored value.
This could quite easily be accomplished with a mask approach -- indeed
with an 8-bit mask you could have 8 independent flags, or up to 256
distinct masked states (not that I'm suggesting that, at least not in
core numpy.)
However, with a bit-pattern approach, you simply can't implement
"ignore". Once it's been set, the previous value is lost.
2) data size: A full mask takes extra space, sometimes a substantial
amount -- so a bit-pattern approach would be nice.
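To make the two issues concrete, here is a minimal sketch in plain NumPy. It does not use the missing-data API under discussion; the names `values`, `ignore` and `sentinel` are made up for illustration, and NaN stands in for whatever bit pattern would mark a missing value.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Mask approach: a separate boolean array marks the ignored elements,
# so the underlying value is never touched.
ignore = np.array([False, True, False, False])
sum_ignoring = values[~ignore].sum()   # skips the 2.0

ignore[1] = False                      # stop ignoring: the 2.0 is still there
sum_restored = values[~ignore].sum()

# Bit-pattern approach (NaN used as the sentinel here, as an assumption):
# marking an element overwrites its payload.
sentinel = values.copy()
sentinel[1] = np.nan
sum_sentinel = np.nansum(sentinel)     # skips the marked element
# There is no operation that brings the original 2.0 back.

# Storage cost of the mask relative to the data it covers.
mask_overhead = ignore.nbytes / values.nbytes
```

For float64 data the one-byte-per-element mask adds 12.5% on top of the data; for int8 data the same mask would double the footprint, which is where the size concern bites.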
I like the idea (that I think Mark attempted to implement) that the
implementation should be hidden from the user - not necessarily entirely
hidden, but subtle enough that the casual user wouldn't need to care
about it.
I believe the main reason it is hidden from the user is so that the implementation can be changed without impacting existing applications.
What I would like to see at this point is folks trying out the software and asking questions on the list like: "I want to do A and tried B, which didn't work. Any suggestions?" In short, I want people to actually use the software to see what issues arise so that we can fix things up.
Memory use is a known problem. One way to start addressing it might be to implement a "bit" array type. It might even be possible to prototype that on top of the existing types. Views make bit arrays a bit more interesting ;)
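As a rough illustration of prototyping a bit array on top of the existing types, one option is to pack a boolean mask into uint8 storage with `np.packbits` and unpack it on demand. This is only a sketch of the idea, not a proposed implementation; the variable names are made up.

```python
import numpy as np

# A boolean mask stored the usual way: one byte per element.
mask = np.zeros(1000, dtype=bool)
mask[[3, 500, 999]] = True

# Pack eight mask elements into each uint8.
packed = np.packbits(mask)

# Unpack when the mask is needed; unpackbits pads to a multiple of 8,
# so trim back to the original length.
restored = np.unpackbits(packed)[:mask.size].astype(bool)

bytes_unpacked = mask.nbytes
bytes_packed = packed.nbytes
```

The packed form takes one eighth of the space. The wrinkle with views is that a slice such as `mask[3:]` does not start on a byte boundary in the packed storage, so it cannot be expressed as an ordinary strided view of `packed`.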
In that case, I think if we could decide that we want both "ignore" and
"not valid" (and it seems there is a fair bit of interest in that), then
we can proceed with a mask-based approach, and develop an API that makes
as little reference to the mask as possible.
Then a bit-pattern approach could be developed that uses the same API --
it would not have the "ignore" option at all, but would be the same for
the "not valid" option.
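A minimal sketch of what "same API, different storage" could look like. Everything here is hypothetical: the class names `MaskBacked` and `BitPatternBacked` and the methods `mark_invalid`, `ignore` and `unignore` are invented for illustration and are not part of numpy or of either NEP. NaN is again assumed as the bit pattern.

```python
import numpy as np


class MaskBacked:
    """Hypothetical mask-backed store: supports 'not valid' and 'ignore'."""

    def __init__(self, values):
        self._values = np.array(values, dtype=float)
        self._invalid = np.zeros(self._values.shape, dtype=bool)
        self._ignored = np.zeros(self._values.shape, dtype=bool)

    def mark_invalid(self, index):
        self._invalid[index] = True

    def ignore(self, index):
        self._ignored[index] = True

    def unignore(self, index):
        # The payload was never overwritten, so this is reversible.
        self._ignored[index] = False

    def sum(self):
        keep = ~(self._invalid | self._ignored)
        return self._values[keep].sum()


class BitPatternBacked:
    """Hypothetical bit-pattern store: 'not valid' only, no extra memory."""

    def __init__(self, values):
        self._values = np.array(values, dtype=float)

    def mark_invalid(self, index):
        # Overwrites the payload; a genuine NaN in the data would be
        # indistinguishable from a marked element in this sketch.
        self._values[index] = np.nan

    def sum(self):
        return np.nansum(self._values)


def total_without(store, bad_index):
    """Generic code that only uses the shared 'not valid' part of the API."""
    store.mark_invalid(bad_index)
    return store.sum()


mask_total = total_without(MaskBacked([1.0, 2.0, 3.0]), 0)
bits_total = total_without(BitPatternBacked([1.0, 2.0, 3.0]), 0)

# Only the mask-backed store can ignore and later un-ignore a value.
store = MaskBacked([1.0, 2.0, 3.0])
store.ignore(1)
ignoring_total = store.sum()
store.unignore(1)
restored_total = store.sum()
```

`total_without` never needs to know which storage it was handed, which is the point of keeping the mask out of the API. Code that calls `ignore` simply requires the mask-backed variant.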
When I write this it seems entirely too complicated for both the
developers and users, but maybe it's not -- it could be analogous to
what we have now: arrays can be Fortran or C ordered, contiguous or not,
be views on other arrays or not. To really make numpy dance, you need to
understand all that, but you can also do a whole lot, and write a lot of
generic code, without digging into that.
If we do all that, maybe there could be a sparse mask implementation,
etc. as well.
Chuck