[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Christopher Barker
Chris.Barker at noaa.gov
Fri Jun 24 12:13:51 EDT 2011
Nathaniel Smith wrote:
>> The 'dtype factory' idea builds on the way I've structured datetime as a
>> parameterized type,
...
Another disadvantage is that we get further from Gael Varoquaux's point:
>> Right now, the numpy array can be seen as an extension of the C
>> array, basically a pointer, a data type, and a shape (and strides).
>> This enables easy sharing with libraries that have not been
>> written with numpy in mind.
and also PEP 3118 support
It is very useful that a numpy array has a pointer to a regular old C
array -- if we introduce this special dtype, that will break (well, not
really, but the C array would be of this particular struct).
Granted, any other C code would probably have to do something with the
mask anyway, but I still think it'd be better to keep that raw data
array standard.
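To make the layout concern concrete, here is a rough sketch. The structured dtype `maybe_float` below is only a hypothetical stand-in for what a "maybe(float)" dtype might look like in memory; it is not the proposed implementation. It compares the buffer of a plain float64 array with one where each element carries its own flag:

```python
import numpy as np

# Plain float64 array: the buffer is simply contiguous C doubles.
plain = np.arange(4, dtype=np.float64)
plain_buf = memoryview(plain)          # PEP 3118 buffer export
print(plain_buf.format, plain_buf.itemsize, plain.strides)

# Hypothetical stand-in for a "maybe(float)" dtype: value plus a flag.
# (Assumption for illustration only, not the dtype from the proposal.)
maybe_float = np.dtype([("value", np.float64), ("masked", np.bool_)])
packed = np.zeros(4, dtype=maybe_float)
packed["value"] = plain

# Each element is now a 9-byte struct, so the doubles are no longer
# laid out as a plain contiguous double[] that C code could take as-is.
values = packed["value"]
print(packed.dtype.itemsize, values.strides, values.flags["C_CONTIGUOUS"])
```

With a layout like this, a library expecting a plain `double *` would need either a copy or knowledge of the struct.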
This applies to switching between masked and not-masked numpy arrays
also -- I don't think I'd want the performance hit of that requiring a
data copy.
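For comparison, a minimal sketch of the copy-free switch using the existing `numpy.ma` module, where the mask is stored separately from the data (the array contents are made up for illustration):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, False])

# Wrap the existing buffer in a masked array without copying it.
masked = np.ma.masked_array(data, mask=mask, copy=False)
shares = np.shares_memory(masked.data, data)
print(shares)

# Going back to a plain ndarray is also just a view on the same memory.
raw = masked.data
raw[0] = 10.0
print(data[0])
```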
Also the idea was posted here that you could use views to have the same
data set with different masks -- that would break as well.
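A small sketch of that idea, again using today's `numpy.ma` rather than anything from the proposal (data and masks are invented for the example): two masked arrays wrap the same buffer, each with its own mask.

```python
import numpy as np

data = np.arange(5, dtype=np.float64)   # 0, 1, 2, 3, 4

# Two masked arrays over the same data, each with a different mask.
first = np.ma.masked_array(data, mask=[True, False, False, False, False])
last = np.ma.masked_array(data, mask=[False, False, False, False, True])

same_buffer = np.shares_memory(first.data, last.data)
print(same_buffer)

# Each one skips a different element when reducing.
sum_first = first.sum()   # 1 + 2 + 3 + 4
sum_last = last.sum()     # 0 + 1 + 2 + 3
print(sum_first, sum_last)
```

If the mask lived inside each element, two arrays sharing one buffer would necessarily share one mask as well.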
Nathaniel Smith wrote:
> If we think that the memory overhead for floating point types is too
> high, it would be easy to add a special case where maybe(float) used a
> distinguished NaN instead of a separate boolean.
That would be pretty cool, though in the past folks have made a good
argument that even for floats, masks have significant advantages over
"just using NaN". One might be that you can mask and unmask a value for
different operations, without losing the value.
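A quick sketch of the difference, with made-up values and the existing `numpy.ma` module standing in for whatever the final design would be:

```python
import numpy as np

values = np.array([1.0, 5.0, 3.0])

# "Just using NaN": the original 5.0 is overwritten and cannot be recovered.
as_nan = values.copy()
as_nan[1] = np.nan

# Separate mask: the value stays in the buffer while it is hidden.
m = np.ma.masked_array(values.copy(), mask=[False, True, False])
mean_masked = m.mean()        # averages 1.0 and 3.0 only

# Unmask for a different operation; the 5.0 is still there.
m.mask = False
mean_unmasked = m.mean()      # averages 1.0, 5.0 and 3.0
print(mean_masked, mean_unmasked, m.data[1])
```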
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov