[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Lluís xscript at gmx.net
Wed Jun 29 14:51:36 EDT 2011


Mark Wiebe writes:
[...]
>     I think that deciding on the value of NA signal values boils down to
>     this question: should 3rd party code be able to interpret missing data
>     information stored in the separate mask array?

> I'm tossing around some variations of ideas using the iterator to
> provide a buffered mask-based interface that works uniformly with both
> masked arrays and NA dtypes. This way 3rd party C code only needs to
> implement one missing data mechanism to fully support both of NumPy's
> missing data mechanisms.

Nice. If non-numpy C code is bound to see it as an array (i.e., _always_
oblivious to the mask concept), then you should probably do what I said
about "(un)merging" the bit-pattern and mask-based NAs, but in this case
it can be done on each block given by the iteration window.
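
To make the per-block "(un)merging" concrete, here is a minimal sketch in
plain Python (no NumPy, so it stays self-contained). Everything in it is
an assumption for illustration: NaN stands in for the bit-pattern NA,
mask[i] == True means "missing", and the names merge_block and
unmerge_block are hypothetical, not part of any proposed API.

```python
import math

NA = math.nan  # assumption: NaN plays the role of the bit-pattern NA


def merge_block(data, mask, start, stop):
    """Return a copy of data[start:stop] with masked elements replaced by NA.

    mask[i] is True when element i is missing (a convention assumed here).
    """
    return [NA if mask[i] else data[i] for i in range(start, stop)]


def unmerge_block(block, data, mask, start):
    """Write a block back, turning NA bit patterns into mask entries."""
    for offset, value in enumerate(block):
        i = start + offset
        if isinstance(value, float) and math.isnan(value):
            mask[i] = True   # the old payload in data[i] is left untouched
        else:
            mask[i] = False
            data[i] = value
```

The 3rd party code would only ever see the list returned by merge_block
for the current iteration window; unmerge_block is what would run when
the window moves on.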

There's still the possibility of giving a finer granularity interface
where both are explicitly accessed, but this will probably add yet
another set of API functions (although the merging interface can be
implemented on top of this explicit raw iteration interface).
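
As a rough illustration of that layering (again plain Python, with
hypothetical names iter_raw and iter_merged, and assuming mask entries
are True for missing elements), the merging interface needs nothing
beyond what the explicit raw interface already exposes:

```python
import math


def iter_raw(data, mask, window):
    """Explicit interface: yield (data_block, mask_block) pairs."""
    for start in range(0, len(data), window):
        stop = start + window
        yield data[start:stop], mask[start:stop]


def iter_merged(data, mask, window, na=math.nan):
    """Merging interface, implemented purely on top of iter_raw."""
    for values, missing in iter_raw(data, mask, window):
        yield [na if m else v for v, m in zip(values, missing)]
```

Code that understands masks would use iter_raw directly; mask-oblivious
code would be handed the blocks produced by iter_merged.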

BTW, this overlaps somewhat with a mail Travis sent long ago about
dynamically filling the backing buffer contents (in this case with the
"merged" NA data for 3rd parties).

It might prove completely unsatisfactory (w.r.t. performance), but you
could also fake a bit-pattern-only sequential array by using mprotect to
detect the memory accesses and then trigger the production of the merged
data. This provides a means of supporting code that uses the simple
buffer protocol, without duplicating the whole structure for NA merges.

This can be elaborated even further with some simple strided-pattern
detection to reduce the number of segfaults, since the shape is known.


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


