[Numpy-discussion] missing data discussion round 2

Tue Jun 28 16:09:23 EDT 2011

On Tue, Jun 28, 2011 at 12:41 PM, Eric Firing <efiring at hawaii.edu> wrote:
> I think you are exaggerating some of the differences associated with the
> implementation, and ignoring one *key* difference: for integer types,
> the masked implementation can handle the full numeric range of the type,
> while the bit-pattern approach cannot.

You can get something semantically equivalent to the masked
implementation by adding some extra bits and then stealing those.
(That was the original "maybe(...)" idea.) My proposal would make
either approach easy to implement, whether by us or by users (if we
decide we don't want to clutter up the numpy core with too many
pre-canned NA implementations).

Doing this would cost either memory or speed relative to both the
separate-mask approach and the purely bit-stealing approaches, but I
don't know if anyone cares about NA support in integers *that* much --
personally I want it to be possible, because count data is important
in statistics, but I don't really care how efficient it is. Floating
point is much more important in practice. (Heck, R usually uses
doubles for count data too -- you can't get an integer without an
explicit cast.)
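For contrast, here is a rough sketch of the two cheaper options mentioned above, under illustrative assumptions. The first is pure bit-stealing for int32, which reserves the most negative value as NA (the same value R uses for `NA_integer_`). This costs no extra memory but gives up one representable value. The second stores counts as float64, where NaN already plays the NA role. float64 represents every integer up to 2**53 exactly, so this is usually enough for count data. The variable names are hypothetical.

```python
import numpy as np

# Pure bit-stealing for int32: the most negative value is reserved as NA,
# so it can no longer be stored as ordinary data (no extra memory needed).
INT32_NA = np.iinfo(np.int32).min

counts_int = np.array([3, INT32_NA, 7, 0], dtype=np.int32)
valid = counts_int != INT32_NA
int_total = int(counts_int[valid].astype(np.int64).sum())

# Floating point already has a built-in NA-like pattern (NaN), so counts
# stored as float64, as R does by default, need no stolen integer value.
counts_float = np.array([3.0, np.nan, 7.0, 0.0])
float_total = float(np.nansum(counts_float))
```

Both totals come out to 10. The integer version needs an explicit validity check on every operation, while the float version can lean on existing NaN-aware functions such as `np.nansum`.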

-- Nathaniel