[Numpy-discussion] Masking through generator arrays

Nathaniel Smith njs at pobox.com
Fri May 11 19:39:26 EDT 2012

On Thu, May 10, 2012 at 7:23 PM, Chris Barker <chris.barker at noaa.gov> wrote:
> That is one of my concerns about the "bit pattern" idea -- we've then
> created a new binary type that no other standard software understands
> -- that looks like a lot of work to me to deal with, or even worse,
> ripe for weird, non-obvious errors in code that accesses that good-old
> char*.

Numpy supports a number of unusual binary data types, e.g. half-precision
floats and datetimes, that aren't well supported by other standard
software. As Travis points out, no one forces you to use them :-).

> So I'm happier with a mask implementation -- more memory, yes, but it
> seems more robust and easy to deal with from outside code.

Let's say we have a no-frills C function that we want to call, and
it's defined to use a mask:

  void do_calcs(double * data, char * mask, int size);

To call this function from Cython in the masked-NA world, we do
something like:

  a = np.ascontiguousarray(a)
  do_calcs(PyArray_DATA(a), PyArray_MASK(a), a.size)

OTOH in the bitpattern NA world, we do something like:

  a = np.ascontiguousarray(a)
  mask = np.isNA(a)
  do_calcs(PyArray_DATA(a), PyArray_DATA(mask), a.size)
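
For anyone who wants to try the bitpattern pattern above with today's
numpy, here is a minimal sketch. It makes several assumptions: NaN stands
in for the NA bit pattern, np.isnan stands in for the proposed np.isNA,
the mask convention is 1 = NA, and do_calcs is a pure-Python placeholder
for the C function (here it just sums the non-NA elements). The helper
name mask_from_bitpattern is made up for illustration.

```python
import numpy as np

def mask_from_bitpattern(a):
    """Return (data, mask) as C-contiguous buffers suitable for a C call.

    Assumption: NaN plays the role of the NA bit pattern, and a mask
    byte of 1 means "this element is NA".
    """
    data = np.ascontiguousarray(a, dtype=np.float64)
    # One byte per element, matching the `char * mask` argument.
    mask = np.ascontiguousarray(np.isnan(data), dtype=np.uint8)
    return data, mask

def do_calcs(data, mask):
    """Placeholder for the C do_calcs: sum the elements that are not NA."""
    return float(data[mask == 0].sum())

data, mask = mask_from_bitpattern([1.0, np.nan, 3.0])
total = do_calcs(data, mask)
```

The only extra cost relative to the mask version is the one temporary
mask array, built on demand at the call boundary.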

Of course, various extra complexities can come in here: what you want
to do when no NAs are possible, whether do_calcs can take a NULL mask
pointer, the need to use the equivalent C functions if you're writing
in C instead of Cython, etc. But IMHO there's no fundamental reason
why bitpatterns have to be much more complex to deal with in outside
code than masks, assuming a properly helpful API. What can't be
papered over at the API level are questions like: do you want to be
able to "un-assign" NA to reveal what used to be there before? That
needs masks, for better or worse.
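
To make the "un-assign" point concrete, here is a rough sketch. It uses
the existing numpy.ma module as a stand-in for masked NAs and NaN as a
stand-in for a bitpattern NA; neither is the API under discussion, they
just show the difference in behaviour.

```python
import numpy as np

values = [10.0, 20.0, 30.0]

# Mask-style NA (numpy.ma as a stand-in): masking an element only
# flips its mask entry, so the stored value is left alone.
masked = np.ma.masked_array(values, mask=[False, False, False])
masked[1] = np.ma.masked
hidden = float(masked.data[1])   # the original value is still in the buffer

# "Un-assign" by clearing the mask; the old value is visible again.
masked.mask = np.ma.nomask
revealed = float(masked[1])

# Bitpattern-style NA (NaN as a stand-in): assigning NA overwrites
# the element, so there is nothing left to reveal afterwards.
bitpattern = np.array(values)
bitpattern[1] = np.nan
```

With the mask, hidden and revealed both come back as the original
value; with the bitpattern, the original value at index 1 is gone for
good.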

But I may well be missing something... does that address your concern,
or is there more to it?

-- Nathaniel
