[Numpy-discussion] Missing data wrap-up and request for comments

Nathaniel Smith njs at pobox.com
Thu May 10 05:43:07 EDT 2012


Hi Matthew,

On Thu, May 10, 2012 at 12:01 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> The third proposal is certainly the best one from Cython's perspective;
>> and I imagine for those writing C extensions against the C API too.
>> Having PyType_Check fail for ndmasked is a very good way of having code
>> fail that is not written to take masks into account.
>
> Mark, Nathaniel - can you comment how your chosen approaches would
> interact with extension code?
>
> I'm guessing the bitpattern dtypes would be expected to cause
> extension code to choke if the type is not supported?

That's pretty much how I'm imagining it, yes. Right now if you have,
say, a Cython function like

cimport numpy as np

cdef f(np.ndarray[double] a):
    ...  # body only knows how to handle doubles

and you do f(np.zeros(10, dtype=int)), then it will error out, because
that function doesn't know how to handle ints, only doubles. The same
would apply to, say, an NA-enabled integer. In general there are
almost arbitrarily many dtypes that could get passed into any function
(including user-defined ones, etc.), so C code already has to check
dtypes for correctness.
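A rough Python-level analogue of that check, as a sketch only: the function name `sum_doubles` is hypothetical, and its explicit `dtype` test stands in for the buffer-type check Cython performs when `f` is called.

```python
import numpy as np


def sum_doubles(a):
    """Hypothetical routine that, like f above, only handles C doubles."""
    if not isinstance(a, np.ndarray):
        raise TypeError("expected a numpy.ndarray")
    # Explicit dtype check: ints, NA-enabled dtypes, user-defined
    # dtypes, etc. are all rejected before any data is touched.
    if a.dtype != np.float64:
        raise TypeError(f"expected dtype float64, got {a.dtype}")
    total = 0.0
    for x in a.flat:
        total += float(x)
    return total
```

Calling `sum_doubles(np.ones(4))` returns `4.0`, while `sum_doubles(np.zeros(10, dtype=int))` raises `TypeError`, because the routine was never written for integers.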

Second-order issues:
- There is certainly C code out there that just assumes that it will
only be passed an array with a certain dtype (and ndim, memory layout,
etc.). If you write such C code, then it's your job to make sure that
you only pass it the kinds of arrays that it expects, just like now
:-).

- We may want to do some sort of special-case handling for
floating-point NA dtypes that use a NaN as the "magic" bitpattern,
since many algorithms *will* work with these unchanged, and it might
be frustrating to have to wait for every extension module to be
updated to allow for this case explicitly before using them. OTOH,
you can easily work around this. Say my_qr is a legacy C function
that will in fact propagate NaNs correctly, so float NA dtypes would
Just Work -- except it errors out at the start because it doesn't
recognize the dtype. How annoying. We *could* have some special hack
you can use to force it to work anyway (e.g., by making the "is this
the dtype I expect?" routine lie). But you can also just do:

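  def my_qr_wrapper(arr):
    # Hypothetical helpers: is_nan_na_float_dtype() and dtype.base_dtype
    # stand in for parts of the proposed (not yet existing) NA API.
    if is_nan_na_float_dtype(arr.dtype):
      # View the same bytes as the plain float dtype so my_qr accepts
      # them; NAs are just NaNs, which my_qr propagates.
      result = my_qr(arr.view(arr.dtype.base_dtype))
      # View the result back as the NA dtype so the NaNs read as NA again.
      return result.view(arr.dtype)
    else:
      return my_qr(arr)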

and hey presto, now it will correctly pass through NAs. So perhaps
it's not worth bothering with special hacks.

- Of course, if your extension function does want to handle NAs
generically, then there will be a simple C API for checking for them,
setting them, etc. NumPy needs such an API internally anyway!
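For the first point above, a sketch of what "making sure" can look like on the caller's side. It assumes a hypothetical kernel that, like unchecked C code, reinterprets the raw buffer as doubles without looking at the dtype; the names `legacy_kernel` and `call_legacy_kernel` are made up for illustration.

```python
import numpy as np


def legacy_kernel(a):
    """Hypothetical stand-in for unchecked C code: it reinterprets the
    array's raw bytes as float64, trusting dtype and layout blindly."""
    return float(np.frombuffer(a, dtype=np.float64).sum())


def call_legacy_kernel(a):
    """Caller-side guard: hand the kernel only arrays it was written for."""
    # Convert to float64 and C-contiguous order, copying only if needed.
    a = np.ascontiguousarray(a, dtype=np.float64)
    # The kernel treats its input as flat 1-D data, so refuse other shapes.
    if a.ndim != 1:
        raise ValueError(f"expected a 1-D array, got ndim={a.ndim}")
    return legacy_kernel(a)
```

`legacy_kernel(np.arange(3, dtype=np.int64))` silently returns a meaningless tiny number (the integer bytes read as doubles), whereas `call_legacy_kernel(np.arange(3, dtype=np.int64))` returns `3.0`.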
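The `my_qr_wrapper` trick from the second point can be tried with existing NumPy using a stand-in. Since NA dtypes do not exist yet, this sketch assumes a one-field structured dtype with float64's 8-byte layout as a placeholder for "float NA dtype with NaN magic value", and swaps QR for `np.cumsum` as the NaN-propagating legacy routine; `NA_FLOAT64`, `legacy_cumsum` and `legacy_cumsum_wrapper` are hypothetical names.

```python
import numpy as np

# Assumption: placeholder for a float NA dtype whose NA bitpattern is NaN.
# It has the same 8-byte layout as float64, so views between them are free.
NA_FLOAT64 = np.dtype([("value", np.float64)])


def legacy_cumsum(a):
    """Hypothetical legacy routine: float64 only, but NaN-propagating."""
    if a.dtype != np.float64:
        raise TypeError(f"unsupported dtype {a.dtype}")
    return np.cumsum(a)


def legacy_cumsum_wrapper(arr):
    if arr.dtype == NA_FLOAT64:
        # Reinterpret the same bytes as plain float64 (no copy); NA
        # entries are NaNs, which the legacy routine propagates.
        result = legacy_cumsum(arr.view(np.float64))
        # Reinterpret the result back, so the NaNs read as NA again.
        return result.view(NA_FLOAT64)
    return legacy_cumsum(arr)
```

Passing `np.array([1.0, np.nan, 2.0]).view(NA_FLOAT64)` straight to `legacy_cumsum` raises `TypeError`, but through the wrapper it yields a result whose `"value"` field is `1.0` followed by NaNs: the NA propagates without the legacy code knowing anything about it.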
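For the last point, a Python sketch of what checking and setting NAs generically might look like for a bitpattern-based integer NA. Using the most negative int64 as the NA bitpattern, and the helper names `is_na`, `set_na` and `sum_skipna`, are assumptions for illustration, not NumPy's (or the proposal's) actual API.

```python
import numpy as np

# Assumption: the NA bitpattern for a hypothetical int64 NA dtype is the
# most negative representable value.
INT64_NA = np.iinfo(np.int64).min


def is_na(arr):
    """Boolean mask of the NA entries."""
    return arr == INT64_NA


def set_na(arr, index):
    """Mark arr[index] as NA, in place."""
    arr[index] = INT64_NA


def sum_skipna(arr):
    """A generic NA-aware reduction: ignore NA entries."""
    return int(arr[~is_na(arr)].sum())
```

For `a = np.arange(5, dtype=np.int64)`, calling `set_na(a, 2)` makes `is_na(a)` true only at index 2, and `sum_skipna(a)` then returns `8` (0 + 1 + 3 + 4).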

-- Nathaniel
