[Numpy-discussion] Missing data wrap-up and request for comments

Matthew Brett matthew.brett at gmail.com
Thu May 10 23:28:09 EDT 2012


Hi,

On Thu, May 10, 2012 at 2:43 AM, Nathaniel Smith <njs at pobox.com> wrote:
> Hi Matthew,
>
> On Thu, May 10, 2012 at 12:01 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>> The third proposal is certainly the best one from Cython's perspective;
>>> and I imagine for those writing C extensions against the C API too.
>>> Having PyType_Check fail for ndmasked is a very good way of having code
>>> fail that is not written to take masks into account.
>>
>> Mark, Nathaniel - can you comment on how your chosen approaches would
>> interact with extension code?
>>
>> I'm guessing the bitpattern dtypes would be expected to cause
>> extension code to choke if the type is not supported?
>
> That's pretty much how I'm imagining it, yes. Right now if you have,
> say, a Cython function like
>
> cdef f(np.ndarray[double] a):
>    ...
>
> and you do f(np.zeros(10, dtype=int)), then it will error out, because
> that function doesn't know how to handle ints, only doubles. The same
> would apply for, say, an NA-enabled integer. In general there are
> almost arbitrarily many dtypes that could get passed into any function
> (including user-defined ones, etc.), so C code already has to check
> dtypes for correctness.
>
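
A rough pure-Python analogue of the check that the typed signature
performs, as a sketch only. The helper name require_float64 is made up
for illustration; it is not a NumPy or Cython API, and the exception
type is an arbitrary choice.

```python
import numpy as np

def require_float64(a):
    """Hypothetical helper: reject any array whose dtype is not float64."""
    if a.dtype != np.float64:
        raise TypeError("expected float64, got %s" % a.dtype)
    return a

def f(a):
    # Plays the role of `cdef f(np.ndarray[double] a)`: only doubles pass.
    a = require_float64(a)
    return a.sum()

print(f(np.zeros(10)))            # float64 input is accepted
try:
    f(np.zeros(10, dtype=int))    # integer input is rejected up front
except TypeError as exc:
    print("rejected:", exc)
```

Under the bitpattern proposal, an NA-enabled dtype would presumably
fail the same comparison, so code written like this would refuse it
rather than silently misread the data.
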
> Second order issues:
> - There is certainly C code out there that just assumes that it will
> only be passed an array with certain dtype (and ndim, memory layout,
> etc...). If you write such C code then it's your job to make sure that
> you only pass it the kinds of arrays that it expects, just like now
> :-).
>
> - We may want to do some sort of special-casing of handling for
> floating point NA dtypes that use a NaN as the "magic" bitpattern,
> since many algorithms *will* work with these unchanged, and it might
> be frustrating to have to wait for every extension module to be
> updated just to allow for this case explicitly before using them. OTOH
> you can easily work around this. Like say my_qr is a legacy C function
> that will in fact propagate NaNs correctly, so float NA dtypes would
> Just Work -- except, it errors out at the start because it doesn't
> recognize the dtype. How annoying. We *could* have some special hack
> you can use to force it to work anyway (by like making the "is this
> the dtype I expect?" routine lie.) But you can also just do:
>
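>  def my_qr_wrapper(arr):
>    # is_nan_na_float_dtype is a hypothetical predicate: true for an
>    # NA float dtype whose magic bitpattern is a NaN
>    if is_nan_na_float_dtype(arr.dtype):
>      result = my_qr(arr.view(arr.dtype.base_dtype))
>      return result.view(arr.dtype)
>    else:
>      return my_qr(arr)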
>
> and hey presto, now it will correctly pass through NAs. So perhaps
> it's not worth bothering with special hacks.
>
> - Of course if your extension function does want to handle NAs
> generically, then there will be a simple C API for checking for them,
> setting them, etc. Numpy needs such an API internally anyway!
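
The view-and-view-back wrapper can be tried out today with a stand-in.
The NA dtypes are only a proposal, so this sketch assumes an ndarray
subclass (NAFloatArray, a made-up name) as the "NA float" tag, and a
toy my_qr that doubles its input instead of computing a QR. Only the
pattern is the point: strip the tag, call the legacy code, re-apply
the tag.

```python
import numpy as np

class NAFloatArray(np.ndarray):
    """Hypothetical stand-in tag for a NaN-based NA float dtype."""

def my_qr(arr):
    # Stand-in for the legacy routine: strict about what it accepts,
    # but its arithmetic propagates NaN without any special handling.
    # (It just doubles the values; it does not compute a QR.)
    if type(arr) is not np.ndarray or arr.dtype != np.float64:
        raise TypeError("my_qr only understands plain float64 arrays")
    return arr * 2.0

def my_qr_wrapper(arr):
    if isinstance(arr, NAFloatArray):
        # Reinterpret the same memory as a plain array, call the legacy
        # code, then put the tag back on the result.
        result = my_qr(arr.view(np.ndarray))
        return result.view(NAFloatArray)
    return my_qr(arr)

tagged = np.array([1.0, np.nan, 3.0]).view(NAFloatArray)
out = my_qr_wrapper(tagged)
print(type(out).__name__, out)
```

Calling my_qr(tagged) directly raises, while the wrapper gets the NaN
through untouched and hands back a tagged result, with no change to
the legacy function itself.
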

Thanks for this.

Mark - in view of the discussions about Cython and extension code -
could you say what you see as disadvantages to the ndmasked subclass
proposal?

Cheers,

Matthew


