[Numpy-discussion] Missing data wrap-up and request for comments

Mark Wiebe mwwiebe at gmail.com
Fri May 11 16:25:55 EDT 2012


On Thu, May 10, 2012 at 10:28 PM, Matthew Brett <matthew.brett at gmail.com> wrote:

> Hi,
>
> On Thu, May 10, 2012 at 2:43 AM, Nathaniel Smith <njs at pobox.com> wrote:
> > Hi Matthew,
> >
> > On Thu, May 10, 2012 at 12:01 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
> >>> The third proposal is certainly the best one from Cython's perspective;
> >>> and I imagine for those writing C extensions against the C API too.
> >>> Having PyType_Check fail for ndmasked is a very good way of having code
> >>> fail that is not written to take masks into account.
> >>
> >> Mark, Nathaniel - can you comment how your chosen approaches would
> >> interact with extension code?
> >>
> >> I'm guessing the bitpattern dtypes would be expected to cause
> >> extension code to choke if the type is not supported?
> >
> > That's pretty much how I'm imagining it, yes. Right now if you have,
> > say, a Cython function like
> >
> > cdef f(np.ndarray[double] a):
> >    ...
> >
> > and you do f(np.zeros(10, dtype=int)), then it will error out, because
> > that function doesn't know how to handle ints, only doubles. The same
> > would apply for, say, a NA-enabled integer. In general there are
> > almost arbitrarily many dtypes that could get passed into any function
> > (including user-defined ones, etc.), so C code already has to check
> > dtypes for correctness.
> >
> > Second order issues:
> > - There is certainly C code out there that just assumes that it will
> > only be passed an array with certain dtype (and ndim, memory layout,
> > etc...). If you write such C code then it's your job to make sure that
> > you only pass it the kinds of arrays that it expects, just like now
> > :-).
> >
> > - We may want to do some sort of special-casing of handling for
> > floating point NA dtypes that use an NaN as the "magic" bitpattern,
> > since many algorithms *will* work with these unchanged, and it might
> > be frustrating to have to wait for every extension module to be
> > updated just to allow for this case explicitly before using them. OTOH
> > you can easily work around this. Like say my_qr is a legacy C function
> > that will in fact propagate NaNs correctly, so float NA dtypes would
> > Just Work -- except, it errors out at the start because it doesn't
> > recognize the dtype. How annoying. We *could* have some special hack
> > you can use to force it to work anyway (by like making the "is this
> > the dtype I expect?" routine lie.) But you can also just do:
> >
> >  def my_qr_wrapper(arr):
> >    if arr.dtype is a NA float dtype with NaN magic value:
> >      result = my_qr(arr.view(arr.dtype.base_dtype))
> >      return result.view(arr.dtype)
> >    else:
> >      return my_qr(arr)
> >
> > and hey presto, now it will correctly pass through NAs. So perhaps
> > it's not worth bothering with special hacks.
> >
> > - Of course if your extension function does want to handle NAs
> > generically, then there will be a simple C api for checking for them,
> > setting them, etc. Numpy needs such an API internally anyway!
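
A rough, runnable sketch of the wrapper pattern quoted above, in case it
helps. NumPy has no NA dtype today, so NA_FLOAT below is a hypothetical
stand-in (a one-field structured dtype with the same layout as float64), and
legacy_scale is an invented placeholder for a routine like my_qr that rejects
dtypes it does not recognize:

```python
import numpy as np

# Hypothetical stand-in for an NA-enabled float dtype: a one-field structured
# dtype with the same 8-byte layout as float64. NumPy has no such NA dtype.
NA_FLOAT = np.dtype([("value", np.float64)])


def legacy_scale(arr):
    """Invented placeholder for a legacy routine that only accepts float64."""
    if arr.dtype != np.float64:
        raise TypeError("legacy_scale only handles float64, got %s" % arr.dtype)
    return arr * 2.0  # a NaN in the input stays a NaN in the output


def legacy_scale_wrapper(arr):
    """View NA-float data as plain float64, call through, view the result back."""
    if arr.dtype == NA_FLOAT:
        result = legacy_scale(arr.view(np.float64))
        return result.view(NA_FLOAT)
    return legacy_scale(arr)


data = np.array([1.0, np.nan, 3.0]).view(NA_FLOAT)
out = legacy_scale_wrapper(data)  # calling legacy_scale(data) directly raises
```

The two view() calls reinterpret the existing buffers rather than copying
them, so the wrapper itself adds no extra copy.
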
>
> Thanks for this.
>
> Mark - in view of the discussions about Cython and extension code -
> could you say what you see as disadvantages to the ndmasked subclass
> proposal?
>

The biggest difficulty looks to me like how to work with both of them
reasonably from the C API. The idea of ndarray and ndmasked having
different independent TypeObjects, but still working through the same API
calls feels a little disconcerting. Maybe this is a reasonable compromise,
though. It would be nice to see the idea fleshed out a bit more with some
examples of how the code would work from the C level.
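
As a starting point, here is a rough Python-level sketch of the distinction,
not real C API code. isinstance(a, np.ndarray) plays the role of
PyArray_Check, and both masked classes are hypothetical stand-ins, since
ndmasked does not exist:

```python
import numpy as np


class MaskedSubclass(np.ndarray):
    """Hypothetical masked array implemented as an ndarray subclass."""


class MaskedSeparate:
    """Hypothetical masked array with its own independent type object."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)


def legacy_sum(a):
    """Mask-unaware routine guarded by the Python analog of PyArray_Check."""
    if not isinstance(a, np.ndarray):
        raise TypeError("expected ndarray, got %s" % type(a).__name__)
    return float(a.sum())


plain = np.arange(4.0)
sub = plain.view(MaskedSubclass)
sep = MaskedSeparate(plain, [False, True, False, False])

# The subclass passes the check, so mask-unaware code runs on it silently.
total = legacy_sum(sub)

# The independent type fails the check, so the same code refuses it loudly.
try:
    legacy_sum(sep)
    rejected = False
except TypeError:
    rejected = True
```

What the sketch leaves open is the part that worries me: how the shared API
calls would accept both types on the C side.
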

Cheers,
Mark


>
> Cheers,
>
> Matthew