[Numpy-discussion] NA-mask interactions with existing C code
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Thu May 10 18:47:27 EDT 2012
On 05/11/2012 12:28 AM, Mark Wiebe wrote:
> I did some searching for typical Cython and C code which accesses numpy
> arrays, and added a section to the NEP describing how they behave in the
> current implementation. Cython code which uses either straight Python
> access or the buffer protocol is fine (after a bugfix in numpy, it
> wasn't failing currently as it should in the pep3118 case). C code which
> follows the recommended practice of using PyArray_FromAny or one of the
> related macros is also fine, because these functions have been made to
> fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is provided.
> In general, code which follows the recommended numpy practices will
> raise exceptions when encountering NA-masked arrays. This means
> programmers don't have to worry about the NA unless they want to support
> it. Having things go through PyArray_FromAny also provides a place where
> lazy evaluation arrays could be evaluated, and a hook that other similar
> future extensions could use to provide compatibility.
> Here's the section I added to the NEP:
> Interaction With Pre-existing C API Usage
> Making sure existing code using the C API, whether it's written in C, C++,
> or Cython, does something reasonable is an important goal of this
> The general strategy is to make existing code which does not explicitly
> tell numpy it supports NA masks fail with an exception saying so. There are
> a few different access patterns people use to get ahold of the numpy
> array data; here we examine a few of them to see what numpy can do. These
> examples were found by doing Google searches for numpy C API array access.
> Numpy Documentation - How to extend NumPy
> This page has a section "Dealing with array objects" which has some advice
> for how to access numpy arrays from C. When accepting arrays, the first step
> it suggests is to use PyArray_FromAny or a macro built on that function, so
> code following this advice will properly fail when given an NA-masked array
> it doesn't know how to handle.
> The way this is handled is that PyArray_FromAny requires a special flag,
> NPY_ARRAY_ALLOWNA, before it will allow NA-masked arrays to flow through.
> Code which does not follow this advice, and instead just calls
> PyArray_Check() to verify it's an ndarray and checks some flags, will
> silently produce incorrect results. This style of code does not provide any
> opportunity for numpy to say "hey, this array is special", so it is also not
> compatible with future ideas of lazy evaluation, derived dtypes, etc.
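To make the contrast between the two styles concrete, here is a minimal sketch in plain Python rather than C. Everything in it is hypothetical: ToyArray stands in for an ndarray with an optional NA mask, from_any plays the role of PyArray_FromAny, and ALLOW_NA plays the role of the NPY_ARRAY_ALLOWNA flag. It only models the behaviour described in the quoted text; it is not numpy's implementation.

```python
ALLOW_NA = 0x1  # hypothetical flag, standing in for NPY_ARRAY_ALLOWNA


class ToyArray:
    """Hypothetical stand-in for an ndarray: values plus an optional NA mask."""

    def __init__(self, values, mask=None):
        self.values = list(values)
        # mask[i] is True where element i is NA; None means "no mask at all"
        self.mask = list(mask) if mask is not None else None


def from_any(obj, flags=0):
    """Conversion gate: refuse NA-masked input unless the caller opts in."""
    arr = obj if isinstance(obj, ToyArray) else ToyArray(obj)
    if arr.mask is not None and not (flags & ALLOW_NA):
        raise ValueError("this code does not support NA-masked arrays")
    return arr


def total_via_gate(obj):
    # Recommended style: every input passes through the gate first.
    return sum(from_any(obj).values)


def total_via_type_check(obj):
    # Discouraged style: only the type is checked, so the mask is never seen.
    if not isinstance(obj, ToyArray):
        raise TypeError("expected a ToyArray")
    return sum(obj.values)  # NA slots are summed as if they were real data
```

With a masked input, total_via_gate raises, while total_via_type_check quietly includes whatever value sits in the NA slot.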
This doesn't really cover the Cython code I write that interfaces with C
(and probably the code others write in Cython).
Often I'd do:
cdef np.ndarray arr = np.asarray(arg)
So I mix Python np.asarray with C PyArray_DATA. In general, I think you
use PyArray_FromAny if you're very concerned about performance or need
some special flag, but it's certainly not the first thing you try.
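For readers without Cython at hand, the shape of that pattern (a Python-level conversion followed by raw pointer access) can be imitated with the standard library alone. This is only an analogy under stated assumptions: array.array stands in for the result of np.asarray, buffer_info() stands in for PyArray_DATA, and raw_total is a made-up name. The point it illustrates is that once you hold a bare address, nothing is left that could carry a mask.

```python
import array
import ctypes


def raw_total(values):
    # Python-level conversion, playing the role of np.asarray(arg).
    arr = array.array("d", values)
    # buffer_info() returns (address, element count): a bare pointer, much as
    # PyArray_DATA returns one, with no room for a mask or any other metadata.
    address, count = arr.buffer_info()
    data = (ctypes.c_double * count).from_address(address)
    return sum(data)
```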
But in general, I will often be lazy and just do
def f(np.ndarray arr):
It's an exception if you don't provide an array -- so who cares. (I
guess the odds of somebody feeding a masked array to code like that,
which doesn't try to be friendly, are relatively small, though.)
If you know the datatype, you can really do
def f(np.ndarray[double] arr):
which works with PEP 3118. But I use PyArray_DATA out of habit (and
since it works in the cases without dtype).
Frankly, I don't expect any Cython code to do the right thing here;
calling PyArray_FromAny is much more typing. And really, nobody ever
questioned that if we had an actual ndarray instance, we'd be allowed to
I don't know how much Cython code is out there in the wild for which
this is a problem. Either way, it would cause something of a reeducation
challenge for Cython users.
> Tutorial From Cython Website
> This tutorial gives a convolution example, and all the examples fail with
> Python exceptions when given inputs that contain NA values.
> Before any Cython type annotation is introduced, the code functions just
> as equivalent Python would in the interpreter.
> When the type information is introduced, it is done via numpy.pxd which
> defines a mapping between an ndarray declaration and PyArrayObject *.
> Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct
> comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray.
> Then the code does some dtype comparisons, and uses regular python indexing
> to access the array elements. This python indexing still goes through the
> Python API, so the NA handling and error checking in numpy still can work
> like normal and fail if the inputs have NAs which cannot fit in the output
> array. In this case it fails when trying to convert the NA into an integer
> to set it in the output.
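A small sketch of why element-wise access through the Python API fails safely, in plain Python. NAType and copy_to_int_output are hypothetical names; the assumption is simply that an NA scalar refuses conversion to an integer, as the quoted text describes.

```python
class NAType:
    """Hypothetical NA scalar: it refuses conversion to a plain integer."""

    def __int__(self):
        raise ValueError("cannot convert NA to integer")

    def __repr__(self):
        return "NA"


NA = NAType()


def copy_to_int_output(source):
    # Element-by-element access through ordinary Python indexing, so every
    # value, NA included, is seen as an object before it is stored.
    output = [0] * len(source)
    for i in range(len(source)):
        output[i] = int(source[i])
    return output
```

Because each element passes through a conversion step, the NA gets a chance to raise instead of being copied as raw bits.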
> The next version of the code introduces more efficient indexing. This
> operates based on Python's buffer protocol. This causes Cython to call
> __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls
> PyObject_GetBuffer. This call gives numpy the opportunity to raise an
> exception if the inputs are arrays with NA-masks, something not supported
> by the Python buffer protocol.
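The same checkpoint is visible from Python, where memoryview() performs the buffer request. The sketch below assumes a hypothetical MaskedValues class that exports no buffer at all, standing in for an NA-masked array; buffer_total is likewise a made-up consumer. It shows only that a consumer going through the buffer request gets an exception rather than raw memory when the object does not provide one.

```python
import array


class MaskedValues:
    """Hypothetical NA-masked container; deliberately exports no buffer."""

    def __init__(self, values, mask):
        self.values = array.array("d", values)
        self.mask = list(mask)


def buffer_total(obj):
    # memoryview() issues the buffer request (PyObject_GetBuffer at the C
    # level); if the object cannot export a buffer, an exception is raised here.
    with memoryview(obj) as view:
        return sum(view.tolist())
```

An array.array of doubles passes straight through, while a MaskedValues instance raises TypeError at the memoryview() call.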
> Numerical Python - JPL website
> This document is from 2001, so does not reflect recent numpy, but it is the
> second hit when searching for "numpy c api example" on google.
> The first example, heading "A simple example", is in fact already
> invalid for
> recent numpy even without the NA support. In particular, if the data is
> or in a different byteorder, it may crash or produce incorrect results.
> The next thing the document does is introduce PyArray_ContiguousFromObject,
> which gives numpy an opportunity to raise an exception when NA-masked arrays
> are used, so the later code will raise exceptions as desired.
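The byte order hazard mentioned for the first JPL example is easy to reproduce without numpy. This is an illustrative sketch using the standard struct module; read_double is a hypothetical helper. It shows only that reading the same eight bytes with the wrong byte order silently yields a different number.

```python
import struct


def read_double(raw, byteorder_char):
    """Interpret 8 raw bytes as a double; '<' is little-endian, '>' big-endian."""
    return struct.unpack(byteorder_char + "d", raw)[0]


raw = struct.pack(">d", 1.0)      # 1.0 stored big-endian
right = read_double(raw, ">")     # byte order honoured
wrong = read_double(raw, "<")     # byte order ignored by the reader
```

No exception is raised in the second case, which is what makes code that skips the conversion step risky.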