[Numpy-discussion] NA-mask interactions with existing C code

Mark Wiebe mwwiebe at gmail.com
Thu May 10 18:28:35 EDT 2012


I did some searching for typical Cython and C code which accesses numpy
arrays, and added a section to the NEP describing how they behave in the
current implementation. Cython code which uses either straight Python
access or the buffer protocol is fine (this needed a bugfix in numpy, since
the pep3118 case was not failing as it should). C code which follows
the recommended practice of using PyArray_FromAny or one of the related
macros is also fine, because these functions have been made to fail on
NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is provided.

In general, code which follows the recommended numpy practices will raise
exceptions when encountering NA-masked arrays. This means programmers don't
have to worry about the NA unless they want to support it. Having things go
through PyArray_FromAny also provides a place where lazy evaluation arrays
could be evaluated, and which similar future extensions could use to
provide compatibility.

Here's the section I added to the NEP:

Interaction With Pre-existing C API Usage
=========================================

Making sure existing code using the C API, whether it's written in C, C++,
or Cython, does something reasonable is an important goal of this
implementation. The general strategy is to make existing code which does
not explicitly tell numpy it supports NA masks fail with an exception
saying so. There are a few different access patterns people use to get
hold of the numpy array data; here we examine a few of them to see what
numpy can do. These examples were found by searching Google for numpy
C API array access.

Numpy Documentation - How to extend NumPy
-----------------------------------------

http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects

This page has a section "Dealing with array objects" which has some advice
for how to access numpy arrays from C. When accepting arrays, the first step
it suggests is to use PyArray_FromAny or a macro built on that function, so
code following this advice will properly fail when given an NA-masked array
it doesn't know how to handle.

The way this is handled is that PyArray_FromAny requires a special flag,
NPY_ARRAY_ALLOWNA, before it will allow NA-masked arrays to flow through.

http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA
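
The opt-in idea can be sketched in pure Python. This is a toy model, not the
numpy C API: ToyArray, from_any and ALLOW_NA are hypothetical names standing
in for the ndarray, PyArray_FromAny and NPY_ARRAY_ALLOWNA, and the mask
convention (True marks an NA) is chosen only for this sketch.

```python
# Toy model only: none of these names are numpy APIs.
ALLOW_NA = 0x1  # hypothetical stand-in for NPY_ARRAY_ALLOWNA


class ToyArray:
    """A list of values plus an optional NA mask (True marks an NA here)."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        self.mask = None if mask is None else list(mask)


def from_any(obj, flags=0):
    """Single entry point: reject NA-masked input unless the caller opts in."""
    arr = obj if isinstance(obj, ToyArray) else ToyArray(obj)
    if arr.mask is not None and not (flags & ALLOW_NA):
        raise ValueError("this operation does not support NA-masked arrays")
    return arr


def total(obj):
    """NA-unaware consumer: never passes ALLOW_NA, so masked input fails."""
    return sum(from_any(obj).data)


def na_aware_total(obj):
    """NA-aware consumer: opts in and skips the masked elements itself."""
    arr = from_any(obj, flags=ALLOW_NA)
    mask = arr.mask or [False] * len(arr.data)
    return sum(v for v, is_na in zip(arr.data, mask) if not is_na)
```

In this sketch, total() needs no NA-specific code: the failure comes from the
shared entry point, which is the property the flag is meant to give existing
C code.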

Code which does not follow this advice, and instead just calls
PyArray_Check() to verify it's an ndarray and checks some flags, will
silently produce incorrect results. This style of code does not provide any
opportunity for numpy to say "hey, this array is special", so it is also
not compatible with future ideas of lazy evaluation, derived dtypes, etc.
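
A small pure-Python analogy of the failure mode, under the assumption that a
type check followed by raw data access behaves like the PyArray_Check()
pattern. ToyArray and unchecked_total are hypothetical names, not numpy
APIs, and the mask convention (True marks an NA) is specific to this sketch.

```python
# Toy model only: ToyArray and its mask are hypothetical, not numpy types.
class ToyArray:
    def __init__(self, data, mask=None):
        self.data = list(data)
        self.mask = mask  # True marks an NA in this toy


def unchecked_total(obj):
    # Analogous to calling only a type check and then reading the raw data.
    if not isinstance(obj, ToyArray):
        raise TypeError("expected a ToyArray")
    return sum(obj.data)  # the mask is never consulted


# The third element is NA; 1000 is just whatever sits in the data slot.
masked = ToyArray([1, 2, 1000], mask=[False, False, True])
result = unchecked_total(masked)  # no exception, but the NA slot is summed
```

No exception is raised here, and the caller has no way to tell that a masked
value leaked into the result.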

Tutorial From Cython Website
----------------------------

http://docs.cython.org/src/tutorial/numpy.html

This tutorial gives a convolution example, and all the examples fail with
Python exceptions when given inputs that contain NA values.

Before any Cython type annotation is introduced, the code functions just
as equivalent Python would in the interpreter.

When the type information is introduced, it is done via numpy.pxd which
defines a mapping between an ndarray declaration and PyArrayObject \*.
Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct
comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray.

Then the code does some dtype comparisons, and uses regular python indexing
to access the array elements. This python indexing still goes through the
Python API, so the NA handling and error checking in numpy still can work
like normal and fail if the inputs have NAs which cannot fit in the output
array. In this case it fails when trying to convert the NA into an integer
to set it in the output.

The next version of the code introduces more efficient indexing. This
operates based on Python's buffer protocol. This causes Cython to call
__Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls
PyObject_GetBuffer. This call gives numpy the opportunity to raise an
exception if the inputs are arrays with NA-masks, something not supported
by the Python buffer protocol.
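
The point at which a buffer request can fail is visible from Python as well,
since memoryview() acquires its buffer through PyObject_GetBuffer. The sketch
below only shows an object with no buffer support being refused; it does not
reproduce numpy rejecting an NA-masked array, and try_get_buffer is a
hypothetical helper name.

```python
def try_get_buffer(obj):
    """Return a memoryview of obj, or the exception raised while acquiring it."""
    try:
        return memoryview(obj)
    except (TypeError, BufferError) as exc:
        return exc


ok = try_get_buffer(bytearray(b"abc"))  # bytearray exports a buffer
refused = try_get_buffer(object())      # a plain object exports nothing
```

The consumer sees an ordinary Python exception at acquisition time, before
any raw memory is touched.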

Numerical Python - JPL website
------------------------------

http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html

This document is from 2001, so does not reflect recent numpy, but it is the
second hit when searching for "numpy c api example" on google.

The first example, under the heading "A simple example", is in fact already
invalid for recent numpy even without the NA support. In particular, if the
data is misaligned or in a different byteorder, it may crash or produce
incorrect results.
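
The byteorder half of this problem can be illustrated with the standard
struct module alone. This is a sketch of the general hazard, not of the JPL
example itself, and it does not cover misalignment; explicit byte orders are
used so that the result does not depend on the machine running it.

```python
import struct

value = 1.0
big_endian_bytes = struct.pack(">d", value)  # the data as stored, big-endian

# Reading the same 8 bytes with the wrong byte order gives a different number.
misread = struct.unpack("<d", big_endian_bytes)[0]

# Reading with the byte order the data was written in recovers the value.
correct = struct.unpack(">d", big_endian_bytes)[0]
```

Code that reads the raw data pointer without checking the byteorder is in the
position of the misread line: it gets a number, just not the right one.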

The next thing the document does is introduce PyArray_ContiguousFromObject,
which gives numpy an opportunity to raise an exception when NA-masked arrays
are used, so the later code will raise exceptions as desired.