[Numpy-discussion] Attribute hiding APIs for PyArrayObject

Tue Oct 30 05:04:04 EDT 2018

TL;DR - should we revert the attribute-hiding constructs in 
ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject?

Background

NumPy 1.8 deprecated direct access to PyArrayObject fields. It made 
PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields 
structure 
https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659 
with a comment about moving this to a private header. In order to access 
the fields, users are supposed to use PyArray_FIELDNAME functions, like 
PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time 
that numpy might move away from a C-struct based underlying data 
structure. Other changes were also made to enum names, 
but those are relatively painless to find-and-replace.
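
For readers who have not looked at the header, the split can be sketched 
roughly as below. This is a simplified, self-contained mock and not the 
real ndarraytypes.h: MockArray, MockArray_fields, MOCK_NDIM and MOCK_DATA 
are hypothetical stand-ins for PyArrayObject, PyArrayObject_fields, 
PyArray_NDIM and PyArray_DATA, and the "refcnt" member only stands in 
for the Python object header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyArrayObject_fields: the full layout. */
typedef struct {
    long  refcnt;  /* stand-in for the Python object header */
    char *data;    /* pointer to the first element */
    int   nd;      /* number of dimensions */
} MockArray_fields;

/* Hypothetical stand-in for the opaque PyArrayObject: only the header
 * is visible, so `a->nd` does not compile against this type. */
typedef struct {
    long refcnt;
} MockArray;

/* Accessors recover the hidden fields by casting to the full layout.
 * They are only valid for objects that really are MockArray_fields. */
static inline int MOCK_NDIM(const MockArray *arr)
{
    assert(arr != NULL);
    return ((const MockArray_fields *)arr)->nd;
}

static inline char *MOCK_DATA(const MockArray *arr)
{
    assert(arr != NULL);
    return ((const MockArray_fields *)arr)->data;
}
```

With this layout, code holding a MockArray pointer has to write 
MOCK_NDIM(a) since a->nd is a compile error, which is the trade-off 
this mail is about.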

NumPy has a mechanism to manage deprecating APIs: C users define 
NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and 
can then access the API "as if" they were using NumPy 1.8. Users who do 
not define NPY_NO_DEPRECATED_API get a warning when compiling, and 
default to the pre-1.8 API (aliasing of PyArrayObject to 
PyArrayObject_fields and direct access to the C struct fields). This is 
convenient for downstream users, both because the new API does not 
provide much added value and because it is much easier to write a->nd 
than PyArray_NDIM(a). For instance, pandas uses direct assignment to the 
data field for fast json parsing 
https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203 
via chunks. Working around the new API in pandas would require more 
engineering. Also, for example, cython has a mechanism to transpile 
python code into C, mapping slow python attribute lookup to fast C 
struct field access 
https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types
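
As a rough illustration of the opt-in mechanism described above, here is 
a minimal mock of how a header could switch between the two views of the 
struct. All DEMO_* and DemoArray* names are hypothetical, the version 
levels are made up, and the real header differs in detail (it also emits 
the compile-time warning, which is left out here).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical API levels, loosely modelled on NPY_1_8_API_VERSION. */
#define DEMO_API_V1 1
#define DEMO_API_V2 2   /* assumed level at which fields become hidden */

/* A downstream project opts in by defining this before the header
 * contents below. Remove the define to get the legacy behaviour. */
#define DEMO_NO_DEPRECATED_API DEMO_API_V2

typedef struct {
    long  refcnt;  /* stand-in for the Python object header */
    char *data;
    int   nd;
} DemoArray_fields;

#if !defined(DEMO_NO_DEPRECATED_API) || \
    (DEMO_NO_DEPRECATED_API < DEMO_API_V2)
/* Legacy path: the public type aliases the full struct, so `a->nd`
 * compiles. */
typedef DemoArray_fields DemoArray;
#define DEMO_FIELDS_VISIBLE 1
#else
/* Opt-in path: the public type exposes only the header. */
typedef struct { long refcnt; } DemoArray;
#define DEMO_FIELDS_VISIBLE 0
#endif

/* The accessor works on either path. */
static inline int DEMO_NDIM(const DemoArray *arr)
{
    assert(arr != NULL);
    return ((const DemoArray_fields *)arr)->nd;
}
```

Deleting the DEMO_NO_DEPRECATED_API line flips DEMO_FIELDS_VISIBLE to 1 
and makes direct field access compile again, which mirrors what users 
who do not define the real macro get today.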

In a parallel but not really related universe, cython recently upgraded 
the object mapping so that we can quiet the annoying "size changed" 
runtime warning https://github.com/numpy/numpy/issues/11788 without 
requiring warning filters, but that requires updating the numpy.pxd file 
provided with cython, and it was proposed that NumPy actually vendor its 
own file rather than depending on the cython one 
(https://github.com/numpy/numpy/issues/11803).

The problem

We have now made further changes to our API. In NumPy 1.14 we changed 
UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate 
PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning 
emitted when NPY_NO_DEPRECATED_API is not defined is annoying. The new 
API cannot be supported by cython without some deep surgery 
(https://github.com/cython/cython/pull/2640). When I tried dogfooding an 
updated numpy.pxd for the only cython code in NumPy, mtrand.pyx, I came 
across some of these issues (https://github.com/numpy/numpy/pull/12284). 
Forcing the new API will require downstream users to refactor code or 
re-engineer constructs, as in the pandas example above.

The question

Is the attribute-hiding effort worth it? Should we give up, revert the 
PyArrayObject/PyArrayObject_fields division and allow direct access from 
C to the numpy internals? Is there another path forward that is less 
painful?

Matti