[Numpy-discussion] Attribute hiding APIs for PyArrayObject

Eric Wieser wieser.eric+numpy at gmail.com
Tue Oct 30 21:41:51 EDT 2018


> In NumPy 1.14 we changed UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we
> would like to deprecate PyArray_SetNumericOps and PyArray_GetNumericOps.
> The strange warning emitted when NPY_NO_DEPRECATED_API is not defined is
> annoying.

I’m not sure I make the connection here between hidden fields and API
deprecation. You seem to be asking two vaguely related questions:

   1. Should we have deprecated field access in the first place?
   2. Does our API deprecation mechanism need work?

I think a more substantial problem statement is needed for 2, so I’m only
going to respond to 1 here.

Hiding fields seems to me to match the CPython model of things, where your
public API is PyArray<thing>_SomeGetter(thing).
If you look at the CPython source code
<https://github.com/python/cpython/blob/e0720cd/Include/tupleobject.h#L24-L34>,
they only expose the underlying struct fields if you don’t define
Py_LIMITED_API, i.e. if you as a consumer volunteer to be broken by upstream
changes in minor versions. People (like us) are willing to produce separate
builds for each Python version, so they often do not define it.
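A minimal C sketch of the header pattern described above, under stated assumptions. Everything here is hypothetical: `toy_tuple`, `TOY_LIMITED_API` and `TOY_TUPLE_GET_ITEM` are invented stand-ins modelled loosely on CPython's tupleobject.h, not CPython or NumPy API. The function accessors are always declared, while the struct layout and the macro fast path are only visible when the consumer does not define `TOY_LIMITED_API`.

```c
#include <stdlib.h>
#include <string.h>

/* ===== Hypothetical public header, loosely modelled on tupleobject.h ===== */
typedef struct toy_tuple toy_tuple;  /* opaque to limited-API consumers */

/* Function API: always available, and ABI-stable across layout changes. */
toy_tuple *toy_tuple_new(size_t n, const long *items);
void toy_tuple_free(toy_tuple *t);
size_t toy_tuple_size(const toy_tuple *t);
long toy_tuple_get_item(const toy_tuple *t, size_t i);

#ifndef TOY_LIMITED_API
/* Full API: the layout is visible, so consumers may use the fast macro,
 * but they must be rebuilt whenever the library changes this struct. */
struct toy_tuple {
    size_t size;
    long *items;
};
/* Fast path: expands to a direct field load, no function call. */
#define TOY_TUPLE_GET_ITEM(t, i) ((t)->items[(i)])
#endif

/* ===== Hypothetical library implementation =====
 * It sits in the same file only so the sketch is self-contained; a real
 * library compiles this part separately with the full layout visible.
 * Build this file without TOY_LIMITED_API. */
toy_tuple *toy_tuple_new(size_t n, const long *items) {
    toy_tuple *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->items = malloc(n * sizeof *t->items);
    if (t->items == NULL && n > 0) {
        free(t);
        return NULL;
    }
    if (n > 0) memcpy(t->items, items, n * sizeof *t->items);
    t->size = n;
    return t;
}

void toy_tuple_free(toy_tuple *t) {
    if (t != NULL) {
        free(t->items);
        free(t);
    }
}

size_t toy_tuple_size(const toy_tuple *t) { return t->size; }

/* Slow-but-stable path; no bounds check in this sketch. */
long toy_tuple_get_item(const toy_tuple *t, size_t i) { return t->items[i]; }
```

A consumer that defines TOY_LIMITED_API before including such a header can only call toy_tuple_get_item(); one that does not can also use TOY_TUPLE_GET_ITEM(), at the cost of rebuilding whenever the layout changes.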

We could add a similar PyArray_LIMITED_API that allows field access under a
similar guarantee. The question is: are many downstream consumers willing
to produce builds against multiple NumPy versions, especially if they also
do so against multiple Python versions?

> Also, for example, Cython has a mechanism to transpile Python code into C,
> mapping slow Python attribute lookup to fast C struct field access.

How does this work for builtin types? Does Cython deliberately not define
Py_LIMITED_API? Or are you just forced to use PyTuple_GetItem(t, i) if you
want the fast path?

Eric

On Tue, 30 Oct 2018 at 02:04 Matti Picus <matti.picus at gmail.com> wrote:

> TL;DR: should we revert the attribute-hiding constructs in
> ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject?
>
>
> Background
>
>
> NumPy 1.8 deprecated direct access to PyArrayObject fields. It made
> PyArrayObject "opaque" and hid the fields behind a PyArrayObject_fields
> structure
>
> https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659
> with a comment about moving this to a private header. To access
> the fields, users are supposed to use PyArray_FIELDNAME functions, like
> PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time
> that NumPy might move away from a C-struct-based underlying data
> structure. Other changes were also made to enum names, but those are
> relatively painless to find-and-replace.
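The following C sketch illustrates, under assumptions, how an opaque public type can coexist with a full field struct. The Toy* names are hypothetical; the structure only loosely mirrors the PyArrayObject / PyArrayObject_fields split, with an int standing in for PyObject_HEAD. Because the accessors are static inline and simply cast back to the full layout, a compiler will usually reduce ToyArray_NDIM(a) to the same load as a->nd.

```c
#include <stddef.h>

/* Full layout (hypothetical analogue of PyArrayObject_fields). */
typedef struct {
    int refcount;            /* stands in for PyObject_HEAD */
    char *data;
    int nd;
    const long *dimensions;
} ToyArrayObject_fields;

/* Public "opaque" type: only the common header is visible to users. */
typedef struct {
    int refcount;
} ToyArrayObject;

/* Accessors cast the opaque pointer back to the full layout. The object
 * really is a ToyArrayObject_fields, so this round-trip cast is valid. */
static inline int ToyArray_NDIM(const ToyArrayObject *arr) {
    return ((const ToyArrayObject_fields *)arr)->nd;
}

static inline char *ToyArray_DATA(const ToyArrayObject *arr) {
    return ((const ToyArrayObject_fields *)arr)->data;
}

static inline const long *ToyArray_DIMS(const ToyArrayObject *arr) {
    return ((const ToyArrayObject_fields *)arr)->dimensions;
}
```

The design choice is that the library keeps the freedom to reorder or extend ToyArrayObject_fields, since callers compiled against the opaque type never name those fields directly.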
>
>
> NumPy has a mechanism to manage deprecating APIs: C users define
> NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and
> can then access the API "as if" they were using NumPy 1.8. Users who do
> not define NPY_NO_DEPRECATED_API get a warning when compiling, and
> default to the pre-1.8 API (aliasing of PyArrayObject to
> PyArrayObject_fields and direct access to the C struct fields). This is
> convenient for downstream users, both because the new API does not provide
> much added value and because it is much easier to write a->nd than
> PyArray_NDIM(a). For instance, pandas assigns directly to the data
> field for fast chunked JSON parsing
>
> https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203
> Working around the new API in pandas would require more
> engineering. Cython, as another example, has a mechanism to transpile
> Python code into C, mapping slow Python attribute lookup to fast C
> struct field access
>
> https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types
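A hedged sketch of the version-gating idea in plain C. TOY_NO_DEPRECATED_API, TOY_1_8_API_VERSION, TOY_FIELDS_HIDDEN and the Toy* types are invented for illustration and only imitate the shape of the NumPy mechanism; the real NumPy opt-in appears in the leading comment and is not compiled here. Defining the macro at or above the threshold makes the public type incomplete, so `a->nd` no longer compiles and the accessor is the only route to the field; leaving it undefined falls back to aliasing the public type to the field struct.

```c
/* In real NumPy code the opt-in looks like this (it needs the NumPy
 * headers, so it is not compiled in this sketch):
 *     #define NPY_NO_DEPRECATED_API NPY_1_8_API_VERSION
 *     #include <numpy/arrayobject.h>
 */

/* Hypothetical version constant, imitating NPY_x_y_API_VERSION. */
#define TOY_1_8_API_VERSION 0x00000008

/* Consumer opt-in; remove this line to get the legacy layout. */
#define TOY_NO_DEPRECATED_API TOY_1_8_API_VERSION

typedef struct {
    char *data;
    int nd;
} ToyArrayObject_fields;

#if defined(TOY_NO_DEPRECATED_API) && (TOY_NO_DEPRECATED_API >= TOY_1_8_API_VERSION)
/* New API: the public type is incomplete, so a->nd is a compile error. */
typedef struct ToyArrayObject ToyArrayObject;
#define TOY_FIELDS_HIDDEN 1
#else
/* Legacy API (NumPy additionally emits a compile-time warning here):
 * the public type *is* the field struct, so a->nd still compiles. */
typedef ToyArrayObject_fields ToyArrayObject;
#define TOY_FIELDS_HIDDEN 0
#endif

/* The accessor works in both modes by casting to the full layout. */
static inline int ToyArray_NDIM(const ToyArrayObject *arr) {
    return ((const ToyArrayObject_fields *)arr)->nd;
}
```

Code written only against ToyArray_NDIM() compiles unchanged in either mode, which is the property the opt-in macro is meant to let downstream users check.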
>
>
> In a parallel but not really related universe, Cython recently upgraded
> the object mapping so that we can quiet the annoying "size changed"
> runtime warning https://github.com/numpy/numpy/issues/11788 without
> requiring warning filters, but that requires updating the numpy.pxd file
> provided with Cython, and it was proposed that NumPy actually vendor its
> own file rather than depending on the Cython one
> (https://github.com/numpy/numpy/issues/11803).
>
>
> The problem
>
>
> We have now made further changes to our API. In NumPy 1.14 we changed
> UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate
> PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning
> emitted when NPY_NO_DEPRECATED_API is not defined is annoying. The new API
> cannot be supported by Cython without some deep surgery
> (https://github.com/cython/cython/pull/2640). When I tried dogfooding an
> updated numpy.pxd for the only Cython code in NumPy, mtrand.pyx, I came
> across some of these issues (https://github.com/numpy/numpy/pull/12284).
> Forcing the new API will require downstream users to refactor code or
> re-engineer constructs, as in the pandas example above.
>
>
> The question
>
>
> Is the attribute-hiding effort worth it? Should we give up, revert the
> PyArrayObject/PyArrayObject_fields division and allow direct access from
> C to the numpy internals? Is there another path forward that is less
> painful?
>
>
> Matti
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>