[Numpy-discussion] Attribute hiding APIs for PyArrayObject

Petr Viktorin encukou at gmail.com
Wed Oct 31 04:46:03 EDT 2018


On 10/31/18 03:33, Nathaniel Smith wrote:
> It's probably helpful to know that Py_LIMITED_API is a
> kinda-experimental thing that was added in CPython 3.2 (see PEP 384)
> and remains almost 100% unused. It has never been a popular or
> influential thing (for better or worse).

Py_LIMITED_API is not very influential *outside* CPython, but it's not 
(yet) a failed experiment. (Which is not what you said, but someone 
might read it that way.)

The popularity is a bit of a chicken-and-egg problem. Py_LIMITED_API is 
not used much because the current implementation is not useful in the 
real world. But as large projects like Cython and PySide are looking at 
Py_LIMITED_API from their side, problems are getting found and fixed.
It's not a fast process, being all volunteer-driven. But the limited API 
(= stable ABI) does have a major role in thoughts about future CPython 
API design, and the idea (not current implementation) is worth looking at.

What's the idea? In addition to python35/python36/python37, there's a 
"python3" API that you can target, which is slower at run-time but won't 
inflate your build/test matrix.
It's not either-or. CPython provides both.
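
To make the trade-off concrete, here is a toy sketch in plain C. It is not CPython or NumPy code: ToyArray, TOY_LIMITED_API and every function name in it are made up for illustration. With the switch defined, the extension sees only an opaque pointer and goes through a real function call (slower, but insulated from layout changes). Without it, the layout is visible and the same accessor becomes a direct field read that is tied to one library version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch standing in for Py_LIMITED_API.
 * Comment it out to build the version-specific variant instead. */
#define TOY_LIMITED_API 1

/* ---- public header: what an extension author sees ---- */

typedef struct ToyArray ToyArray;        /* opaque handle */

/* Stable entry points: ordinary functions whose signatures do not
 * depend on the struct layout. */
ToyArray *ToyArray_New(int ndim);
int ToyArray_GetNdim(const ToyArray *a);
void ToyArray_Free(ToyArray *a);

#ifdef TOY_LIMITED_API
/* Stable variant: the accessor is a function call. */
#define ToyArray_NDIM(a) ToyArray_GetNdim(a)
#else
/* Version-specific variant: the layout is exposed, so the accessor is a
 * plain field read, and the extension must be rebuilt if it changes. */
struct ToyArray { long refcount; int ndim; };
#define ToyArray_NDIM(a) ((a)->ndim)
#endif

/* ---- extension code: identical source in both variants ---- */

int ext_ndim_plus_one(const ToyArray *a)
{
    return ToyArray_NDIM(a) + 1;
}

/* ---- library internals ---- */

#ifdef TOY_LIMITED_API
/* In the stable variant the layout is known only to the library. */
struct ToyArray { long refcount; int ndim; };
#endif

ToyArray *ToyArray_New(int ndim)
{
    ToyArray *a;
    assert(ndim >= 0);
    a = malloc(sizeof *a);
    if (a != NULL) {
        a->refcount = 1;
        a->ndim = ndim;
    }
    return a;                            /* NULL on allocation failure */
}

int ToyArray_GetNdim(const ToyArray *a)
{
    return a->ndim;
}

void ToyArray_Free(ToyArray *a)
{
    free(a);
}
```

The extension function compiles unchanged either way; only the expansion of ToyArray_NDIM differs. That is the sense in which it is not either-or: one header can offer both.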


> -n
> 
> On Tue, Oct 30, 2018 at 6:41 PM, Eric Wieser
> <wieser.eric+numpy at gmail.com> wrote:
>> In NumPy 1.14 we changed UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we
>> would like to deprecate PyArray_SetNumericOps and PyArray_GetNumericOps. The
>> strange warning when NPY_NO_DEPRECATED_API is not defined is annoying
>>
>> I’m not sure I make the connection here between hidden fields and API
>> deprecation. You seem to be asking two vaguely related questions:
>>
>> 1. Should we have deprecated field access in the first place?
>> 2. Does our API deprecation mechanism need work?
>>
>> I think a more substantial problem statement is needed for 2, so I’m only
>> going to respond to 1 here.
>>
>> Hiding fields seems to me to match the CPython model of things, where your
>> public api is PyArray<thing>_SomeGetter(thing).
>> If you look at the CPython source code, they only expose the underlying
>> struct fields if you don’t define Py_LIMITED_API, i.e. if you as a consumer
>> volunteer to be broken by upstream changes in minor versions. People (like
>> us) are willing to produce separate builds for each Python version, so
>> often do not define this.
>>
>> We could add a similar PyArray_LIMITED_API that allows field access under a
>> similar guarantee - the question is, are many downstream consumers willing
>> to produce builds against multiple numpy versions? (especially if they also
>> do so against multiple python versions)
>>
>> Also, for example, cython has a mechanism to transpile python code into C,
>> mapping slow python attribute lookup to fast C struct field access
>>
>> How does this work for builtin types? Does cython deliberately not define
>> Py_LIMITED_API? Or are you just forced to use PyTuple_GetItem(t) if you want
>> the fast path.
>>
>> Eric
>>
>> On Tue, 30 Oct 2018 at 02:04 Matti Picus <matti.picus at gmail.com> wrote:
>>>
>>> TL;DR - should we revert the attribute-hiding constructs in
>>> ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject?
>>>
>>>
>>> Background
>>>
>>>
>>> NumPy 1.8 deprecated direct access to PyArrayObject fields. It made
>>> PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields
>>> structure
>>>
>>> https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659
>>> with a comment about moving this to a private header. In order to access
>>> the fields, users are supposed to use PyArray_FIELDNAME functions, like
>>> PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time
>>> that numpy might move away from a C-struct based underlying data
>>> structure. Other changes were also made to enum names, but those are
>>> relatively painless to find-and-replace.
>>>
>>>
>>> NumPy has a mechanism to manage deprecating APIs: C users define
>>> NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and
>>> can then access the API "as if" they were using NumPy 1.8. Users who do
>>> not define NPY_NO_DEPRECATED_API get a warning when compiling, and
>>> default to the pre-1.8 API (aliasing of PyArrayObject to
>>> PyArrayObject_fields and direct access to the C struct fields). This is
>>> convenient for downstream users, both since the new API does not provide
>>> much added value, and it is much easier to write a->nd than
>>> PyArray_NDIM(a). For instance, pandas uses direct assignment to the data
>>> field for fast json parsing
>>>
>>> https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203
>>> via chunks. Working around the new API in pandas would require more
>>> engineering. Also, for example, cython has a mechanism to transpile
>>> python code into C, mapping slow python attribute lookup to fast C
>>> struct field access
>>>
>>> https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types
>>>
>>>
>>> In a parallel but not really related universe, cython recently upgraded
>>> the object mapping so that we can quiet the annoying "size changed"
>>> runtime warning https://github.com/numpy/numpy/issues/11788 without
>>> requiring warning filters, but that requires updating the numpy.pxd file
>>> provided with cython, and it was proposed that NumPy actually vendor its
>>> own file rather than depending on the cython one
>>> (https://github.com/numpy/numpy/issues/11803).
>>>
>>>
>>> The problem
>>>
>>>
>>> We have now made further changes to our API. In NumPy 1.14 we changed
>>> UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate
>>> PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning
>>> when NPY_NO_DEPRECATED_API is not defined is annoying. The new API
>>> cannot be supported by cython without some deep surgery
>>> (https://github.com/cython/cython/pull/2640). When I tried dogfooding an
>>> updated numpy.pxd for the only cython code in NumPy, mtrand.pyx, I came
>>> across some of these issues (https://github.com/numpy/numpy/pull/12284).
>>> Forcing the new API will require downstream users to refactor code or
>>> re-engineer constructs, as in the pandas example above.
>>>
>>>
>>> The question
>>>
>>>
>>> Is the attribute-hiding effort worth it? Should we give up, revert the
>>> PyArrayObject/PyArrayObject_fields division and allow direct access from
>>> C to the numpy internals? Is there another path forward that is less
>>> painful?
>>>
>>>
>>> Matti
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>>
> 
> 
> 

