[Numpy-discussion] Attribute hiding APIs for PyArrayObject

Nathaniel Smith njs at pobox.com
Tue Oct 30 22:33:37 EDT 2018


It's probably helpful to know that Py_LIMITED_API is a
kinda-experimental thing that was added in CPython 3.2 (see PEP 384)
and remains almost 100% unused. It has never been a popular or
influential thing (for better or worse).
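
For anyone who hasn't looked at how this kind of macro gating works, here is a
minimal sketch of the pattern being discussed. All names (DEMO_LIMITED_API,
DemoArray, DemoArray_NDIM, and so on) are hypothetical and chosen only for
illustration; this is not CPython's or NumPy's actual header, just the general
shape of "opaque type plus accessor, with the struct exposed only if you opt
out of the restricted view".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; not CPython's or NumPy's real header. */

/* Full layout. A real project would keep this in a private header. */
struct DemoArray_fields {
    char *data;
    int nd;
};

#ifdef DEMO_LIMITED_API
/* Restricted view: the type is incomplete, so a->nd does not compile. */
typedef struct DemoArray_tag DemoArray;
#else
/* Unrestricted view: the public type is the full struct, so a->nd works,
   and the consumer must be rebuilt whenever the layout changes. */
typedef struct DemoArray_fields DemoArray;
#endif

/* Accessor that works in both modes. */
static int DemoArray_NDIM(const DemoArray *a) {
    return ((const struct DemoArray_fields *)a)->nd;
}

/* Allocate a zero-initialized array object with the given dimension count. */
static DemoArray *DemoArray_New(int nd) {
    struct DemoArray_fields *f = calloc(1, sizeof *f);
    assert(f != NULL);
    f->nd = nd;
    return (DemoArray *)f;
}

static void DemoArray_Free(DemoArray *a) {
    free(a);
}
```

Built without DEMO_LIMITED_API, both a->nd and DemoArray_NDIM(a) compile.
Built with it, only the accessor does, which is the trade-off the thread is
weighing.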

-n

On Tue, Oct 30, 2018 at 6:41 PM, Eric Wieser
<wieser.eric+numpy at gmail.com> wrote:
> In NumPy 1.14 we changed UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we
> would like to deprecate PyArray_SetNumericOps and PyArray_GetNumericOps. The
> strange warning when NPY_NO_DEPRECATED_API is not defined is annoying
>
> I’m not sure I make the connection here between hidden fields and API
> deprecation. You seem to be asking two vaguely related questions:
>
> 1. Should we have deprecated field access in the first place?
> 2. Does our API deprecation mechanism need work?
>
> I think a more substantial problem statement is needed for 2, so I’m only
> going to respond to 1 here.
>
> Hiding fields seems to me to match the CPython model of things, where your
> public API is PyArray<thing>_SomeGetter(thing).
> If you look at the CPython source code, they only expose the underlying
> struct fields if you don’t define Py_LIMITED_API, i.e. if you as a consumer
> volunteer to be broken by upstream changes in minor versions. People (like
> us) are willing to produce separate builds for each Python version, so
> often do not define this.
>
> We could add a similar PyArray_LIMITED_API that allows field access under a
> similar guarantee - the question is, are many downstream consumers willing
> to produce builds against multiple NumPy versions? (especially if they also
> do so against multiple Python versions)
>
> Also, for example, cython has a mechanism to transpile python code into C,
> mapping slow python attribute lookup to fast C struct field access
>
> How does this work for builtin types? Does Cython deliberately not define
> Py_LIMITED_API? Or are you just forced to use PyTuple_GetItem(t, i) if you
> want the fast path?
>
> Eric
>
> On Tue, 30 Oct 2018 at 02:04 Matti Picus <matti.picus at gmail.com> wrote:
>>
>> TL;DR - should we revert the attribute-hiding constructs in
>> ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject?
>>
>>
>> Background
>>
>>
>> NumPy 1.8 deprecated direct access to PyArrayObject fields. It made
>> PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields
>> structure
>>
>> https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659
>> with a comment about moving this to a private header. In order to access
>> the fields, users are supposed to use PyArray_FIELDNAME functions, like
>> PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time
>> that numpy might move away from a C-struct based underlying data
>> structure. Other changes were also made to enum names, but those are
>> relatively painless to find-and-replace.
>>
>>
>> NumPy has a mechanism to manage deprecating APIs: C users define
>> NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and
>> can then access the API "as if" they were using NumPy 1.8. Users who do
>> not define NPY_NO_DEPRECATED_API get a warning when compiling, and
>> default to the pre-1.8 API (aliasing of PyArrayObject to
>> PyArrayObject_fields and direct access to the C struct fields). This is
>> convenient for downstream users, both since the new API does not provide
>> much added value, and it is much easier to write a->nd than
>> PyArray_NDIM(a). For instance, pandas uses direct assignment to the data
>> field for fast json parsing
>>
>> https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203
>> via chunks. Working around the new API in pandas would require more
>> engineering. Also, for example, Cython has a mechanism to transpile
>> Python code into C, mapping slow Python attribute lookup to fast C
>> struct field access
>>
>> https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types
>>
>>
>> In a parallel but not really related universe, cython recently upgraded
>> the object mapping so that we can quiet the annoying "size changed"
>> runtime warning https://github.com/numpy/numpy/issues/11788 without
>> requiring warning filters, but that requires updating the numpy.pxd file
>> provided with cython, and it was proposed that NumPy actually vendor its
>> own file rather than depending on the cython one
>> (https://github.com/numpy/numpy/issues/11803).
>>
>>
>> The problem
>>
>>
>> We have now made further changes to our API. In NumPy 1.14 we changed
>> UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate
>> PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning
>> when NPY_NO_DEPRECATED_API is not defined is annoying. The new API cannot
>> be supported by Cython without some deep surgery
>> (https://github.com/cython/cython/pull/2640). When I tried dogfooding an
>> updated numpy.pxd for the only Cython code in NumPy, mtrand.pyx, I came
>> across some of these issues (https://github.com/numpy/numpy/pull/12284).
>> Forcing the new API will require downstream users to refactor code or
>> re-engineer constructs, as in the pandas example above.
>>
>>
>> The question
>>
>>
>> Is the attribute-hiding effort worth it? Should we give up, revert the
>> PyArrayObject/PyArrayObject_fields division and allow direct access from
>> C to the numpy internals? Is there another path forward that is less
>> painful?
>>
>>
>> Matti
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>



-- 
Nathaniel J. Smith -- https://vorpus.org

