[Numpy-discussion] Attribute hiding APIs for PyArrayObject

Wed Oct 31 20:04:16 EDT 2018

On Wed, Oct 31, 2018 at 4:01 PM Charles R Harris <charlesr.harris at gmail.com>
wrote:

>
>
> On Wed, Oct 31, 2018 at 3:59 PM Allan Haldane <allanhaldane at gmail.com>
> wrote:
>
>> On 10/30/18 5:04 AM, Matti Picus wrote:
>> > TL;DR - should we revert the attribute-hiding constructs in
>> > ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject?
>> >
>> >
>> > Background
>> >
>> >
>> > NumPy 1.8 deprecated direct access to PyArrayObject fields. It made
>> > PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields
>> > structure
>> > https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659
>> > with a comment about moving this to a private header. In order to access
>> > the fields, users are supposed to use PyArray_FIELDNAME functions, like
>> > PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time
>> > that numpy might move away from a C-struct based underlying data
>> > structure. Other changes were also made to enum names, but those are
>> > relatively painless to find-and-replace.
>> >
>> >
>> > NumPy has a mechanism to manage deprecating APIs: C users define
>> > NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and
>> > can then access the API "as if" they were using NumPy 1.8. Users who do
>> > not define NPY_NO_DEPRECATED_API get a warning when compiling, and
>> > default to the pre-1.8 API (aliasing of PyArrayObject to
>> > PyArrayObject_fields and direct access to the C struct fields). This is
>> > convenient for downstream users, both since the new API does not provide
>> > much added value, and it is much easier to write a->nd than
>> > PyArray_NDIM(a). For instance, pandas uses direct assignment to the data
>> > field for fast json parsing
>> > https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203
>> > via chunks. Working around the new API in pandas would require more
>> > engineering. Also, for example, cython has a mechanism to transpile
>> > python code into C, mapping slow python attribute lookup to fast C
>> > struct field access
>> > https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types
>> >
>> >
>> >
>> > In a parallel but not really related universe, cython recently upgraded
>> > the object mapping so that we can quiet the annoying "size changed"
>> > runtime warning https://github.com/numpy/numpy/issues/11788 without
>> > requiring warning filters, but that requires updating the numpy.pxd file
>> > provided with cython, and it was proposed that NumPy actually vendor its
>> > own file rather than depending on the cython one
>> > (https://github.com/numpy/numpy/issues/11803).
>> >
>> >
>> > The problem
>> >
>> >
>> > We have now made further changes to our API. In NumPy 1.14 we changed
>> > UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate
>> > PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning
>> > emitted when NPY_NO_DEPRECATED_API is not defined is annoying. The new
>> > API cannot be supported by cython without some deep surgery
>> > (https://github.com/cython/cython/pull/2640). When I tried dogfooding an
>> > updated numpy.pxd for the only cython code in NumPy, mtrand.pyx, I came
>> > across some of these issues (https://github.com/numpy/numpy/pull/12284).
>> > Forcing the new API will require downstream users to refactor code or
>> > re-engineer constructs, as in the pandas example above.
>>
>> I haven't understood the cython issue, but just want to mention that for
>> optimization purposes it's nice to be able to modify the fields, like in
>> the pandas/json example above.
>>
>> In particular, PyArray_ConcatenateArrays uses some tricks which
>> temporarily clobber the data pointer and shape of an array to
>> concatenate arrays efficiently. It seems fairly safe to me. These tricks
>> would be nice to re-use in a C port of the new block code we merged
>> recently.
>>
>> Those optimizations aren't possible if only using the opaque PyArrayObject.
>>
>>
> It's OK for numpy internals to directly access the structures, as
> presumably they will be updated if anything changes. Maybe it would be
> useful for Cython to have a flag like Py_LIMITED_API?
>

That probably only makes sense if we enable such a flag by default, which
is a big backwards compat break that users can then undo by setting
Py_LIMITED_API=0. Otherwise the vast majority of users will never use it,
and hence we still cannot change anything in the C API without breaking
the world.
Such breakage would be fine for conda, because it special-cases NumPy in
the same way as Python. For wheel/pip users however, it would cause major
issues.
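
To make the default-on idea concrete, here is a rough sketch in plain C.
Everything named Mock* and MOCK_LIMITED_API is made up for illustration; it
is not NumPy's actual header layout, and it is not how Py_LIMITED_API is
handled in CPython. It only assumes the general pattern described above: an
opaque handle type, accessors that cast to a fields struct, and a macro that
decides whether the fields are visible.

```c
/* Hypothetical stand-in for PyArrayObject_fields; not NumPy's real layout. */
typedef struct {
    char *data;   /* pointer to the first element */
    int nd;       /* number of dimensions */
} MockArray_fields;

/* Hiding is the default; building with -DMOCK_LIMITED_API=0 opts out. */
#ifndef MOCK_LIMITED_API
#define MOCK_LIMITED_API 1
#endif

#if MOCK_LIMITED_API
/* Incomplete type: a->nd does not compile, only the accessors work. */
typedef struct MockArray_opaque MockArray;
#else
/* Opt-out: MockArray aliases the fields struct, so a->nd compiles again. */
typedef MockArray_fields MockArray;
#endif

/* Accessors behave the same in both modes by casting to the fields struct. */
static inline int MockArray_NDIM(const MockArray *a)
{
    return ((const MockArray_fields *)a)->nd;
}

static inline char *MockArray_DATA(const MockArray *a)
{
    return ((const MockArray_fields *)a)->data;
}

/* Hand out the public handle for a fields struct owned by the caller. */
static inline MockArray *MockArray_FromFields(MockArray_fields *f)
{
    return (MockArray *)f;
}
```

With the default shown here, downstream code that writes a->nd stops
compiling until it either switches to the accessors or passes
-DMOCK_LIMITED_API=0. If the default were flipped, nothing would break, but
almost nobody would opt in either, which is the trade-off I mean.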

Ralf