Attribute hiding APIs for PyArrayObject
TL;DR - should we revert the attribute-hiding constructs in ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject?

Background

NumPy 1.8 deprecated direct access to PyArrayObject fields. It made PyArrayObject "opaque" and hid the fields behind a PyArrayObject_fields structure https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarray... with a comment about moving this to a private header. To access the fields, users are supposed to use PyArray_FIELDNAME functions, like PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time that NumPy might move away from a C-struct-based underlying data structure. Other changes were also made to enum names, but those are relatively painless to find-and-replace.

NumPy has a mechanism to manage deprecating APIs: C users define NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and can then access the API "as if" they were using NumPy 1.8. Users who do not define NPY_NO_DEPRECATED_API get a warning when compiling and default to the pre-1.8 API (aliasing of PyArrayObject to PyArrayObject_fields and direct access to the C struct fields). This is convenient for downstream users, both because the new API does not provide much added value and because it is much easier to write a->nd than PyArray_NDIM(a). For instance, pandas uses direct assignment to the data field for fast JSON parsing https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/pyth... via chunks. Working around the new API in pandas would require more engineering. Also, Cython has a mechanism to transpile Python code into C, mapping slow Python attribute lookup to fast C struct field access https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#e...

In a parallel but not really related universe, Cython recently upgraded the object mapping so that we can quiet the annoying "size changed" runtime warning https://github.com/numpy/numpy/issues/11788 without requiring warning filters, but that requires updating the numpy.pxd file provided with Cython, and it was proposed that NumPy actually vendor its own file rather than depending on the Cython one (https://github.com/numpy/numpy/issues/11803).

The problem

We have now made further changes to our API. In NumPy 1.14 we changed UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning emitted when NPY_NO_DEPRECATED_API is not defined is annoying. The new API cannot be supported by Cython without some deep surgery (https://github.com/cython/cython/pull/2640). When I tried dogfooding an updated numpy.pxd for the only Cython code in NumPy, mtrand.pyx, I came across some of these issues (https://github.com/numpy/numpy/pull/12284). Forcing the new API will require downstream users to refactor code or re-engineer constructs, as in the pandas example above.

The question

Is the attribute-hiding effort worth it? Should we give up, revert the PyArrayObject/PyArrayObject_fields division and allow direct access from C to the NumPy internals? Is there another path forward that is less painful?

Matti
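For concreteness, the two access styles under discussion look roughly like this. This is an illustrative sketch, not code from any of the projects mentioned; it assumes a C extension that includes the NumPy headers as shown.

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>
    /* NPY_NO_DEPRECATED_API is deliberately left undefined here so that the
     * old-style function below still compiles; defining it, e.g.
     *     #define NPY_NO_DEPRECATED_API NPY_1_8_API_VERSION
     * makes PyArrayObject opaque and allows only the accessor style. */
    #include <numpy/arrayobject.h>

    /* Pre-1.8 style: direct struct field access (PyArrayObject still
     * aliases PyArrayObject_fields). */
    static npy_intp
    last_dim_old_style(PyArrayObject *a)
    {
        if (a->nd == 0) {
            return 0;
        }
        return a->dimensions[a->nd - 1];   /* likewise a->data, a->strides */
    }

    /* Post-1.8 style: accessor functions/macros only. */
    static npy_intp
    last_dim_new_style(PyArrayObject *a)
    {
        if (PyArray_NDIM(a) == 0) {
            return 0;
        }
        return PyArray_DIM(a, PyArray_NDIM(a) - 1);   /* likewise PyArray_DATA(a) */
    }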
> In NumPy 1.14 we changed UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning emitted when NPY_NO_DEPRECATED_API is not defined is annoying.

I'm not sure I make the connection here between hidden fields and API deprecation. You seem to be asking two vaguely related questions:

1. Should we have deprecated field access in the first place?
2. Does our API deprecation mechanism need work?

I think a more substantial problem statement is needed for 2, so I'm only going to respond to 1 here.

Hiding fields seems to me to match the CPython model of things, where your public API is PyArray<thing>_SomeGetter(thing). If you look at the CPython source code <https://github.com/python/cpython/blob/e0720cd/Include/tupleobject.h#L24-L34>, they only expose the underlying struct fields if you don't define Py_LIMITED_API, i.e. if you as a consumer volunteer to be broken by upstream changes in minor versions. People (like us) are willing to produce separate builds for each Python version, so often do not define this.

We could add a similar PyArray_LIMITED_API that allows field access under a similar guarantee - the question is, are many downstream consumers willing to produce builds against multiple NumPy versions? (Especially if they also do so against multiple Python versions.)

> Also, for example, cython has a mechanism to transpile python code into C, mapping slow python attribute lookup to fast C struct field access

How does this work for builtin types? Does Cython deliberately not define Py_LIMITED_API? Or are you just forced to use PyTuple_GetItem(t) if you want the fast path?

Eric
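The tupleobject.h pattern Eric links to looks roughly like this (abridged), and a hypothetical PyArray_LIMITED_API could gate NumPy's struct the same way. To be clear, the second half is only a sketch of the proposal - no such macro exists today.

    /* CPython, Include/tupleobject.h (abridged): the struct and the fast
     * macro are visible only to consumers who do NOT define Py_LIMITED_API. */
    #ifndef Py_LIMITED_API
    typedef struct {
        PyObject_VAR_HEAD
        PyObject *ob_item[1];   /* ob_item contains space for ob_size elements */
    } PyTupleObject;
    #define PyTuple_GET_ITEM(op, i) (((PyTupleObject *)(op))->ob_item[i])
    #endif
    /* Everyone gets the stable, error-checked function. */
    PyAPI_FUNC(PyObject *) PyTuple_GetItem(PyObject *, Py_ssize_t);

    /* Hypothetical NumPy analogue (illustrative only): */
    #ifdef PyArray_LIMITED_API
        /* PyArrayObject stays opaque; only PyArray_NDIM(), PyArray_DATA(),
         * ... are available, in exchange for a cross-version ABI promise. */
    #else
        /* PyArrayObject aliases PyArrayObject_fields, so a->nd, a->data, ...
         * work, in exchange for rebuilding when the struct layout changes. */
    #endif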
It's probably helpful to know that Py_LIMITED_API is a kinda-experimental thing that was added in CPython 3.2 (see PEP 384) and remains almost 100% unused. It has never been a popular or influential thing (for better or worse).

-n
-- Nathaniel J. Smith -- https://vorpus.org
On 10/31/18 03:33, Nathaniel Smith wrote:
> It's probably helpful to know that Py_LIMITED_API is a kinda-experimental thing that was added in CPython 3.2 (see PEP 384) and remains almost 100% unused. It has never been a popular or influential thing (for better or worse).
Py_LIMITED_API is not very influential *outside* CPython, but it's not (yet) a failed experiment. (Which is not what you said, but someone might read it that way.)

The popularity is a bit of a chicken-and-egg problem. Py_LIMITED_API is not used much because the current implementation is not useful in the real world. But as large projects like Cython and PySide are looking at Py_LIMITED_API from their side, problems are getting found and fixed. It's not a fast process, being all volunteer-driven. But the limited API (= stable ABI) does have a major role in thoughts about future CPython API design, and the idea (not the current implementation) is worth looking at.

What's the idea? In addition to python35/python36/python37, there's a "python3" API that you can target, which is slower at run-time but won't inflate your build/test matrix. It's not either-or. CPython provides both.
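To make the "python3" target concrete, opting in on the consumer side looks roughly like this (a minimal sketch; the module and function names are made up). Defining Py_LIMITED_API restricts the extension to the stable ABI, and the resulting binary can be tagged abi3 and reused across CPython 3.x versions instead of being rebuilt per version:

    /* Built once against the limited API, usable on CPython >= 3.5. */
    #define Py_LIMITED_API 0x03050000   /* opt in to the stable ABI, 3.5+ */
    #include <Python.h>

    static PyObject *
    answer(PyObject *self, PyObject *unused)
    {
        return PyLong_FromLong(42);
    }

    static PyMethodDef methods[] = {
        {"answer", answer, METH_NOARGS, "Return 42."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef moduledef = {
        PyModuleDef_HEAD_INIT, "demo_abi3", NULL, -1, methods
    };

    PyMODINIT_FUNC
    PyInit_demo_abi3(void)
    {
        return PyModule_Create(&moduledef);
    }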
On 10/30/18 5:04 AM, Matti Picus wrote:
> Forcing the new API will require downstream users to refactor code or re-engineer constructs, as in the pandas example above.
I haven't understood the cython issue, but just want to mention that for optimization purposes it's nice to be able to modify the fields, like in the pandas/json example above.

In particular, PyArray_ConcatenateArrays uses some tricks which temporarily clobber the data pointer and shape of an array to concatenate arrays efficiently. It seems fairly safe to me. These tricks would be nice to re-use in a C port of the new block code we merged recently.

Those optimizations aren't possible using only the opaque PyArrayObject API.

Cheers,
Allan
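A rough illustration of the kind of trick being described - this is not the actual PyArray_ConcatenateArrays code, just a simplified sketch that assumes same-dtype 1-D inputs, that the NumPy headers are included, and that import_array() has already run:

    /* Walk a view over the contiguous result, clobbering its data pointer
     * and first dimension so PyArray_CopyInto fills one chunk at a time. */
    static PyObject *
    concat_1d_sketch(PyArrayObject **arrays, int n)
    {
        npy_intp total = 0;
        for (int i = 0; i < n; i++) {
            total += PyArray_DIM(arrays[i], 0);
        }
        PyArrayObject *ret = (PyArrayObject *)PyArray_SimpleNew(
                1, &total, PyArray_TYPE(arrays[0]));
        if (ret == NULL) {
            return NULL;
        }
        PyArrayObject *view = (PyArrayObject *)PyArray_View(ret, NULL, NULL);
        if (view == NULL) {
            Py_DECREF(ret);
            return NULL;
        }
        /* Direct field access, via the PyArrayObject_fields struct that
         * ndarraytypes.h still exposes. */
        PyArrayObject_fields *f = (PyArrayObject_fields *)view;
        for (int i = 0; i < n; i++) {
            f->dimensions[0] = PyArray_DIM(arrays[i], 0);  /* shrink the view */
            if (PyArray_CopyInto(view, arrays[i]) < 0) {
                Py_DECREF(view);
                Py_DECREF(ret);
                return NULL;
            }
            /* advance the view's data pointer past the chunk just copied */
            f->data += PyArray_DIM(arrays[i], 0) * PyArray_ITEMSIZE(ret);
        }
        Py_DECREF(view);
        return (PyObject *)ret;
    }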
On Wed, Oct 31, 2018 at 3:59 PM Allan Haldane <allanhaldane@gmail.com> wrote:
> In particular, PyArray_ConcatenateArrays uses some tricks which temporarily clobber the data pointer and shape of an array to concatenate arrays efficiently. It seems fairly safe to me. These tricks would be nice to re-use in a C port of the new block code we merged recently.
It's OK for numpy internals to directly access the structures, as presumably they will be updated if anything changes. Maybe it would be useful for Cython to have a flag like Py_LIMITED_API?

Chuck
On Wed, Oct 31, 2018 at 4:01 PM Charles R Harris <charlesr.harris@gmail.com> wrote:
> It's OK for numpy internals to directly access the structures, as presumably they will be updated if anything changes. Maybe it would be useful for Cython to have a flag like Py_LIMITED_API?
That probably only makes sense if we enable such a flag by default - which is a big backwards-compat break that users can then undo by setting Py_LIMITED_API=0. Otherwise the vast majority of users will never use it, and hence we still cannot change the C API without breaking the world. Such breakage would be fine for conda, because it special-cases NumPy in the same way as Python. For wheel/pip users however, it would cause major issues.

Ralf
participants (7)
- Allan Haldane
- Charles R Harris
- Eric Wieser
- Matti Picus
- Nathaniel Smith
- Petr Viktorin
- Ralf Gommers