Enhancing "ctyepdef class numpy.ndarray" with getter properties
To solve issue #2498, I did some experiments https://github.com/cython/cython/issues/2498#issuecomment-414543549 with hiding direct field access in an external extension type (documented here https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#e...). The idea is to write `a.ndims` in cython (in plain python code), and in C magically get the attribute lookup converted into a `PyArray_NDIMS(a)` getter, which could be a macro or a c-function. The experiments proved fruitful, and garnered some positive feedback so I am pushing forward. I would like to get some feedback on syntax before I progress too far. Should the syntax be extended to support |ctypedef class numpy.ndarray [object PyArrayObject]: cdef: # Convert python __getattr__ access to c functions. int ndims PyArray_NDIMS | or perhaps a decorator, like Python |ctypedef class numpy.ndarray [object PyArrayObject]: cdef: # Convert python __getattr__ access to c functions. @property cdef int ndims(self): return PyArray_NDIMS(self) or something else? The second seems more wordy but more explicit. I don't know which would be easier to implement or require more effort to test and maintain. Matti |
Thanks for looking into this! My preference would be to use the @property syntax, as this will be immediately understandable to any Cython user and could contain arbitrary code, rather than just a macro call. There are, however, a couple of downsides. The first is that it may not be clear when accessing an attribute that a full function call may be invoked. (Arguably this is the same issue one has with Python, but there attribute access is already expensive. The function could be inline as well if desired.) The second is that this means that this attribute is no longer an lvalue. The last is that it's a bit special to be defining methods on an extern class. Maybe it would have to be inline if it's in the pxd? If we're going to be defining a special syntax, I might prefer something like cdef extern class ...: int ndims "PyArray_NDIMS(*)" which more resembles int ndims "nd" Open to bikeshedding on what the "self" placeholder should be. As before, should the ndims lose its lvalue status in this case, or not (in case the accessor is really a macro intended to be used like this)? On Thu, Sep 27, 2018 at 10:38 AM Matti Picus <matti.picus@gmail.com> wrote:
To solve issue #2498, I did some experiments https://github.com/cython/cython/issues/2498#issuecomment-414543549 with hiding direct field access in an external extension type (documented here
https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#e...).
The idea is to write `a.ndims` in cython (in plain python code), and in C magically get the attribute lookup converted into a `PyArray_NDIMS(a)` getter, which could be a macro or a c-function.
The experiments proved fruitful, and garnered some positive feedback so I am pushing forward.
I would like to get some feedback on syntax before I progress too far. Should the syntax be extended to support
|ctypedef class numpy.ndarray [object PyArrayObject]: cdef: # Convert python __getattr__ access to c functions. int ndims PyArray_NDIMS |
or perhaps a decorator, like Python
|ctypedef class numpy.ndarray [object PyArrayObject]: cdef: # Convert python __getattr__ access to c functions. @property cdef int ndims(self): return PyArray_NDIMS(self) or something else? The second seems more wordy but more explicit. I don't know which would be easier to implement or require more effort to test and maintain. Matti |
_______________________________________________ cython-devel mailing list cython-devel@python.org https://mail.python.org/mailman/listinfo/cython-devel
On 27/09/18 22:50, Robert Bradshaw wrote:
On Thu, Sep 27, 2018 at 10:38 AM Matti Picus <matti.picus@gmail.com <mailto:matti.picus@gmail.com>> wrote: To solve issue #2498, I did some experiments https://github.com/cython/cython/issues/2498#issuecomment-414543549 with hiding direct field access in an external extension type (documented here https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#e...).
The idea is to write `a.ndims` in cython (in plain python code), and in C magically get the attribute lookup converted into a `PyArray_NDIMS(a)` getter, which could be a macro or a c-function.
The experiments proved fruitful, and garnered some positive feedback so I am pushing forward.
I would like to get some feedback on syntax before I progress too far. Should the syntax be extended to support
ctypedef class numpy.ndarray [object PyArrayObject]:cdef: # Convert python __getattr__ access to c functions. int ndims PyArray_NDIMS |
or perhaps a decorator, like Python
|ctypedef class numpy.ndarray [object PyArrayObject]: cdef: # Convert python __getattr__ access to c functions. @property cdef int ndims(self): return PyArray_NDIMS(self) or something else? The second seems more wordy but more explicit. I don't know which would be easier to implement or require more effort to test and maintain.
Matti |
Thanks for looking into this!
My preference would be to use the @property syntax, as this will be immediately understandable to any Cython user and could contain arbitrary code, rather than just a macro call.
There are, however, a couple of downsides. The first is that it may not be clear when accessing an attribute that a full function call may be invoked. (Arguably this is the same issue one has with Python, but there attribute access is already expensive. The function could be inline as well if desired.) The second is that this means that this attribute is no longer an lvalue. The last is that it's a bit special to be defining methods on an extern class. Maybe it would have to be inline if it's in the pxd?
If we're going to be defining a special syntax, I might prefer something like
cdef extern class ...: int ndims "PyArray_NDIMS(*)"
which more resembles
int ndims "nd"
Open to bikeshedding on what the "self" placeholder should be. As before, should the ndims lose its lvalue status in this case, or not (in case the accessor is really a macro intended to be used like this)?
Sorry about the formatting messup, the original proposal was supposed to be (this time using double spacing to make sure it works): ----------------------------------------------------------------------------- cdef extern class ...: @property cdef int ndims(self): return PyArray_NDIMS(self) ---------------------------------------------------------- vs -------------------------------------------------------- cdef extern class ...: cdef int ndims PyArray_NDIMS -------------------------------------------------------- The proposal is for a getter via a C function or a macro. NumPy's current public API uses a mix. Currently I am interested in getters that would not allow lvalue at all. Maybe in the future we will have fast rvalue setter functions in NumPy, but the current API does not support them. It remains to be seem how much slowdown we see in real-life benchmarks when calling a small C function from a different shared object to access attributes rather than directly accessing them via struct fields. As I point out in the "experiment" comment referenced above, pandas has code that needs lvalue access to ndarray data, so they would be stuck with the old API which is deprecated but still works for now. Scipy has no such code and oculd move forward to the newer API. As far as bikeshedding the "self" parameter, I would propose doing without, and indeed I successfully hacked Cython to use the second proposal with no self argument and no quotations. Matti
On Thu, Sep 27, 2018 at 11:36 PM Matti Picus <matti.picus@gmail.com> wrote:
On 27/09/18 22:50, Robert Bradshaw wrote:
On Thu, Sep 27, 2018 at 10:38 AM Matti Picus <matti.picus@gmail.com <mailto:matti.picus@gmail.com>> wrote: To solve issue #2498, I did some experiments https://github.com/cython/cython/issues/2498#issuecomment-414543549 with hiding direct field access in an external extension type (documented here
https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#e... ).
The idea is to write `a.ndims` in cython (in plain python code), and in C magically get the attribute lookup converted into a `PyArray_NDIMS(a)` getter, which could be a macro or a c-function.
The experiments proved fruitful, and garnered some positive feedback so I am pushing forward.
I would like to get some feedback on syntax before I progress too far. Should the syntax be extended to support
ctypedef class numpy.ndarray [object PyArrayObject]:cdef: # Convert python __getattr__ access to c functions. int ndims PyArray_NDIMS |
or perhaps a decorator, like Python
|ctypedef class numpy.ndarray [object PyArrayObject]: cdef: # Convert python __getattr__ access to c functions. @property cdef int ndims(self): return PyArray_NDIMS(self) or something else? The second seems more wordy but more explicit. I don't know which would be easier to implement or require more effort to test and maintain.
Matti |
Thanks for looking into this!
My preference would be to use the @property syntax, as this will be immediately understandable to any Cython user and could contain arbitrary code, rather than just a macro call.
There are, however, a couple of downsides. The first is that it may not be clear when accessing an attribute that a full function call may be invoked. (Arguably this is the same issue one has with Python, but there attribute access is already expensive. The function could be inline as well if desired.) The second is that this means that this attribute is no longer an lvalue. The last is that it's a bit special to be defining methods on an extern class. Maybe it would have to be inline if it's in the pxd?
If we're going to be defining a special syntax, I might prefer something like
cdef extern class ...: int ndims "PyArray_NDIMS(*)"
which more resembles
int ndims "nd"
Open to bikeshedding on what the "self" placeholder should be. As before, should the ndims lose its lvalue status in this case, or not (in case the accessor is really a macro intended to be used like this)?
Sorry about the formatting messup, the original proposal was supposed to be (this time using double spacing to make sure it works):
-----------------------------------------------------------------------------
cdef extern class ...:
@property
cdef int ndims(self):
return PyArray_NDIMS(self)
----------------------------------------------------------
vs
--------------------------------------------------------
cdef extern class ...:
cdef int ndims PyArray_NDIMS
--------------------------------------------------------
The proposal is for a getter via a C function or a macro. NumPy's current public API uses a mix. Currently I am interested in getters that would not allow lvalue at all. Maybe in the future we will have fast rvalue setter functions in NumPy, but the current API does not support them. It remains to be seem how much slowdown we see in real-life benchmarks when calling a small C function from a different shared object to access attributes rather than directly accessing them via struct fields.
Hmm...so in this case it upgrading Cython would cause an unconditional switch from direct access to a function call without any code change (or choice) for users of numpy.pxd. I am curious what kind of a slowdown this would represent (though would assume this kind of analysis was done by the NumPy folks when choosing macro vs. function for the public API). As I point out in the "experiment" comment referenced above, pandas has
code that needs lvalue access to ndarray data, so they would be stuck with the old API which is deprecated but still works for now. Scipy has no such code and oculd move forward to the newer API.
But if we upgraded Cython, how would they access the old API? I suppose they could create a setter macro of their own to use in the (presumably few) cases where they needed an lvalue.
As far as bikeshedding the "self" parameter, I would propose doing without, and indeed I successfully hacked Cython to use the second proposal with no self argument and no quotations.
The problem is that when one reads cdef int aaa bbbb there's no indication as to the meaning of this. We also want to be sure to disallow this syntax everywhere but this one context. On the other hand the quotation syntax cdef int aaa "bbb" already has (widespread) meaning of establishing a C alias of the name in question which is essentially what we're trying to do here. I'm still, however, leaning towards the @property syntax (which we could allow for non-extern cdef classes as well). - Robert
Breaking this into a number of sub-dsicussions, since we seem to be branching. The original topic was Re: [Cython] Enhancing "ctyepdef class numpy.ndarray" with getter properties On 28/09/18 01:20, Robert Bradshaw wrote:
Hmm...so in this case it upgrading Cython would cause an unconditional switch from direct access to a function call without any code change (or choice) for users of numpy.pxd. I am curious what kind of a slowdown this would represent (though would assume this kind of analysis was done by the NumPy folks when choosing macro vs. function for the public API).
As I point out in the "experiment" comment referenced above, pandas has code that needs lvalue access to ndarray data, so they would be stuck with the old API which is deprecated but still works for now. Scipy has no such code and oculd move forward to the newer API.
But if we upgraded Cython, how would they access the old API? I suppose they could create a setter macro of their own to use in the (presumably few) cases where they needed an lvalue.
- Robert
NumPy changed its recommended API to an opaque one via inline getter functions in 2011, in this PR https://github.com/numpy/numpy/pull/116. I could not find a discussion on performance impact, perhaps since the functions are in the header files and marked inline. Hopefully the compilers will properly deal with making them fast. However, it is true that when people update to a new version of a library things change. In this case, there are backward-compatibility macros that revert the post-1.7 functions into pre-1.7 macros with the same name. Thus for the experiment I used a new numpy.pxd, defined the pre-1.7 api in the pandas build (experimental changeset https://github.com/mattip/pandas/commit/9113bf7e55e1eddece3544c1ad3ef2a761b5...), and was still able to access ndarray.data as a lvalue. Matti
On 28/09/18 10:25, Matti Picus wrote:
Breaking this into a number of sub-dsicussions, since we seem to be branching. The original topic was
Re: [Cython] Enhancing "ctyepdef class numpy.ndarray" with getter properties
On 28/09/18 01:20, Robert Bradshaw wrote:
Hmm...so in this case it upgrading Cython would cause an unconditional switch from direct access to a function call without any code change (or choice) for users of numpy.pxd. I am curious what kind of a slowdown this would represent (though would assume this kind of analysis was done by the NumPy folks when choosing macro vs. function for the public API).
As I point out in the "experiment" comment referenced above, pandas has code that needs lvalue access to ndarray data, so they would be stuck with the old API which is deprecated but still works for now. Scipy has no such code and oculd move forward to the newer API.
But if we upgraded Cython, how would they access the old API? I suppose they could create a setter macro of their own to use in the (presumably few) cases where they needed an lvalue.
- Robert
NumPy changed its recommended API to an opaque one via inline getter functions in 2011, in this PR https://github.com/numpy/numpy/pull/116. I could not find a discussion on performance impact, perhaps since the functions are in the header files and marked inline. Hopefully the compilers will properly deal with making them fast. However, it is true that when people update to a new version of a library things change. In this case, there are backward-compatibility macros that revert the post-1.7 functions into pre-1.7 macros with the same name.
Thus for the experiment I used a new numpy.pxd, defined the pre-1.7 api in the pandas build (experimental changeset https://github.com/mattip/pandas/commit/9113bf7e55e1eddece3544c1ad3ef2a761b5...), and was still able to access ndarray.data as a lvalue.
Matti
This means cython/numpy could provide an integration path based on numpy starting to ship its own numpy.pxd: - Cython would define the macro (if not already defined) to use the pre-1.7 Numpy API in the numpy.pxd it ships. This would still work (lvalues would be allowed) after direct access is replaced with the getter properties, since they are macros - NumPy would define the macro to use post-1.7 API (if not already defined) in the numpy.pxd it ships, which as I understand would take precedence over cython's. Then projects like pandas could freely upgrade Cython without changing their codebase, but would encounter errors when updating NumPy. Matti
On 28/09/18 01:20, Robert Bradshaw wrote:
On Thu, Sep 27, 2018 at 11:36 PM Matti Picus <matti.picus@gmail.com <mailto:matti.picus@gmail.com>> wrote:
The problem is that when one reads
cdef int aaa bbbb
there's no indication as to the meaning of this. We also want to be sure to disallow this syntax everywhere but this one context. On the other hand the quotation syntax
cdef int aaa "bbb"
already has (widespread) meaning of establishing a C alias of the name in question which is essentially what we're trying to do here.
I'm still, however, leaning towards the @property syntax (which we could allow for non-extern cdef classes as well).
- Robert
Using "PyArray_DIMS" with quotes but without parentheses would indeed be confusing to users and difficult to implement, so "PyArray_DIMS(*)" where the * is TBD seem nicer. It sounds like the jury is still out. In order to compare the solutions, I will move forward with the @property decorator syntax, but to keep it simple I will start small: only getters and specifically for CFuncDefNodes. Then if you still want to look at the other option I will turn my "experiment into a PR. Matti
participants (2)
-
Matti Picus -
Robert Bradshaw