[Cython] Enhancing "ctypedef class numpy.ndarray" with getter properties

Robert Bradshaw robertwb at gmail.com
Thu Sep 27 18:20:51 EDT 2018


On Thu, Sep 27, 2018 at 11:36 PM Matti Picus <matti.picus at gmail.com> wrote:

> On 27/09/18 22:50, Robert Bradshaw wrote:
> >
> >     On Thu, Sep 27, 2018 at 10:38 AM Matti Picus
> >     <matti.picus at gmail.com <mailto:matti.picus at gmail.com>> wrote:
> >     To solve issue #2498, I did some experiments
> >     https://github.com/cython/cython/issues/2498#issuecomment-414543549
> >     with
> >     hiding direct field access in an external extension type (documented
> >     here
> >
> https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types
> ).
> >
> >     The idea is to write `a.ndims` in cython (in plain python code),
> >     and in
> >     C magically get the attribute lookup converted into a
> >     `PyArray_NDIMS(a)`
> >     getter, which could be a macro or a c-function.
> >
> >     The experiments proved fruitful, and garnered some positive
> >     feedback so
> >     I am pushing forward.
> >
> >     I would like to get some feedback on syntax before I progress too
> >     far.
> >     Should the syntax be extended to support
> >
> >     ctypedef class numpy.ndarray [object PyArrayObject]:cdef: # Convert
> >     python __getattr__ access to c functions. int ndims PyArray_NDIMS |
> >
> >
> >     or perhaps a decorator, like Python
> >
> >     |ctypedef class numpy.ndarray [object PyArrayObject]: cdef: # Convert
> >     python __getattr__ access to c functions. @property  cdef int
> >     ndims(self): return PyArray_NDIMS(self) or something else? The second
> >     seems more wordy but more explicit. I don't know which would be
> >     easier
> >     to implement or require more effort to test and maintain.
> >
>
> >     Matti |
> >
> > Thanks for looking into this!
> >
> > My preference would be to use the @property syntax, as this will be
> > immediately understandable to any Cython user and could contain
> > arbitrary code, rather than just a macro call.
> >
> > There are, however, a couple of downsides. The first is that it may
> > not be clear when accessing an attribute that a full function call may
> > be invoked. (Arguably this is the same issue one has with Python, but
> > there attribute access is already expensive. The function could be
> > inline as well if desired.) The second is that this means that this
> > attribute is no longer an lvalue. The last is that it's a bit special
> > to be defining methods on an extern class. Maybe it would have to be
> > inline if it's in the pxd?
> >
> > If we're going to be defining a special syntax, I might prefer
> > something like
> >
> > cdef extern class ...:
> >     int ndims "PyArray_NDIMS(*)"
> >
> > which more resembles
> >
> >     int ndims "nd"
> >
> > Open to bikeshedding on what the "self" placeholder should be. As
> > before, should the ndims lose its lvalue status in this case, or not
> > (in case the accessor is really a macro intended to be used like this)?
> >
> >
> Sorry about the formatting messup, the original proposal was supposed to
> be (this time using double spacing to make sure it works):
>
>
> -----------------------------------------------------------------------------
>
> cdef extern class ...:
>
>      @property
>
>      cdef int ndims(self):
>
>          return PyArray_NDIMS(self)
>
> ----------------------------------------------------------
>
> vs
>
> --------------------------------------------------------
>
> cdef extern class ...:
>
>      cdef int ndims PyArray_NDIMS
>
> --------------------------------------------------------
>
> The proposal is for a getter via a C function or a macro. NumPy's
> current public API uses a mix. Currently I am interested in getters that
> would not allow lvalue at all. Maybe in the future we will have fast
> rvalue setter functions in NumPy, but the current API does not support
> them. It remains to be seen how much slowdown we see in real-life
> benchmarks when calling a small C function from a different shared
> object to access attributes rather than directly accessing them via
> struct fields.
>

Hmm...so in this case upgrading Cython would cause an unconditional
switch from direct access to a function call without any code change (or
choice) for users of numpy.pxd. I am curious what kind of a slowdown this
would represent (though would assume this kind of analysis was done by the
NumPy folks when choosing macro vs. function for the public API).

> As I point out in the "experiment" comment referenced above, pandas has
> code that needs lvalue access to ndarray data, so they would be stuck
> with the old API which is deprecated but still works for now. Scipy has
> no such code and could move forward to the newer API.
>

But if we upgraded Cython, how would they access the old API? I suppose
they could create a setter macro of their own to use in the (presumably
few) cases where they needed an lvalue.


> As far as bikeshedding the "self" parameter, I would propose doing
> without, and indeed I successfully hacked Cython to use the second
> proposal with no self argument and no quotations.
>

The problem is that when one reads

    cdef int aaa bbb

there's no indication as to the meaning of this. We also want to be sure to
disallow this syntax everywhere but this one context. On the other hand the
quotation syntax

    cdef int aaa "bbb"

already has (widespread) meaning of establishing a C alias of the name in
question which is essentially what we're trying to do here.

I'm still, however, leaning towards the @property syntax (which we could
allow for non-extern cdef classes as well).
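
To make the lvalue point concrete, here is a rough sketch in plain Python
(which is also valid Cython source). It is only an analogy for the behaviour
under discussion, not the proposed extern-class feature: FakeArray, its
_shape field, and try_assign_ndims are hypothetical names invented for this
example and have nothing to do with numpy.pxd.

```python
class FakeArray:
    """Hypothetical stand-in for an extension type; not numpy.ndarray."""

    def __init__(self, shape):
        # Private storage that callers are not meant to touch directly.
        self._shape = tuple(shape)

    @property
    def ndims(self):
        # Getter only: every read of `a.ndims` goes through this function.
        return len(self._shape)


def try_assign_ndims(arr, value):
    """Return True if `arr.ndims = value` succeeds, False if it is rejected."""
    try:
        arr.ndims = value
    except AttributeError:
        # A property without a setter cannot be assigned to.
        return False
    return True
```

With only a getter defined, reading a.ndims works but assignment is
rejected, which is the "no longer an lvalue" behaviour mentioned earlier in
the thread. Code that needs write access would have to go through a separate
setter of its own.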

- Robert