[Cython] Multidimensional indexing of C++ objects

Wed Jul 15 03:30:39 CEST 2015

On Wed, Jul 8, 2015 at 11:20 AM Stefan Behnel <stefan_ml at behnel.de> wrote:

> Ian Henriksen wrote on 08.07.2015 at 03:50:
> > On Sat, Jul 4, 2015 at 12:43 AM Stefan Behnel wrote:
> >> Ian Henriksen wrote on 04.07.2015 at 00:43:
> >>> I'm a GSoC student working on a Cython API for DyND. DyND
> >>> <https://github.com/libdynd/libdynd> is a relatively new n-dimensional
> >>> array library in C++ that is based on NumPy. A full set of Python
> >>> bindings (created using Cython) is provided as a separate package. The
> >>> goal of my project is to make DyND arrays easy to use from within
> >>> Cython, so that an n-dimensional array object can be used without any
> >>> of the corresponding Python overhead.
> >>>
> >>> Currently, there isn't a good way to assign to multidimensional slices
> >>> within Cython. Since the indexing operator in C++ is limited to a single
> >>> argument, we use the call operator to represent multidimensional
> >>> indexing, and then use a proxy class to perform assignment to a slice.
> >>> In C++, assigning to a slice along the second axis of a DyND array
> >>> currently looks like this:
> >>>
> >>>     a(irange(), 1).vals() = 0;
> >>>
> >>> Unfortunately, in Cython, only the index operator can be used for
> >>> assignment, so following the C++ syntax isn't currently possible. Does
> >>> anyone know of a good way to address this?
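For reference, the C++ side of the pattern described above (a
multidimensional index passed through the call operator, with a proxy
object as the assignment target) can be sketched roughly as follows. This
is a toy written for illustration only: Array2D, ColumnView and Values are
hypothetical names, and irange here is a stand-in tag type, not DyND's
actual irange or array API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tag meaning "every index along this axis"; a stand-in for
// the irange() in the message, not DyND's actual type.
struct irange {};

// Hypothetical row-major 2-D array of doubles (not DyND's array class).
class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Element access: operator[] takes a single argument, so the
    // multidimensional index goes through operator() instead.
    double& operator()(std::size_t i, std::size_t j) {
        return data_[i * cols_ + j];
    }

    // Proxy for the slice a(irange(), j), i.e. column j.
    class ColumnView {
    public:
        ColumnView(Array2D& arr, std::size_t col) : arr_(arr), col_(col) {}

        // Assignment target returned by vals(): assigning a scalar to it
        // writes that scalar into every element of the column.
        class Values {
        public:
            Values(Array2D& arr, std::size_t col) : arr_(arr), col_(col) {}
            Values& operator=(double x) {
                for (std::size_t i = 0; i < arr_.rows(); ++i) arr_(i, col_) = x;
                return *this;
            }
        private:
            Array2D& arr_;
            std::size_t col_;
        };

        Values vals() { return Values(arr_, col_); }

    private:
        Array2D& arr_;
        std::size_t col_;
    };

    // Slicing overload: returns a proxy rather than an element reference.
    ColumnView operator()(irange, std::size_t j) { return ColumnView(*this, j); }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

With this in place, `a(irange(), 1).vals() = 0;` zeroes column 1. The
assignment target is the result of a call expression, which is exactly the
form Cython cannot currently write.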
> >>
> >> Just an idea, don't know how feasible this is, but we could allow inline
> >> special methods in C++ class declarations that implement Python
> >> protocols. Example:
> >>
> >>     cdef extern from ...:
> >>         cppclass Array2D:
> >>             int operator[](ssize_t i) except +
> >>             int getItemAt(ssize_t x, ssize_t y) except +
> >>
> >>             cdef inline __getitem__(self, Py_ssize_t x, Py_ssize_t y):
> >>                 return self.getItemAt(x, y)
> >>
> >>     def test():
> >>         cdef Array2D a
> >>         return a[1, 2]
> >>
> >> Cython could then translate an item access on an Array2D instance into
> >> the corresponding special "method" call.
> >>
> >> Drawbacks:
> >>
> >> 1) The example above would conflict with the C++ [] operator, so it would
> >> be ambiguous which one is being used in Cython code. Not sure if there's
> >> a use case for making both available to Cython code, but that would be
> >> difficult to achieve if the need arises.
> >>
> >> 2) It doesn't solve the general problem of assigning to C++ expressions,
> >> especially because it does not extend the syntax allowed by Cython, which
> >> would still limit what you can do in these fake special methods.
> >>
> >> Regarding your proposals, I'd be happy if we could avoid adding syntax
> >> support for assigning to function calls. And I agree that the cname
> >> assignment hack is really just a big hack. It shouldn't be relied on.
> >
> > Yes, both this idea and the modified version that redefines operator[]
> > are similar to the idea I had about respecting the cname entries for
> > operator[]. This method would certainly expose a more flexible API for
> > modules that want to do this. It may work in my case, but I worry that
> > getting this into Cython would further complicate the (already lengthy)
> > indexing logic.
>
> The main problem with the logic in IndexNode is that it predates the
> infrastructure change that allows node replacements in the analyse_types()
> methods. It should eventually be split into separate nodes that do
> different things, e.g. integer indexing into C arrays, Python object item
> access, C++ operator[] usage, buffer/memory view indexing, memory view
> slicing, you name it.
>
> In any case, adding new functionality can now be done by creating a new
> node rather than complicating the type analysis code. And any further
> refactoring would be warmly appreciated. :)
>
>
> > I'm still uneasy about exporting an API that is fundamentally different
> > from the existing Python and C++ APIs, but providing a way to use
> > Python's syntax could help with that. Is there a good way to make a
> > method like this accept Python-like indexing syntax? It would also be
> > confusing to put a code definition like this inside an extern block.
> > Could this syntax be adapted to work outside the extern block while
> > still showing its connection to the original cppclass?
>
> The feature of providing inline functions in .pxd files already exists, as
> does the feature of adding functionality to external extension types by
> implementing special methods in their declaration. See, for example, the
> buffer protocol support for old NumPy arrays that we implemented in
> numpy/__init__.pxd (look for "__getbuffer__") or the helper functions in
> cpython/array.pxd.
>
> Allowing __getitem__() to be overridden in an extern C++ class declaration
> would really only be one step further. The question is whether
> __getitem__() is the right abstraction to use here, as it also only
> accepts a single argument as input. That would be a tuple in Python for
> multi-dimensional lookup. It would be nice if the index arguments (e.g.
> x, y, z for 3 dimensions) could be explicit in the method signature
> instead, potentially using default arguments if fewer dimensions should
> be allowed.
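In C++ terms, the explicit-arguments-with-defaults idea corresponds roughly
to the sketch below: each index is its own parameter of the call operator
(which, unlike most overloaded operators, accepts default arguments), and
trailing indices default to 0 so fewer of them may be given. Array3D is a
hypothetical class, and treating an omitted index as 0 is an assumption
made for illustration; a real array library might instead treat an omitted
index as a full slice.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-rank (3-D) array of doubles, not DyND's API. Each index
// is an explicit parameter; trailing ones default to 0, so a(i), a(i, j)
// and a(i, j, k) are all accepted.
class Array3D {
public:
    Array3D(std::size_t n0, std::size_t n1, std::size_t n2)
        : n1_(n1), n2_(n2), data_(n0 * n1 * n2, 0.0) {}

    // Assumption for illustration: an omitted index means index 0 on that
    // axis. No tuple or container is built to carry the indices.
    double& operator()(std::size_t i, std::size_t j = 0, std::size_t k = 0) {
        return data_[(i * n1_ + j) * n2_ + k];
    }

    std::size_t size() const { return data_.size(); }

private:
    std::size_t n1_;
    std::size_t n2_;
    std::vector<double> data_;
};
```

Here a(1) refers to the same element as a(1, 0, 0).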
>
> Stefan
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> https://mail.python.org/mailman/listinfo/cython-devel

Okay, I see what you mean in the numpy pxd file. With regard to which
abstraction to use, __getitem__ and __setitem__ would work best, but we
would need to allow them to take more than one argument. We could force
users to pack the indices into something like a C++ vector or tuple, but
that creates a bunch of useless temporaries (which we can only hope the C++
compiler will optimize away) and will cause all kinds of grief when mixing
code compiled with different standard libraries. That's not the end of the
world, but I'd prefer to avoid incompatibilities like that wherever
possible. Using __getitem__ and __setitem__ does have the advantage of
keeping indexing on the right-hand side of an assignment separate from
indexing on the left-hand side, so libraries could handle the two cases
differently. That has the potential to make the Cython API exposed to users
simpler than the C++ API in some cases, though overriding operator[] comes
with its own set of problems.
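A rough C++ sketch of the two options weighed above, under assumed names
(Grid, get_item, set_item and get_item_packed are all hypothetical):
explicit-index read and write functions kept separate, next to a
packed-index variant that forces every caller to build a std::vector.
Python-style negative indices are handled only as an illustrative extra.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical 2-D array used only for illustration (not DyND's API).
struct Grid {
    std::ptrdiff_t rows;
    std::ptrdiff_t cols;
    std::vector<double> data;

    Grid(std::ptrdiff_t r, std::ptrdiff_t c)
        : rows(r), cols(c), data(static_cast<std::size_t>(r * c), 0.0) {}
};

// Map a Python-style index (negative values count from the end) onto
// [0, n), throwing if it is still out of range.
inline std::ptrdiff_t normalize_index(std::ptrdiff_t i, std::ptrdiff_t n) {
    if (i < 0) i += n;
    if (i < 0 || i >= n) throw std::out_of_range("index out of range");
    return i;
}

inline std::size_t flat_offset(const Grid& g, std::ptrdiff_t i, std::ptrdiff_t j) {
    return static_cast<std::size_t>(normalize_index(i, g.rows) * g.cols +
                                    normalize_index(j, g.cols));
}

// Read path (what a __getitem__-style hook would call). The indices are
// plain scalars, so no container has to be built for each access.
inline double get_item(const Grid& g, std::ptrdiff_t i, std::ptrdiff_t j) {
    return g.data[flat_offset(g, i, j)];
}

// Write path (what a __setitem__-style hook would call), kept separate
// from the read path so a library can treat the two cases differently.
inline void set_item(Grid& g, std::ptrdiff_t i, std::ptrdiff_t j, double value) {
    g.data[flat_offset(g, i, j)] = value;
}

// For contrast: a packed-index variant. Every caller must construct a
// std::vector (typically a heap allocation) just to pass two numbers.
inline double get_item_packed(const Grid& g, const std::vector<std::ptrdiff_t>& idx) {
    if (idx.size() != 2) throw std::invalid_argument("expected exactly 2 indices");
    return get_item(g, idx[0], idx[1]);
}
```

If functions like these were exported across a library boundary, the
scalar-only versions would pass nothing from the standard library, whereas
the packed variant requires both sides to agree on std::vector's layout.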

Using operator[] as a name for a Cython-level indexing operation would work
too. Since we're already overloading the syntax, the name is available. On
the other hand, the only benefit I can see is that it leaves __getitem__
and __setitem__ alone. Reusing the C++ name could also be a source of
confusion: rewriting a C++ class's handling of indexing in code that
primarily provides a wrapper is confusing enough on its own, and using
different names would at least help highlight the change.

With regard to defining these special methods, how would one go about
using the C++ version of the syntax? I've been trying to make it so that
all the Python compatibility functions in dynd-python can be loaded
dynamically via cimport, so that users will only have to find and include
the headers from the original C++ library. With additional special methods
like this, the only way I can see to do this is to define C++ functions
that operate on the C++ objects, provide their declarations and make Cython
shims in a pyx file, then use the Cython shims inside the pxd file to
define the special __getitem__ and __setitem__ methods. Even though most
users won't see it, that's an awful lot of code for any library writer who
wants to expose an interface like this. Do you see a better solution?
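The first step of that approach, C++ helper functions that Cython can call
where it cannot write the C++ expression itself, might look like the sketch
below. Matrix, matrix_get, matrix_set and the header name
matrix_helpers.hpp are hypothetical; the Cython declarations are shown as
comments because they would live in the .pxd file.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical C++ class that, like the arrays discussed in the thread,
// exposes multidimensional indexing only through operator().
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        check(i, j);
        return data_[i * cols_ + j];
    }
    double operator()(std::size_t i, std::size_t j) const {
        check(i, j);
        return data_[i * cols_ + j];
    }

private:
    void check(std::size_t i, std::size_t j) const {
        if (i >= rows_ || j >= cols_) throw std::out_of_range("Matrix index");
    }
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Header-only helpers (hypothetical names). Cython cannot write
// `m(i, j) = v`, but it can call an ordinary function that does.
inline double matrix_get(const Matrix& m, std::size_t i, std::size_t j) {
    return m(i, j);
}

inline void matrix_set(Matrix& m, std::size_t i, std::size_t j, double v) {
    m(i, j) = v;
}

// Possible Cython declarations (sketch, assumed header "matrix_helpers.hpp").
// With `except +`, Cython converts the C++ exception into a Python one
// (std::out_of_range becomes IndexError):
//
//   cdef extern from "matrix_helpers.hpp":
//       cppclass Matrix:
//           Matrix(size_t, size_t) except +
//       double matrix_get(const Matrix&, size_t, size_t) except +
//       void matrix_set(Matrix&, size_t, size_t, double) except +
```

Declaring the helpers `inline` in a header means there is no extra object
code for users to link against.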

Thanks for the great suggestions!

-Ian Henriksen