[Numpy-discussion] Starting work on ufunc rewrite

Thu Mar 31 16:00:27 EDT 2016

I have started discussing with Nathaniel the implementation of the ufunc
ABI break that he proposed in a draft NEP a few months ago:

http://thread.gmane.org/gmane.comp.python.numeric.general/61270

His original proposal was to make the public portion of PyUFuncObject be:

    typedef struct {
        PyObject_HEAD
        int nin, nout, nargs;
    } PyUFuncObject;

Of course the idea is that internally we would use a much larger struct
that we could change at will, as long as its first few entries matched
those of PyUFuncObject. My problem with this, and I may very well be
missing something, is that in PyUFunc_Type we need to set the tp_basicsize
to the size of the extended struct, so we would end up having to expose its
contents. This is somewhat similar to what now happens with PyArrayObject:
anyone can #include "ndarraytypes.h", cast PyArrayObject* to
PyArrayObjectFields*, and access the guts of the struct without using the
supplied API inline functions. Not the end of the world, but if you want to
make something private, you might as well make it truly private.

I think it would be to have something similar to what NpyIter does::

    typedef struct {
        PyObject_HEAD
        NpyUFunc *ufunc;
    } PyUFuncObject;

where NpyUFunc would, at this level, be an opaque type of which nothing
would be known. We could have some of the NpyUFunc attributes cached on the
PyUFuncObject struct for easier access, as is done in NewNpyArrayIterObject.
This would also give us more liberty in making NpyUFunc be whatever we want
it to be, including a variable-sized memory chunk that we could use and
access at will. NpyIter is again a good example, where rather than storing
pointers to strides and dimensions arrays, these are made part of the
NpyIter memory chunk, effectively being equivalent to having variable sized
arrays as part of the struct. And I think we will probably no longer
trigger the Cython warnings about size changes either.

Any thoughts on this approach? Is there anything fundamentally wrong with
what I'm proposing here?

Also, this is probably going to end up being a rewrite of a pretty large
and complex codebase. I am not sure that working on this on my own and
eventually sending a humongous PR is the best approach. Any thoughts on how
best to handle turning this into a collaborative, incremental effort?
Anyone who would like to join in the fun?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20160331/c5ab918f/attachment.html>