[Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Fri May 11 07:13:28 EDT 2012


(NumPy devs: I know, I get too many ideas. But this time I *really* 
believe in it, I think this is going to be *huge*. And if Mark F. likes 
it, it's not going to be without manpower; and as his mentor I'd pitch in 
too here and there.)

(Mark F.: I believe this is *very* relevant to your GSoC. I certainly 
don't want to micro-manage your GSoC, just have your take.)

Travis, thank you very much for those good words in the "NA-mask 
interactions..." thread. It put most of my concerns to rest. If anybody is 
leaning towards opaqueness because of its OOP purity, I want to 
refer to C++ and its walled garden of ideological purity -- it has, 
what, 3-4 different OOP array libraries, none of which is able to 
out-compete the others. Meanwhile the rest of the world happily 
cooperates using pointers, strides, CSR and CSC.

Now, there are limits to what you can do with strides and pointers. 
No one's denying the need for more. In my mind that's an API where you 
can do fetch_block and put_block of cache-sized, N-dimensional blocks on 
an array; but it might be something slightly different.

Here's what I'm asking: DO NOT simply keep extending ndarray and the 
NumPy C API to deal with this issue.

What we need is duck-typing/polymorphism at the C level. If you keep 
extending ndarray and the NumPy C API, what we'll have is a one-to-many 
relationship: One provider of array technology, multiple consumers (with 
hooks, I'm sure, but all implementations of the hook concept in the 
NumPy world I've seen so far are a total disaster!).

What I think we need instead is something like PEP 3118 for the 
"abstract" array that is only available block-wise with getters and 
setters. On the Cython list we've decided that what we want for CEP 1000 
(for boxing callbacks etc.) is to extend PyTypeObject with our own 
fields; we could create CEP 1001 to solve this issue and make any Python 
object an exporter of "block-getter/setter-arrays" (better name needed).

What would be exported is (of course) a simple vtable:

typedef struct {
     int (*get_block)(void *ctx, ssize_t *upper_left,
                      ssize_t *lower_right, ...);
     ...
} block_getter_setter_array_vtable;

Let's please discuss the details *after* the fundamentals. But the 
reason I put void* there instead of PyObject* is that I hope this could 
be used beyond the Python world (say, Python<->Julia); the void* would 
be handed to you at the time you receive the vtable (however we handle 
that).
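
To make the shape of such a vtable a bit more concrete, here is a
minimal C sketch of one possible exporter: an ordinary strided 2-D
array of doubles hidden behind block getters and setters. Everything
beyond the first three get_block parameters is an assumption made for
illustration and not part of the proposal: the double-only element
type, the caller-provided row-major buffer, the half-open block
convention, the put_block signature, ptrdiff_t standing in for
ssize_t, and the names block_array_vtable and strided_ctx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vtable. The post only fixes ctx, upper_left and
   lower_right; the buffers and put_block are assumptions. */
typedef struct {
    /* Copy the block [upper_left, lower_right) into out, row-major. */
    int (*get_block)(void *ctx, const ptrdiff_t *upper_left,
                     const ptrdiff_t *lower_right, double *out);
    /* Write a row-major block from in back into the array. */
    int (*put_block)(void *ctx, const ptrdiff_t *upper_left,
                     const ptrdiff_t *lower_right, const double *in);
} block_array_vtable;

/* One possible exporter: a plain strided 2-D array of doubles. */
typedef struct {
    double *data;
    ptrdiff_t shape[2];
    ptrdiff_t strides[2];   /* counted in elements, not bytes */
} strided_ctx;

/* Nonzero if the requested half-open block lies inside the array. */
static int in_bounds(const strided_ctx *a, const ptrdiff_t *ul,
                     const ptrdiff_t *lr)
{
    return ul[0] >= 0 && ul[1] >= 0 && ul[0] <= lr[0] && ul[1] <= lr[1]
        && lr[0] <= a->shape[0] && lr[1] <= a->shape[1];
}

static int strided_get_block(void *ctx, const ptrdiff_t *ul,
                             const ptrdiff_t *lr, double *out)
{
    const strided_ctx *a = ctx;
    assert(a != NULL && out != NULL);
    if (!in_bounds(a, ul, lr))
        return -1;
    for (ptrdiff_t i = ul[0]; i < lr[0]; i++)
        for (ptrdiff_t j = ul[1]; j < lr[1]; j++)
            *out++ = a->data[i * a->strides[0] + j * a->strides[1]];
    return 0;
}

static int strided_put_block(void *ctx, const ptrdiff_t *ul,
                             const ptrdiff_t *lr, const double *in)
{
    strided_ctx *a = ctx;
    assert(a != NULL && in != NULL);
    if (!in_bounds(a, ul, lr))
        return -1;
    for (ptrdiff_t i = ul[0]; i < lr[0]; i++)
        for (ptrdiff_t j = ul[1]; j < lr[1]; j++)
            a->data[i * a->strides[0] + j * a->strides[1]] = *in++;
    return 0;
}

/* What a consumer would receive, together with a strided_ctx pointer. */
static const block_array_vtable strided_vtable = {
    strided_get_block, strided_put_block
};
```

A consumer holding only &strided_vtable and a void* never looks at
data or strides. In this sketch a transposed view is just a second
strided_ctx with shape and strides swapped, and the consumer cannot
tell the difference.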

I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you 
could embed the block-transposition that's needed for efficient "arr + 
arr.T" at this level.

Imagine being able to do this in Cython:

a[...] = b + c * d

and have that essentially compile to the numexpr blocked approach, *but* 
where b, c, and d can be of any type that exports CEP 1001? So c 
could be a "diagonal" array which uses O(n) storage to export O(n^2) 
elements, for instance, and the unrolled Cython code never needs to know.
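
As a rough idea of what such generated code could boil down to, here
is a hand-written C sketch that evaluates a = b + c * d (elementwise)
tile by tile through block getters, with c exported by a diagonal
array that stores only n values. All names (block_vtable, block_array,
dense_ctx, diag_ctx, eval_b_plus_c_times_d), the get_block signature
past its first three parameters, the square 2-D double-only arrays and
the tiny tile size are assumptions for illustration; this is not what
Cython or numexpr actually emit.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 2   /* tiny tile edge for illustration, not cache-sized */

/* Hypothetical getter-only vtable; out receives the half-open block
   [upper_left, lower_right) in row-major order. */
typedef struct {
    int (*get_block)(void *ctx, const ptrdiff_t *upper_left,
                     const ptrdiff_t *lower_right, double *out);
} block_vtable;

/* An exported array: the vtable plus the opaque context that came with it. */
typedef struct { const block_vtable *vt; void *ctx; } block_array;

/* Dense, C-contiguous n x n exporter. */
typedef struct { ptrdiff_t n; const double *data; } dense_ctx;

/* Diagonal n x n exporter: O(n) storage, O(n^2) exported elements. */
typedef struct { ptrdiff_t n; const double *diag; } diag_ctx;

/* The getters skip bounds checks; the driver below only asks for
   tiles that lie inside the n x n array. */
static int dense_get_block(void *ctx, const ptrdiff_t *ul,
                           const ptrdiff_t *lr, double *out)
{
    const dense_ctx *a = ctx;
    for (ptrdiff_t i = ul[0]; i < lr[0]; i++)
        for (ptrdiff_t j = ul[1]; j < lr[1]; j++)
            *out++ = a->data[i * a->n + j];
    return 0;
}

static int diag_get_block(void *ctx, const ptrdiff_t *ul,
                          const ptrdiff_t *lr, double *out)
{
    const diag_ctx *d = ctx;
    for (ptrdiff_t i = ul[0]; i < lr[0]; i++)
        for (ptrdiff_t j = ul[1]; j < lr[1]; j++)
            *out++ = (i == j) ? d->diag[i] : 0.0;
    return 0;
}

static const block_vtable dense_vtable = { dense_get_block };
static const block_vtable diag_vtable = { diag_get_block };

/* Evaluate a = b + c * d into the dense row-major n x n buffer a,
   one tile at a time, without knowing how b, c and d are stored. */
static int eval_b_plus_c_times_d(double *a, ptrdiff_t n, block_array b,
                                 block_array c, block_array d)
{
    double tb[BLOCK * BLOCK], tc[BLOCK * BLOCK], td[BLOCK * BLOCK];
    assert(a != NULL);
    for (ptrdiff_t i0 = 0; i0 < n; i0 += BLOCK) {
        for (ptrdiff_t j0 = 0; j0 < n; j0 += BLOCK) {
            ptrdiff_t ul[2] = { i0, j0 };
            ptrdiff_t lr[2] = { i0 + BLOCK < n ? i0 + BLOCK : n,
                                j0 + BLOCK < n ? j0 + BLOCK : n };
            ptrdiff_t rows = lr[0] - ul[0], cols = lr[1] - ul[1];
            if (b.vt->get_block(b.ctx, ul, lr, tb) ||
                c.vt->get_block(c.ctx, ul, lr, tc) ||
                d.vt->get_block(d.ctx, ul, lr, td))
                return -1;
            for (ptrdiff_t r = 0; r < rows; r++)
                for (ptrdiff_t k = 0; k < cols; k++)
                    a[(i0 + r) * n + (j0 + k)] =
                        tb[r * cols + k] + tc[r * cols + k] * td[r * cols + k];
        }
    }
    return 0;
}
```

The inner loop only ever sees three small dense tiles, so it stays the
same whether c is dense, diagonal, or something else entirely.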

As far as NumPy goes, something along these lines should hopefully mean 
that new C code being written doesn't rely so much on what exactly goes 
into "ndarray" and what goes into other classes; so that we don't get 
the same problem again that we do now with code that doesn't use PEP 3118.

Dag


