[Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

mark florisson markflorisson88 at gmail.com
Fri May 11 09:37:45 EDT 2012

On 11 May 2012 12:13, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> (NumPy devs: I know, I get too many ideas. But this time I *really* believe
> in it, I think this is going to be *huge*. And if Mark F. likes it it's not
> going to be without manpower; and as his mentor I'd pitch in too here and
> there.)
> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't
> want to micro-manage your GSoC, just have your take.)
> Travis, thank you very much for those good words in the "NA-mask
> interactions..." thread. It put most of my concerns away. If anybody is
> leaning towards opaqueness because of its OOP purity, I want to refer to
> C++ and its walled-garden of ideological purity -- it has, what, 3-4
> different OOP array libraries, none of which is able to out-compete the
> others. Meanwhile the rest of the world happily cooperates using pointers,
> strides, CSR and CSC.
> Now, there are limits to what you can do with strides and pointers. No one's
> denying the need for more. In my mind that's an API where you can do
> fetch_block and put_block of cache-sized, N-dimensional blocks on an array;
> but it might be something slightly different.
> Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C
> API to deal with this issue.
> What we need is duck-typing/polymorphism at the C level. If you keep
> extending ndarray and the NumPy C API, what we'll have is a one-to-many
> relationship: One provider of array technology, multiple consumers (with
> hooks, I'm sure, but all implementations of the hook concept in the NumPy
> world I've seen so far are a total disaster!).
> What I think we need instead is something like PEP 3118 for the "abstract"
> array that is only available block-wise with getters and setters. On the
> Cython list we've decided that what we want for CEP 1000 (for boxing
> callbacks etc.) is to extend PyTypeObject with our own fields; we could
> create CEP 1001 to solve this issue and make any Python object an exporter
> of "block-getter/setter-arrays" (better name needed).
> What would be exported is (of course) a simple vtable:
> typedef struct {
>    int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, ...);
>    ...
> } block_getter_setter_array_vtable;
> Let's please discuss the details *after* the fundamentals. But the reason I
> put void* there instead of PyObject* is that I hope this could be used
> beyond the Python world (say, Python<->Julia); the void* would be handed to
> you at the time you receive the vtable (however we handle that).
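
To make the quoted vtable idea a bit more concrete, here is a minimal
sketch of how an exporter and a consumer might meet through it. It is
only an illustration under stated assumptions: the 2-D, double-only
signature, the out/out_row_stride parameters (standing in for the
elided "..."), and the names block_getter_vtable_2d, diag_array,
diag_get_block and sum_block are all hypothetical and not part of any
agreed CEP.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Hypothetical 2-D, double-only variant of the quoted vtable. The
 * out/out_row_stride parameters are assumptions filling in for "...". */
typedef struct {
    /* Copy the half-open block [upper_left, lower_right) into out.
     * Returns 0 on success, -1 on an out-of-range request. */
    int (*get_block)(void *ctx, const ssize_t *upper_left,
                     const ssize_t *lower_right,
                     double *out, ssize_t out_row_stride);
} block_getter_vtable_2d;

/* Exporter state: an n x n diagonal matrix stored as n values. */
typedef struct {
    ssize_t n;
    const double *diag;
} diag_array;

static int diag_get_block(void *ctx, const ssize_t *upper_left,
                          const ssize_t *lower_right,
                          double *out, ssize_t out_row_stride)
{
    const diag_array *a = (const diag_array *)ctx;
    if (upper_left[0] < 0 || upper_left[1] < 0 ||
        lower_right[0] > a->n || lower_right[1] > a->n ||
        upper_left[0] > lower_right[0] || upper_left[1] > lower_right[1])
        return -1;
    for (ssize_t i = upper_left[0]; i < lower_right[0]; i++) {
        double *row = out + (i - upper_left[0]) * out_row_stride;
        for (ssize_t j = upper_left[1]; j < lower_right[1]; j++)
            row[j - upper_left[1]] = (i == j) ? a->diag[i] : 0.0;
    }
    return 0;
}

static const block_getter_vtable_2d diag_vtable = { diag_get_block };

/* Consumer: sums a block of at most 4 x 4 elements. It sees only the
 * vtable and the opaque ctx, never the exporter's storage layout. */
static int sum_block(const block_getter_vtable_2d *vt, void *ctx,
                     const ssize_t *ul, const ssize_t *lr, double *result)
{
    double buf[16];
    ssize_t rows = lr[0] - ul[0], cols = lr[1] - ul[1];
    double s = 0.0;

    assert(vt != NULL && vt->get_block != NULL);
    if (rows < 0 || cols < 0 || rows > 4 || cols > 4)
        return -1;
    if (vt->get_block(ctx, ul, lr, buf, 4) != 0)
        return -1;
    for (ssize_t i = 0; i < rows; i++)
        for (ssize_t j = 0; j < cols; j++)
            s += buf[i * 4 + j];
    *result = s;
    return 0;
}
```

The consumer works on a small fixed-size buffer, which is the
"cache-sized block" idea in miniature; the diagonal exporter stores
O(n) values but serves any block of the O(n^2) logical array.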

I suppose it would also be useful to have some way of predicting the
output format polymorphically for the caller. E.g. dense *
block_diagonal results in block diagonal, but dense + block_diagonal
results in dense, etc. It might be useful for the caller to know
whether it needs to allocate a sparse, dense or block-structured
array. Or maybe the polymorphic function could even do the allocation.
This needs to happen recursively of course, to avoid intermediate
temporaries. The compiler could easily handle that, and so could numpy
when it gets lazy evaluation.
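
As a rough sketch of what "predicting the output format recursively"
could look like, the snippet below walks a tiny expression tree and
returns the layout the caller would need to allocate. Everything in it
is hypothetical (layout_t, op_t, expr, predict_layout), and it assumes
elementwise operations and that all block-diagonal operands share the
same block structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tags; only the two cases named above. */
typedef enum { LAYOUT_DENSE, LAYOUT_BLOCK_DIAGONAL } layout_t;
typedef enum { OP_LEAF, OP_ADD, OP_MUL } op_t;

typedef struct expr {
    op_t op;
    layout_t leaf_layout;          /* only meaningful when op == OP_LEAF */
    const struct expr *lhs, *rhs;  /* only meaningful for OP_ADD / OP_MUL */
} expr;

/* Predict the result layout of an elementwise expression without
 * evaluating it, so no intermediate temporaries are needed. */
static layout_t predict_layout(const expr *e)
{
    layout_t l, r;

    assert(e != NULL);
    if (e->op == OP_LEAF)
        return e->leaf_layout;

    l = predict_layout(e->lhs);
    r = predict_layout(e->rhs);

    if (e->op == OP_MUL) {
        /* A structural zero in either operand stays zero. */
        return (l == LAYOUT_BLOCK_DIAGONAL || r == LAYOUT_BLOCK_DIAGONAL)
                   ? LAYOUT_BLOCK_DIAGONAL : LAYOUT_DENSE;
    }
    /* OP_ADD: a dense operand fills in every position. */
    return (l == LAYOUT_DENSE || r == LAYOUT_DENSE)
               ? LAYOUT_DENSE : LAYOUT_BLOCK_DIAGONAL;
}
```

For b + c * d with dense b and d and block-diagonal c, the product is
predicted block diagonal and the full sum dense, so only one dense
output would have to be allocated.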

I think if the heavy lifting of allocating output arrays and exporting
these arrays works in numpy, then support in Cython could use that (I
can already hear certain people object to more complicated array stuff
in Cython :). Even better here would be an external project that each
of our projects could use (I still think the nditer sorting functionality
of arrays should be numpy-agnostic and externally available).

> I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you
> could embed the block-transposition that's needed for efficient "arr +
> arr.T" at this level.
> Imagine being able to do this in Cython:
> a[...] = b + c * d
> and have that essentially compile to the numexpr blocked approach, *but*
> where b, c, and d can have whatever type that exports CEP 1001? So c could
> be a "diagonal" array which uses O(n) storage to export O(n^2) elements, for
> instance, and the unrolled Cython code never needs to know.
> As far as NumPy goes, something along these lines should hopefully mean that
> new C code being written doesn't rely so much on what exactly goes into
> "ndarray" and what goes into other classes; so that we don't get the same
> problem again that we do now with code that doesn't use PEP 3118.
> Dag
