[Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Fri May 11 07:15:23 EDT 2012


On 05/11/2012 01:13 PM, Dag Sverre Seljebotn wrote:
> (NumPy devs: I know, I get too many ideas. But this time I *really*
> believe in it, I think this is going to be *huge*. And if Mark F. likes
> it it's not going to be without manpower; and as his mentor I'd pitch in
> too here and there.)
>
> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly
> don't want to micro-manage your GSoC, just have your take.)

For the information of the rest of you:

http://www.google-melange.com/gsoc/project/google/gsoc2012/markflorisson88/30002

Dag

>
> Travis, thank you very much for those good words in the "NA-mask
> interactions..." thread. They put most of my concerns to rest. If anybody is
> leaning towards opaqueness because of its OOP purity, I want to
> refer to C++ and its walled garden of ideological purity -- it has,
> what, 3-4 different OOP array libraries, none of which is able to
> out-compete the others. Meanwhile the rest of the world happily
> cooperates using pointers, strides, CSR and CSC.
>
> Now, there are limits to what you can do with strides and pointers.
> No one's denying the need for more. In my mind that's an API where you
> can do fetch_block and put_block of cache-sized, N-dimensional blocks on
> an array; but it might be something slightly different.
>
> Here's what I'm asking: DO NOT simply keep extending ndarray and the
> NumPy C API to deal with this issue.
>
> What we need is duck-typing/polymorphism at the C level. If you keep
> extending ndarray and the NumPy C API, what we'll have is a one-to-many
> relationship: One provider of array technology, multiple consumers (with
> hooks, I'm sure, but all implementations of the hook concept in the
> NumPy world I've seen so far are a total disaster!).
>
> What I think we need instead is something like PEP 3118 for the
> "abstract" array that is only available block-wise with getters and
> setters. On the Cython list we've decided that what we want for CEP 1000
> (for boxing callbacks etc.) is to extend PyTypeObject with our own
> fields; we could create CEP 1001 to solve this issue and make any Python
> object an exporter of "block-getter/setter-arrays" (better name needed).
>
> What would be exported is (of course) a simple vtable:
>
> typedef struct {
>       int (*get_block)(void *ctx, ssize_t *upper_left,
>                        ssize_t *lower_right, ...);
>       ...
> } block_getter_setter_array_vtable;
>
> Let's please discuss the details *after* the fundamentals. But the
> reason I put void* there instead of PyObject* is that I hope this could
> be used beyond the Python world (say, Python<->Julia); the void* would
> be handed to you at the time you receive the vtable (however we handle
> that).
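
A minimal sketch of what one exporter behind such a vtable might look like, restricted to 2-D arrays of doubles. Everything beyond `get_block`, `ctx`, `upper_left` and `lower_right` is an assumption made for illustration: the `out` buffer stands in for the arguments elided above, and the names `block_vtable_sketch`, `strided_ctx` and `strided_get_block` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Hypothetical 2-D variant of the vtable. The `out` parameter is an
   assumption; the original signature leaves the remaining arguments open. */
typedef struct {
    /* Copy the half-open block [upper_left, lower_right) into `out` in
       row-major order. Returns 0 on success, -1 on a bad request. */
    int (*get_block)(void *ctx, const ssize_t *upper_left,
                     const ssize_t *lower_right, double *out);
} block_vtable_sketch;

/* One possible exporter: a dense, strided 2-D array of doubles. */
typedef struct {
    const char *data;     /* base pointer */
    ssize_t shape[2];
    ssize_t strides[2];   /* in bytes, as with NumPy strides */
} strided_ctx;

static int strided_get_block(void *ctx, const ssize_t *ul,
                             const ssize_t *lr, double *out)
{
    const strided_ctx *a = ctx;
    assert(a != NULL && out != NULL);
    for (int d = 0; d < 2; d++)
        if (ul[d] < 0 || lr[d] > a->shape[d] || ul[d] > lr[d])
            return -1;
    /* The consumer always receives a contiguous tile, whatever the
       exporter's own memory layout is. */
    for (ssize_t i = ul[0]; i < lr[0]; i++)
        for (ssize_t j = ul[1]; j < lr[1]; j++)
            *out++ = *(const double *)(a->data + i * a->strides[0]
                                               + j * a->strides[1]);
    return 0;
}

/* The exporter hands out this vtable together with a strided_ctx pointer. */
static const block_vtable_sketch strided_vtable = { strided_get_block };
```

Because the context is a plain `void *`, a transposed view is just another `strided_ctx` with swapped shape and strides; nothing in the vtable refers to `PyObject`.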
>
> I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you
> could embed the block-transposition that's needed for efficient "arr +
> arr.T" at this level.
>
> Imagine being able to do this in Cython:
>
> a[...] = b + c * d
>
> and have that essentially compile to the numexpr blocked approach, *but*
> where b, c, and d can be of any type that exports CEP 1001? So c
> could be a "diagonal" array which uses O(n) storage to export O(n^2)
> elements, for instance, and the unrolled Cython code never needs to know.
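
The diagonal case can be sketched in C under the same assumptions as the typedef above: a 2-D, doubles-only `get_block` with a hypothetical `out` buffer. All names here (`block_array`, `diag_ctx`, `dense_ctx`, `eval_b_plus_c_times_d`, `BLOCK`) are made up for illustration, and the loop is only a rough stand-in for what generated code might do.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

typedef struct {
    int (*get_block)(void *ctx, const ssize_t *upper_left,
                     const ssize_t *lower_right, double *out);
} block_vtable_sketch;

/* What a consumer holds: a vtable plus the opaque context handed out with it. */
typedef struct { const block_vtable_sketch *vt; void *ctx; } block_array;

static int in_bounds(const ssize_t *ul, const ssize_t *lr, ssize_t n)
{
    return ul[0] >= 0 && ul[1] >= 0 && lr[0] <= n && lr[1] <= n
        && ul[0] <= lr[0] && ul[1] <= lr[1];
}

/* Diagonal exporter: n stored values present themselves as an n-by-n matrix. */
typedef struct { const double *diag; ssize_t n; } diag_ctx;

static int diag_get_block(void *ctx, const ssize_t *ul, const ssize_t *lr,
                          double *out)
{
    const diag_ctx *c = ctx;
    if (!in_bounds(ul, lr, c->n)) return -1;
    for (ssize_t i = ul[0]; i < lr[0]; i++)
        for (ssize_t j = ul[1]; j < lr[1]; j++)
            *out++ = (i == j) ? c->diag[i] : 0.0;
    return 0;
}

/* Dense exporter: contiguous row-major n-by-n storage. */
typedef struct { const double *data; ssize_t n; } dense_ctx;

static int dense_get_block(void *ctx, const ssize_t *ul, const ssize_t *lr,
                           double *out)
{
    const dense_ctx *a = ctx;
    if (!in_bounds(ul, lr, a->n)) return -1;
    for (ssize_t i = ul[0]; i < lr[0]; i++)
        for (ssize_t j = ul[1]; j < lr[1]; j++)
            *out++ = a->data[i * a->n + j];
    return 0;
}

static const block_vtable_sketch diag_vtable  = { diag_get_block };
static const block_vtable_sketch dense_vtable = { dense_get_block };

enum { BLOCK = 2 };   /* tiny tile edge so the example also hits edge tiles */

/* Evaluate a = b + c * d (elementwise) tile by tile into dense row-major a.
   Only get_block is called; the storage behind b, c and d is never inspected. */
static int eval_b_plus_c_times_d(double *a, ssize_t n,
                                 block_array b, block_array c, block_array d)
{
    double tb[BLOCK * BLOCK], tc[BLOCK * BLOCK], td[BLOCK * BLOCK];
    assert(a != NULL);
    for (ssize_t i0 = 0; i0 < n; i0 += BLOCK)
        for (ssize_t j0 = 0; j0 < n; j0 += BLOCK) {
            ssize_t ul[2] = { i0, j0 };
            ssize_t lr[2] = { i0 + BLOCK < n ? i0 + BLOCK : n,
                              j0 + BLOCK < n ? j0 + BLOCK : n };
            if (b.vt->get_block(b.ctx, ul, lr, tb) ||
                c.vt->get_block(c.ctx, ul, lr, tc) ||
                d.vt->get_block(d.ctx, ul, lr, td))
                return -1;
            ssize_t w = lr[1] - ul[1];   /* tile width */
            for (ssize_t i = ul[0]; i < lr[0]; i++)
                for (ssize_t j = ul[1]; j < lr[1]; j++) {
                    ssize_t k = (i - i0) * w + (j - j0);
                    a[i * n + j] = tb[k] + tc[k] * td[k];
                }
        }
    return 0;
}
```

Passing a `block_array` built from `diag_vtable` as `c` and ones built from `dense_vtable` as `b` and `d` goes through the same loop; the tile buffers stay cache-sized regardless of how each operand is stored.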
>
> As far as NumPy goes, something along these lines should hopefully mean
> that new C code being written doesn't rely so much on what exactly goes
> into "ndarray" and what goes into other classes, so that we don't end up
> with the same problem we have now with code that doesn't use PEP 3118.
>
> Dag
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion



