[Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

Mark Wiebe mwwiebe at gmail.com
Fri May 11 16:17:55 EDT 2012


On Fri, May 11, 2012 at 8:37 AM, mark florisson
<markflorisson88 at gmail.com> wrote:

> On 11 May 2012 12:13, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no>
> wrote:
> > (NumPy devs: I know, I get too many ideas. But this time I *really*
> > believe in it, I think this is going to be *huge*. And if Mark F. likes
> > it, it's not going to be without manpower; and as his mentor I'd pitch
> > in too here and there.)
> >
> > (Mark F.: I believe this is *very* relevant to your GSoC. I certainly
> > don't want to micro-manage your GSoC; I just want your take.)
> >
> > Travis, thank you very much for those good words in the "NA-mask
> > interactions..." thread. It put most of my concerns away. If anybody is
> > leaning towards opaqueness because of its OOP purity, I want to refer
> > to C++ and its walled garden of ideological purity -- it has, what, 3-4
> > different OOP array libraries, none of which is able to out-compete the
> > others. Meanwhile the rest of the world happily cooperates using
> > pointers, strides, CSR and CSC.
> >
> > Now, there are limits to what you can do with strides and pointers. No
> > one's denying the need for more. In my mind that's an API where you can
> > do fetch_block and put_block of cache-sized, N-dimensional blocks on an
> > array; but it might be something slightly different.
> >
> > Here's what I'm asking: DO NOT simply keep extending ndarray and the
> > NumPy C API to deal with this issue.
> >
> > What we need is duck-typing/polymorphism at the C level. If you keep
> > extending ndarray and the NumPy C API, what we'll have is a one-to-many
> > relationship: One provider of array technology, multiple consumers (with
> > hooks, I'm sure, but all implementations of the hook concept in the NumPy
> > world I've seen so far are a total disaster!).
> >
> > What I think we need instead is something like PEP 3118 for the
> > "abstract" array that is only available block-wise with getters and
> > setters. On the Cython list we've decided that what we want for CEP
> > 1000 (for boxing callbacks etc.) is to extend PyTypeObject with our own
> > fields; we could create CEP 1001 to solve this issue and make any
> > Python object an exporter of "block-getter/setter arrays" (better name
> > needed).
> >
> > What would be exported is (of course) a simple vtable:
> >
> > typedef struct {
> >    int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right,
> > ...);
> >    ...
> > } block_getter_setter_array_vtable;
> >
> > Let's please discuss the details *after* the fundamentals. But the
> > reason I put void* there instead of PyObject* is that I hope this could
> > be used beyond the Python world (say, Python<->Julia); the void* would
> > be handed to you at the time you receive the vtable (however we handle
> > that).
>
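[Editor's note: a minimal Python sketch of the block-getter/setter contract Dag proposes above. The real proposal is a C vtable extending PyTypeObject; the class and method names here (`DenseBlockExporter`, `get_block`, `put_block`) are hypothetical illustrations of the same duck-typed protocol, not part of any CEP.]

```python
import numpy as np

class DenseBlockExporter:
    """Hypothetical sketch of a CEP-1001-style exporter: any object
    exposing get_block/put_block over rectangular index ranges could
    participate, regardless of how it stores its elements."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.shape = self.data.shape

    def get_block(self, upper_left, lower_right):
        # Copy out the half-open block [upper_left, lower_right).
        ix = tuple(slice(a, b) for a, b in zip(upper_left, lower_right))
        return self.data[ix].copy()

    def put_block(self, upper_left, lower_right, block):
        # Write a block back into the same half-open region.
        ix = tuple(slice(a, b) for a, b in zip(upper_left, lower_right))
        self.data[ix] = block

exp = DenseBlockExporter(np.arange(16).reshape(4, 4))
blk = exp.get_block((0, 0), (2, 2))     # top-left 2x2 block
exp.put_block((2, 2), (4, 4), blk * 0)  # zero the bottom-right block
```

A consumer written against only these two calls never touches the exporter's internal storage, which is the point of the polymorphism.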
> I suppose it would also be useful to have some way of predicting the
> output format polymorphically for the caller. E.g. dense *
> block_diagonal results in block diagonal, but dense + block_diagonal
> results in dense, etc. It might be useful for the caller to know
> whether it needs to allocate a sparse, dense or block-structured
> array. Or maybe the polymorphic function could even do the allocation.
> This needs to happen recursively of course, to avoid intermediate
> temporaries. The compiler could easily handle that, and so could numpy
> when it gets lazy evaluation.
>
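[Editor's note: the output-format prediction Mark F. describes could be as simple as a small promotion table the caller (or a lazy-evaluation engine) consults before allocating the result. The format names and rules below are illustrative assumptions, not an agreed design.]

```python
def result_format(op, a, b):
    """Predict the storage format of elementwise `a op b`.
    Formats are hypothetical: "dense", "block_diagonal", "sparse"."""
    if a == b:
        return a
    if op == "+":
        # Adding a structured array to a dense one fills in the zeros,
        # so the result is dense whenever either operand is dense.
        return "dense" if "dense" in (a, b) else "sparse"
    if op == "*":
        # Elementwise multiply preserves the sparser structure, since
        # structural zeros annihilate the other operand.
        return a if a != "dense" else b
    raise ValueError(op)
```

This reproduces the examples above: dense * block_diagonal predicts block_diagonal, while dense + block_diagonal predicts dense. Applied recursively over an expression tree, it would let the caller allocate the final output without materializing intermediates.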
> I think if the heavy lifting of allocating output arrays and exporting
> these arrays works in numpy, then support in Cython could use that (I
> can already hear certain people object to more complicated array stuff
> in Cython :). Even better would be an external project that each of
> our projects could use (I still think the nditer sorting functionality
> of arrays should be numpy-agnostic and externally available).
>

It might be nice to expose something which gives an nditer-style looping
primitive through the CEP 1001 mechanism. I could imagine a pure C version
of this and an LLVM bitcode version which could inline into numba or other
LLVM-producing systems.
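[Editor's note: a sketch of what such an nditer-style looping primitive might look like, here as a plain Python generator over block corners; the C or LLVM-bitcode version Mark Wiebe imagines would expose the same iteration pattern. The function name and signature are hypothetical.]

```python
import itertools
import numpy as np

def iter_blocks(shape, block_shape):
    """Yield (upper_left, lower_right) corners of consecutive
    cache-sized blocks covering an array of `shape`, with edge
    blocks clipped to the array bounds."""
    starts = [range(0, n, b) for n, b in zip(shape, block_shape)]
    for corner in itertools.product(*starts):
        lower_right = tuple(min(c + b, n)
                            for c, b, n in zip(corner, block_shape, shape))
        yield corner, lower_right

# Any get_block exporter could be driven by this loop; e.g. summing
# a (5, 5) array in (2, 2) blocks, clipped at the edges:
data = np.arange(25.0).reshape(5, 5)
total = sum(data[ul[0]:lr[0], ul[1]:lr[1]].sum()
            for ul, lr in iter_blocks((5, 5), (2, 2)))
```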

-Mark


>
> > I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you
> > could embed the block-transposition that's needed for efficient "arr +
> > arr.T" at this level.
> >
> > Imagine being able to do this in Cython:
> >
> > a[...] = b + c * d
> >
> > and have that essentially compile to the numexpr blocked approach, *but*
> > where b, c, and d can have whatever type that exports CEP 1001? So c
> > could be a "diagonal" array which uses O(n) storage to export O(n^2)
> > elements, for instance, and the unrolled Cython code never needs to
> > know.
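[Editor's note: a minimal sketch of the diagonal-array example, in Python rather than the C-level vtable: O(n) storage exporting O(n^2) elements block-wise, so blocked consumer code cannot tell it apart from a dense operand. Names are hypothetical.]

```python
import numpy as np

class DiagonalExporter:
    """A 'diagonal' array storing only its n diagonal values but
    exporting any requested block of the full n x n matrix."""

    def __init__(self, diag):
        self.diag = np.asarray(diag)   # O(n) storage
        n = len(self.diag)
        self.shape = (n, n)            # exports O(n^2) elements

    def get_block(self, upper_left, lower_right):
        (r0, c0), (r1, c1) = upper_left, lower_right
        block = np.zeros((r1 - r0, c1 - c0), dtype=self.diag.dtype)
        for i in range(r0, r1):
            if c0 <= i < c1:           # diagonal entry falls in this block
                block[i - r0, i - c0] = self.diag[i]
        return block

d = DiagonalExporter(np.array([1.0, 2.0, 3.0, 4.0]))
corner = d.get_block((0, 0), (2, 2))   # materializes only this 2x2 block
```

Blocked code like the unrolled `a[...] = b + c * d` loop would call `get_block` per tile and never learn that off-diagonal tiles are synthesized on the fly.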
> >
> > As far as NumPy goes, something along these lines should hopefully mean
> > that new C code being written doesn't rely so much on what exactly goes
> > into "ndarray" and what goes into other classes; so that we don't get
> > the same problem again that we do now with code that doesn't use PEP
> > 3118.
> >
> > Dag
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

