<br><br><div class="gmail_quote">On Sat, May 12, 2012 at 3:55 PM, Dag Sverre Seljebotn <span dir="ltr"><<a href="mailto:d.s.seljebotn@astro.uio.no" target="_blank">d.s.seljebotn@astro.uio.no</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5">On 05/11/2012 03:37 PM, mark florisson wrote:<br>

> On 11 May 2012 12:13, Dag Sverre Seljebotn<<a href="mailto:d.s.seljebotn@astro.uio.no">d.s.seljebotn@astro.uio.no</a>>  wrote:<br>

>> (NumPy devs: I know, I get too many ideas. But this time I *really* believe<br>

>> in it, I think this is going to be *huge*. And if Mark F. likes it it's not<br>

>> going to be without manpower; and as his mentor I'd pitch in too here and<br>

>> there.)<br>

>><br>

>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't<br>

>> want to micro-manage your GSoC, just have your take.)<br>

>><br>

>> Travis, thank you very much for those good words in the "NA-mask<br>

>> interactions..." thread. It put most of my concerns away. If anybody is<br>

>> leaning towards for opaqueness because of its OOP purity, I want to refer to<br>

>> C++ and its walled-garden of ideological purity -- it has, what, 3-4<br>

>> different OOP array libraries, neither of which is able to out-compete the<br>

>> other. Meanwhile the rest of the world happily cooperates using pointers,<br>

>> strides, CSR and CSC.<br>

>><br>

>> Now, there are limits to what you can do with strides and pointers. Noone's<br>

>> denying the need for more. In my mind that's an API where you can do<br>

>> fetch_block and put_block of cache-sized, N-dimensional blocks on an array;<br>

>> but it might be something slightly different.<br>

>><br>

>> Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C<br>

>> API to deal with this issue.<br>

>><br>

>> What we need is duck-typing/polymorphism at the C level. If you keep<br>

>> extending ndarray and the NumPy C API, what we'll have is a one-to-many<br>

>> relationship: One provider of array technology, multiple consumers (with<br>

>> hooks, I'm sure, but all implementations of the hook concept in the NumPy<br>

>> world I've seen so far are a total disaster!).<br>

>><br>

>> What I think we need instead is something like PEP 3118 for the "abstract"<br>

>> array that is only available block-wise with getters and setters. On the<br>

>> Cython list we've decided that what we want for CEP 1000 (for boxing<br>

>> callbacks etc.) is to extend PyTypeObject with our own fields; we could<br>

>> create CEP 1001 to solve this issue and make any Python object an exporter<br>

>> of "block-getter/setter-arrays" (better name needed).<br>

>><br>

>> What would be exported is (of course) a simple vtable:<br>

>><br>

>> typedef struct {<br>

>>     int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right,<br>

>> ...);<br>

>>     ...<br>

>> } block_getter_setter_array_vtable;<br>

>><br>

>> Let's please discuss the details *after* the fundamentals. But the reason I<br>

>> put void* there instead of PyObject* is that I hope this could be used<br>

>> beyond the Python world (say, Python<->Julia); the void* would be handed to<br>

>> you at the time you receive the vtable (however we handle that).<br>

><br>

> I suppose it would also be useful to have some way of predicting the<br>

> output format polymorphically for the caller. E.g. dense *<br>

> block_diagonal results in block diagonal, but dense + block_diagonal<br>

> results in dense, etc. It might be useful for the caller to know<br>

> whether it needs to allocate a sparse, dense or block-structured<br>

> array. Or maybe the polymorphic function could even do the allocation.<br>

> This needs to happen recursively of course, to avoid intermediate<br>

> temporaries. The compiler could easily handle that, and so could numpy<br>

> when it gets lazy evaluation.<br>

<br>

</div></div>Ah. But that depends too on the computation to be performed too; a)<br>

elementwise, b) axis-wise reductions, c) linear algebra...<br>

<br>

In my oomatrix code (please don't look at it, it's shameful) I do this<br>

using multiple dispatch.<br>

<br>

I'd rather ignore this for as long as we can, only implementing "a[:] =<br>

..." -- I can't see how decisions here would trickle down to the API<br>

that's used in the kernel, it's more like a pre-phase, and better<br>

treated orthogonally.<br>

<div class="im"><br>

> I think if the heavy lifting of allocating output arrays and exporting<br>

> these arrays work in numpy, then support in Cython could use that (I<br>

> can already hear certain people object to more complicated array stuff<br>

> in Cython :). Even better here would be an external project that each<br>

> our projects could use (I still think the nditer sorting functionality<br>

> of arrays should be numpy-agnostic and externally available).<br>

<br>

</div>I agree with the separate project idea. It's trivial for NumPy to<br>

incorporate that as one of its methods for exporting arrays, and I don't<br>

think it makes sense to either build it into Cython, or outright depend<br>

on NumPy.<br>

<br>

Here's what I'd like (working title: NumBridge?).<br>

<br>

  - Mission: Be the "double* + shape + strides" in a world where that is<br>

no longer enough, by providing tight, focused APIs/ABIs that are usable<br>

across C/Fortran/Python.<br>

<br>

I basically want something I can quickly acquire from a NumPy array,<br>

then pass it into my C code without dragging along all the cruft that I<br>

don't need.<br>

<br>

  - Written in pure C + specs, usable without Python<br>

<br>

  - PEP 3118 "done right", basically semi-standardize the internal<br>

Cython memoryview ABI and get something that's passable on stack<br>

<br>

  - Get block get/put API<br>

<br>

  - Iterator APIs<br>

<br>

  - Utility code for exporters and clients (iteration code, axis<br>

reordering, etc.)<br>

<br>

Is the scope of that insane, or is it at least worth a shot to see how<br>

bad it is? Beyond figuring out a small subset that can be done first,<br>

and whether performance considerations must be taken or not, there's two<br>

complicating factors: Pluggable dtypes, memory management. Perhaps you<br>

could come to Oslo for a couple of days to brainstorm...<br>

<div class="HOEnZb"><div class="h5"><br></div></div></blockquote><div><br>There have been musings on this list along those lines with the idea that numpy/ufuncs would be built on top of that base, so it isn't crazy ;) Perhaps it is time to take a more serious look at it. Especially if there is help to get it implemented and made available through tools such as Cython.<br>

<br>Chuck<br></div><br></div>