[Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

Charles R Harris charlesr.harris at gmail.com
Sat May 12 18:27:14 EDT 2012


On Sat, May 12, 2012 at 3:55 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:

> On 05/11/2012 03:37 PM, mark florisson wrote:
> > On 11 May 2012 12:13, Dag Sverre Seljebotn
> > <d.s.seljebotn at astro.uio.no> wrote:
> >> (NumPy devs: I know, I get too many ideas. But this time I *really*
> >> believe in it, I think this is going to be *huge*. And if Mark F. likes
> >> it it's not going to be without manpower; and as his mentor I'd pitch
> >> in too here and there.)
> >>
> >> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly
> >> don't want to micro-manage your GSoC, just have your take.)
> >>
> >> Travis, thank you very much for those good words in the "NA-mask
> >> interactions..." thread. It put most of my concerns to rest. If anybody
> >> is leaning towards opaqueness because of its OOP purity, I want to
> >> refer to C++ and its walled garden of ideological purity -- it has,
> >> what, 3-4 different OOP array libraries, none of which is able to
> >> out-compete the others. Meanwhile the rest of the world happily
> >> cooperates using pointers, strides, CSR and CSC.
> >>
> >> Now, there are limits to what you can do with strides and pointers.
> >> No one's denying the need for more. In my mind that's an API where you
> >> can do fetch_block and put_block of cache-sized, N-dimensional blocks
> >> on an array; but it might be something slightly different.
> >>
> >> Here's what I'm asking: DO NOT simply keep extending ndarray and the
> >> NumPy C API to deal with this issue.
> >>
> >> What we need is duck-typing/polymorphism at the C level. If you keep
> >> extending ndarray and the NumPy C API, what we'll have is a one-to-many
> >> relationship: One provider of array technology, multiple consumers
> >> (with hooks, I'm sure, but all implementations of the hook concept in
> >> the NumPy world I've seen so far are a total disaster!).
> >>
> >> What I think we need instead is something like PEP 3118 for the
> >> "abstract" array that is only available block-wise with getters and
> >> setters. On the Cython list we've decided that what we want for
> >> CEP 1000 (for boxing callbacks etc.) is to extend PyTypeObject with our
> >> own fields; we could create CEP 1001 to solve this issue and make any
> >> Python object an exporter of "block-getter/setter-arrays" (better name
> >> needed).
> >>
> >> What would be exported is (of course) a simple vtable:
> >>
> >> typedef struct {
> >>     int (*get_block)(void *ctx, ssize_t *upper_left,
> >>                      ssize_t *lower_right, ...);
> >>     ...
> >> } block_getter_setter_array_vtable;
> >>
> >> Let's please discuss the details *after* the fundamentals. But the
> >> reason I put void* there instead of PyObject* is that I hope this could
> >> be used beyond the Python world (say, Python<->Julia); the void* would
> >> be handed to you at the time you receive the vtable (however we handle
> >> that).
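
To make the CEP 1001 idea a bit more concrete, here is a rough, purely
illustrative sketch of what such an extended type object could look like.
Nothing below exists anywhere yet: the put_block signature, the buffer
arguments and the BlockExporterTypeObject layout are all assumptions for the
sake of discussion (Py_ssize_t is used where the sketch above says ssize_t,
just so the snippet is self-contained):

    #include <Python.h>

    /* Hypothetical block getter/setter vtable, fleshed out from the sketch
       above; the out/in buffer arguments and the int error-return
       convention are made up. */
    typedef struct {
        int (*get_block)(void *ctx,
                         const Py_ssize_t *upper_left,
                         const Py_ssize_t *lower_right,
                         void *out_buffer);
        int (*put_block)(void *ctx,
                         const Py_ssize_t *upper_left,
                         const Py_ssize_t *lower_right,
                         const void *in_buffer);
    } block_getter_setter_array_vtable;

    /* Hypothetical "extended PyTypeObject": an ordinary heap type followed
       by one extra C-level slot holding the exporter's vtable.  A consumer
       that knows about the extension reaches the vtable with a cast and a
       pointer dereference, without any Python-level call. */
    typedef struct {
        PyHeapTypeObject base;
        block_getter_setter_array_vtable *bgsa_vtable;
    } BlockExporterTypeObject;

A consumer would still need an agreed-upon way (a flag bit or a magic id on
the type) to verify that a given type object really has this extended form
before casting to it; pinning that down is exactly what a CEP 1001 would be
for.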
> >
> > I suppose it would also be useful to have some way of predicting the
> > output format polymorphically for the caller. E.g. dense *
> > block_diagonal results in block diagonal, but dense + block_diagonal
> > results in dense, etc. It might be useful for the caller to know
> > whether it needs to allocate a sparse, dense or block-structured
> > array. Or maybe the polymorphic function could even do the allocation.
> > This needs to happen recursively of course, to avoid intermediate
> > temporaries. The compiler could easily handle that, and so could numpy
> > when it gets lazy evaluation.
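
As a throwaway illustration of the "predict the output format" idea -- this
is not NumPy or Cython code, and every name in it is invented -- the
caller-side logic could start out as nothing more than a small promotion
rule over an enum of storage formats:

    /* Hypothetical format-promotion rule: identical formats are preserved,
       a multiplicative op lets the structured operand win (dense *
       block_diagonal -> block_diagonal), and an additive op densifies
       (dense + block_diagonal -> dense).  Purely a sketch. */
    typedef enum { FMT_DENSE, FMT_BLOCK_DIAGONAL, FMT_SPARSE } array_format;

    static array_format
    predict_format(array_format a, array_format b, int op_is_multiplicative)
    {
        if (a == b)
            return a;
        if (op_is_multiplicative)
            return (a == FMT_DENSE) ? b : a;
        return FMT_DENSE;
    }

The real questions are where such rules live, how they compose recursively
to avoid temporaries, and (as noted below) how they depend on the kind of
computation being performed.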
>
> Ah. But that also depends on the computation to be performed: a)
> elementwise, b) axis-wise reductions, c) linear algebra...
>
> In my oomatrix code (please don't look at it, it's shameful) I do this
> using multiple dispatch.
>
> I'd rather ignore this for as long as we can, only implementing "a[:] =
> ..." -- I can't see how decisions here would trickle down to the API
> that's used in the kernel, it's more like a pre-phase, and better
> treated orthogonally.
>
> > I think if the heavy lifting of allocating output arrays and exporting
> > these arrays works in numpy, then support in Cython could use that (I
> > can already hear certain people object to more complicated array stuff
> > in Cython :). Even better here would be an external project that each
> > of our projects could use (I still think the nditer sorting
> > functionality of arrays should be numpy-agnostic and externally
> > available).
>
> I agree with the separate project idea. It's trivial for NumPy to
> incorporate that as one of its methods for exporting arrays, and I don't
> think it makes sense either to build it into Cython or to have it
> outright depend on NumPy.
>
> Here's what I'd like (working title: NumBridge?).
>
>  - Mission: Be the "double* + shape + strides" in a world where that is
> no longer enough, by providing tight, focused APIs/ABIs that are usable
> across C/Fortran/Python.
>
> I basically want something I can quickly acquire from a NumPy array,
> then pass it into my C code without dragging along all the cruft that I
> don't need.
>
>  - Written in pure C + specs, usable without Python
>
>  - PEP 3118 "done right", basically semi-standardize the internal
> Cython memoryview ABI and get something that's passable on stack
>
>  - A block get/put API
>
>  - Iterator APIs
>
>  - Utility code for exporters and clients (iteration code, axis
> reordering, etc.)
>
> Is the scope of that insane, or is it at least worth a shot to see how
> bad it is? Beyond figuring out a small subset that can be done first,
> and whether performance considerations must be taken into account or
> not, there are two complicating factors: pluggable dtypes and memory
> management. Perhaps you could come to Oslo for a couple of days to
> brainstorm...
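
To give the "double* + shape + strides, done right" mission a concrete
shape, here is a minimal sketch of the kind of small, stack-passable view
descriptor such a project might standardize. The name numbridge_view, the
fixed ndim limit and the exact field layout are all made up for
illustration, not anything agreed upon:

    #include <stddef.h>

    #define NUMBRIDGE_MAX_NDIM 8   /* arbitrary bound, just for the sketch */

    /* Pure C, no Python dependency: just enough to describe a strided,
       possibly non-contiguous block of memory with a single item size. */
    typedef struct {
        void      *data;                        /* pointer to first element */
        ptrdiff_t  shape[NUMBRIDGE_MAX_NDIM];   /* elements along each axis */
        ptrdiff_t  strides[NUMBRIDGE_MAX_NDIM]; /* byte strides (PEP 3118)  */
        int        ndim;
        int        itemsize;                    /* bytes per element        */
    } numbridge_view;

The two complicating factors above show up immediately: pluggable dtypes
would need more than a bare itemsize, and memory management would need at
least an owner/context pointer and a release callback on top of this.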
>
>
There have been musings on this list along those lines with the idea that
numpy/ufuncs would be built on top of that base, so it isn't crazy ;)
Perhaps it is time to take a more serious look at it, especially if there
is help to get it implemented and made available through tools such as
Cython.

Chuck