[Numpy-discussion] ndarray and lazy evaluation (was: Proposed Roadmap Overview)

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Mon Feb 20 12:46:49 EST 2012


On 02/20/2012 09:24 AM, Olivier Delalleau wrote:
> Hi Dag,
>
> Would you mind elaborating a bit on that example you mentioned at the
> end of your email? I don't quite understand what behavior you would like
> to achieve

Sure, see below. I think we should continue discussion on numpy-discuss.

I wrote:

> You need at least a slightly different Python API to get anywhere, so
> numexpr/Theano is the right place to work on an implementation of this
> idea. Of course it would be nice if numexpr/Theano offered something as
> convenient as
>
> with lazy:
>      arr = A + B + C # with all of these NumPy arrays
> # compute upon exiting...

More information:

The disadvantage today of using Theano (or numexpr) is that they require
a different API, so one has to learn and use Theano "from the ground up"
rather than just slapping it on in an optimization phase.

The alternative sketched below would require extensive changes to NumPy,
so I guess the Theano authors or Francesc would need to push for this.

It would look like this (with A, B, C being ndarray instances):

with theano.lazy:
     arr = A + B + C

On __enter__, the context manager would hook into NumPy to override its
arithmetic operators. It would then build a Theano symbolic tree instead
of performing computations right away.
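
As a rough illustration of "build a tree instead of computing", here is a
minimal pure-Python sketch. It does not use Theano and does not hook into
NumPy's operators; the Node class and the leaf() helper are hypothetical
names invented for this example, and the arrays have to be wrapped
explicitly, which is exactly the inconvenience the proposal wants to remove.

```python
import numpy as np


class Node:
    """Hypothetical expression-tree node (a stand-in for a symbolic variable)."""

    def __init__(self, op, args):
        self.op = op      # "leaf", "add" or "mul"
        self.args = args  # operand Nodes, or the wrapped ndarray for a leaf

    def __add__(self, other):
        # Record the operation instead of performing it.
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

    def evaluate(self):
        """Walk the tree and perform the deferred arithmetic."""
        if self.op == "leaf":
            return self.args[0]
        left, right = (arg.evaluate() for arg in self.args)
        return left + right if self.op == "add" else left * right


def leaf(array):
    """Wrap an ndarray so that arithmetic on it records a tree."""
    return Node("leaf", (np.asarray(array),))


A, B, C = np.arange(3.0), np.ones(3), np.full(3, 2.0)
expr = leaf(A) + leaf(B) + leaf(C)  # builds add(add(A, B), C); no arithmetic yet
result = expr.evaluate()            # the additions happen here
```

A real implementation would hand the finished tree to an optimizer rather
than walk it naively as evaluate() does here.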

In addition to providing support for overriding arithmetic operators,
slicing etc., it would be necessary for "arr" to be an ndarray instance
which is "not yet computed" (data pointer set to NULL, with a compute-me
callback and some context information stored alongside it).
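
The "not yet computed" state can be mimicked in plain Python to make the
idea concrete. This is only a sketch under loud assumptions: DeferredArray
is a hypothetical wrapper class, not an ndarray, and None plays the role
of the NULL data pointer.

```python
import numpy as np


class DeferredArray:
    """Hypothetical placeholder for an array whose data does not exist yet."""

    def __init__(self, compute, context=None):
        self._data = None        # plays the role of the NULL data pointer
        self._compute = compute  # the "compute-me" callback
        self.context = context   # free-form context information

    @property
    def is_computed(self):
        return self._data is not None

    def materialize(self):
        """Run the callback once and cache the resulting ndarray."""
        if self._data is None:
            self._data = np.asarray(self._compute())
            self._compute = None  # release references held by the callback
        return self._data


A, B, C = np.arange(3.0), np.ones(3), np.full(3, 2.0)
arr = DeferredArray(lambda: A + B + C, context="A + B + C")
pending = arr.is_computed   # nothing has been evaluated at this point
values = arr.materialize()  # the callback runs now
```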

Finally, the __exit__ would trigger computation. For other operations 
which need the data pointer (e.g., single element lookup) one could 
either raise an exception or trigger computation.
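
The two policies for early access can be sketched as follows. Again this
is an assumption-laden toy: Lazy, Deferred, defer() and the
on_early_access option are hypothetical names, and results are registered
explicitly instead of being captured by overridden NumPy operators.

```python
import numpy as np


class Deferred:
    """Hypothetical result object that holds no data until materialized."""

    def __init__(self, compute, owner):
        self._data = None
        self._compute = compute
        self._owner = owner

    def materialize(self):
        if self._data is None:
            self._data = np.asarray(self._compute())
        return self._data

    def __getitem__(self, index):
        # Element lookup needs real data: either compute now or refuse.
        if self._data is None and self._owner.on_early_access == "raise":
            raise RuntimeError("array has not been computed yet")
        return self.materialize()[index]


class Lazy:
    """Hypothetical context manager that computes pending results on exit."""

    def __init__(self, on_early_access="compute"):
        self.on_early_access = on_early_access  # "compute" or "raise"
        self.pending = []

    def __enter__(self):
        return self

    def defer(self, compute):
        result = Deferred(compute, self)
        self.pending.append(result)
        return result

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:  # skip computation if the block failed
            for result in self.pending:
                result.materialize()
        self.pending.clear()
        return False  # never swallow exceptions


A, B, C = np.arange(3.0), np.ones(3), np.full(3, 2.0)

with Lazy() as lazy:
    arr = lazy.defer(lambda: A + B + C)
    computed_inside = arr._data is not None  # still pending inside the block
first = arr[0]  # the data exists once __exit__ has run
```

Raising is the stricter choice and makes accidental early evaluation
visible; computing on demand is more forgiving but can silently split one
large expression into several smaller evaluations.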

This is just a rough sketch. It is not difficult "in principle", but of
course there's really a massive amount of work involved in building
support for this into the NumPy APIs.

We're probably talking about a NumPy 3.0 thing, after the current round
of refactorings has settled...

Please: before discussing this further, one should figure out if there's
manpower available for it; there's no sense in hashing out a castle in
the sky in detail. Also, it would be better to talk in person about this
if possible (I'm in Berkeley now and will attend PyData and PyCon).

Dag


