[Numpy-discussion] ndarray and lazy evaluation (was: Proposed Roadmap Overview)
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Mon Feb 20 12:46:49 EST 2012
On 02/20/2012 09:24 AM, Olivier Delalleau wrote:
> Hi Dag,
>
> Would you mind elaborating a bit on that example you mentioned at the
> end of your email? I don't quite understand what behavior you would like
> to achieve.
Sure, see below. I think we should continue discussion on numpy-discuss.
I wrote:
> You need at least a slightly different Python API to get anywhere, so
> numexpr/Theano is the right place to work on an implementation of this
> idea. Of course it would be nice if numexpr/Theano offered something as
> convenient as
>
> with lazy:
>     arr = A + B + C  # with all of these NumPy arrays
> # compute upon exiting...
More information:
The disadvantage today of using Theano (or numexpr) is that they require
using a different API, so that one has to learn and use Theano "from the
ground up", rather than just slap it on in an optimization phase.
Avoiding that would require extensive changes to NumPy, so I guess the
Theano authors or Francesc would need to push for this.
The alternative would look like this (with A, B, C ndarray instances):
    with theano.lazy:
        arr = A + B + C
On __enter__, the context manager would hook into NumPy to override its
arithmetic operators. Then it would build a Theano symbolic tree instead
of performing computations right away.
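A minimal sketch of the tree-building step, under loud assumptions: the `Lazy` class and its methods are hypothetical names invented for illustration, not Theano or NumPy API, and operands have to be wrapped explicitly. That explicit wrapping is exactly the "different API" problem described above, which the proposed hook into NumPy would remove.

```python
import operator


class Lazy:
    """Hypothetical node in a symbolic expression tree.

    Arithmetic on Lazy objects records the operation instead of running it.
    """

    def __init__(self, op=None, args=(), value=None):
        self.op = op        # callable for interior nodes, None for leaves
        self.args = args    # child Lazy nodes
        self.value = value  # concrete operand held by a leaf

    @staticmethod
    def wrap(x):
        # Leave existing nodes alone; turn anything else into a leaf.
        return x if isinstance(x, Lazy) else Lazy(value=x)

    def __add__(self, other):
        return Lazy(operator.add, (self, Lazy.wrap(other)))

    def __radd__(self, other):
        return Lazy(operator.add, (Lazy.wrap(other), self))

    def __mul__(self, other):
        return Lazy(operator.mul, (self, Lazy.wrap(other)))

    def __rmul__(self, other):
        return Lazy(operator.mul, (Lazy.wrap(other), self))

    def describe(self):
        # Render the recorded tree without evaluating anything.
        if self.op is None:
            return repr(self.value)
        inner = ", ".join(a.describe() for a in self.args)
        return f"{self.op.__name__}({inner})"

    def evaluate(self):
        # Walk the tree bottom-up; this is the only place work happens.
        if self.op is None:
            return self.value
        return self.op(*(a.evaluate() for a in self.args))
```

With wrapped operands, `A + B + C` yields a `Lazy` node whose `describe()` shows the nested additions, and nothing is computed until `evaluate()` is called. A real backend could inspect or optimize the tree before evaluating it.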
In addition to providing support for overriding arithmetic operators,
slicing etc., it would be necessary for "arr" to be an ndarray instance
which is "not yet computed" (data pointer set to NULL, storing a
compute-me callback and some context information).
Finally, the __exit__ would trigger computation. For other operations
which need the data pointer (e.g., single element lookup) one could
either raise an exception or trigger computation.
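The lifecycle described above can be sketched in plain Python. Everything here is an assumption made for illustration: `Deferred`, `lazy`, `wrap`, `force` and the `strict` flag are hypothetical names, `_data = None` stands in for the NULL data pointer, and operands are wrapped by hand because this sketch does not hook into NumPy's operators.

```python
import operator


def _force(x):
    # Compute x if it is deferred; pass ordinary values through.
    return x.force() if isinstance(x, Deferred) else x


class Deferred:
    """Hypothetical stand-in for a 'not yet computed' array."""

    def __init__(self, compute, scope):
        self._compute = compute  # zero-argument compute-me callback
        self._data = None        # plays the role of the NULL data pointer
        self._ready = False
        self._scope = scope      # context information
        scope.pending.append(self)

    def force(self):
        # Run the callback once and cache the result.
        if not self._ready:
            self._data = self._compute()
            self._ready = True
        return self._data

    def __add__(self, other):
        # Record the addition; operands are only forced when this runs.
        return Deferred(
            lambda: operator.add(self.force(), _force(other)), self._scope
        )

    def __getitem__(self, index):
        # Element lookup needs real data: raise, or trigger computation.
        if not self._ready and self._scope.strict:
            raise RuntimeError("array has not been computed yet")
        return self.force()[index]


class lazy:
    """Hypothetical context manager; results are computed on exit."""

    def __init__(self, strict=False):
        self.strict = strict  # True: early element lookup raises
        self.pending = []

    def wrap(self, value):
        return Deferred(lambda: value, self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Only compute if the block finished without an exception.
        if exc_type is None:
            for item in self.pending:
                item.force()
        return False
```

Inside `with lazy() as scope:`, an expression such as `scope.wrap(A) + scope.wrap(B) + scope.wrap(C)` only records callbacks, and leaving the block forces them all. The `strict` flag picks between the two policies mentioned above for operations that need the data early.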
This is just a rough sketch. It is not difficult "in principle", but of
course a massive amount of work would be needed to build support for
this into the NumPy APIs.
Probably, we're talking a NumPy 3.0 thing, after the current round of
refactorings has settled...
Please: Before discussing this further one should figure out if there's
manpower available for it; no sense in hashing out a castle in the sky
in detail. Also, it would be better to talk in person about this if
possible (I'm in Berkeley now and will attend PyData and PyCon).
Dag