ndarray and lazy evaluation (was: Proposed Roadmap Overview)

On 02/20/2012 09:24 AM, Olivier Delalleau wrote:
Hi Dag,
Would you mind elaborating a bit on that example you mentioned at the end of your email? I don't quite understand what behavior you would like to achieve.
Sure, see below. I think we should continue discussion on numpy-discuss. I wrote:
You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as
    with lazy:
        arr = A + B + C  # with all of these NumPy arrays
    # compute upon exiting...
More information:
The disadvantage today of using Theano (or numexpr) is that they require using a different API, so that one has to learn and use Theano "from the ground up", rather than just slap it on in an optimization phase.
The alternative would require extensive changes to NumPy, so I guess Theano authors or Francesc would need to push for this.
The alternative would be (with A, B, C ndarray instances):
    with theano.lazy:
        arr = A + B + C
On __enter__, the context manager would hook into NumPy to override its arithmetic operators. Then it would build a Theano symbolic tree instead of performing computations right away.
In addition to providing support for overriding arithmetic operators, slicing etc., it would be necessary for "arr" to be an ndarray instance which is "not yet computed" (data-pointer set to NULL, and store a compute-me callback and some context information).
Finally, the __exit__ would trigger computation. For other operations which need the data pointer (e.g., single element lookup) one could either raise an exception or trigger computation.
This is just a rough sketch. It is not difficult "in principle", but of course there's really a massive amount of work involved to work support for this into the NumPy APIs.
Probably, we're talking a NumPy 3.0 thing, after the current round of refactorings have settled...
Please: Before discussing this further one should figure out if there's manpower available for it; no sense in hashing out a castle in the sky in details.
Also it would be better to talk in person about this if possible (I'm in Berkeley now and will attend PyData and PyCon).
Dag
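The mechanism described in this message can be illustrated with a toy in plain Python. This is only a sketch under stated assumptions, not how NumPy or Theano work: `Lazy` and `Array` are hypothetical names, `Array` is a list-backed stand-in for ndarray (1-D, equal lengths, no NumPy involved), and a class-level flag takes the place of the hook into NumPy's arithmetic operators. Inside the block the operators only record a tree; `__exit__` evaluates it, and element lookup on a not-yet-computed array triggers computation rather than raising.

```python
import operator


class Lazy:
    """Hypothetical context manager; not an existing NumPy/Theano API."""

    active = None  # the Lazy context currently entered, if any

    def __enter__(self):
        self.pending = []  # results not yet consumed by another expression
        Lazy.active = self
        return self

    def __exit__(self, exc_type, exc, tb):
        Lazy.active = None
        if exc_type is None:
            for arr in self.pending:
                arr.compute()  # __exit__ triggers the computation
        return False


class Array:
    """List-backed stand-in for ndarray (assumption: 1-D, equal lengths)."""

    def __init__(self, data=None, expr=None):
        self.data = data  # None plays the role of the NULL data pointer
        self.expr = expr  # (op, left, right): the "compute-me" information

    def _binop(self, other, op):
        ctx = Lazy.active
        if ctx is None:  # eager mode: compute right away
            return Array([op(x, y) for x, y in zip(self.compute(), other.compute())])
        # lazy mode: record a tree node; operands stop being pending results
        ctx.pending = [p for p in ctx.pending if p is not self and p is not other]
        out = Array(expr=(op, self, other))
        ctx.pending.append(out)
        return out

    def __add__(self, other):
        return self._binop(other, operator.add)

    def __mul__(self, other):
        return self._binop(other, operator.mul)

    def __len__(self):
        return len(self.data) if self.data is not None else len(self.expr[1])

    def _element(self, i):
        if self.data is not None:
            return self.data[i]
        op, left, right = self.expr
        return op(left._element(i), right._element(i))

    def compute(self):
        if self.data is None:
            # one pass over the whole tree, no intermediate arrays
            self.data = [self._element(i) for i in range(len(self))]
            self.expr = None
        return self.data

    def __getitem__(self, i):
        return self.compute()[i]  # element lookup triggers computation


A, B, C = Array([1.0, 2.0]), Array([10.0, 20.0]), Array([100.0, 200.0])
lazy = Lazy()
with lazy:
    arr = A + B + C
    still_deferred = arr.data is None  # nothing is computed inside the block
print(arr.data)
```

In this toy only the final expression is materialized on exit; the intermediate `A + B` stays a node in the tree and is never stored as an array of its own, which is the saving that deferring is meant to buy. Nested contexts, broadcasting, and slicing are deliberately left out.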

On Feb 20, 2012, at 6:46 PM, Dag Sverre Seljebotn wrote:
On 02/20/2012 09:24 AM, Olivier Delalleau wrote:
Hi Dag,
Would you mind elaborating a bit on that example you mentioned at the end of your email? I don't quite understand what behavior you would like to achieve.
Sure, see below. I think we should continue discussion on numpy-discuss.
I wrote:
You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as
    with lazy:
        arr = A + B + C  # with all of these NumPy arrays
    # compute upon exiting...
More information:
The disadvantage today of using Theano (or numexpr) is that they require using a different API, so that one has to learn and use Theano "from the ground up", rather than just slap it on in an optimization phase.
The alternative would require extensive changes to NumPy, so I guess Theano authors or Francesc would need to push for this.
The alternative would be (with A, B, C ndarray instances):
    with theano.lazy:
        arr = A + B + C
On __enter__, the context manager would hook into NumPy to override its arithmetic operators. Then it would build a Theano symbolic tree instead of performing computations right away.
In addition to providing support for overriding arithmetic operators, slicing etc., it would be necessary for "arr" to be an ndarray instance which is "not yet computed" (data-pointer set to NULL, and store a compute-me callback and some context information).
Finally, the __exit__ would trigger computation. For other operations which need the data pointer (e.g., single element lookup) one could either raise an exception or trigger computation.
This is just a rough sketch. It is not difficult "in principle", but of course there's really a massive amount of work involved to work support for this into the NumPy APIs.
Probably, we're talking a NumPy 3.0 thing, after the current round of refactorings have settled...
Please: Before discussing this further one should figure out if there's manpower available for it; no sense in hashing out a castle in the sky in details.
I see. Mark Wiebe already suggested the same thing some time ago: https://github.com/numpy/numpy/blob/master/doc/neps/deferred-ufunc-evaluatio...
Also it would be better to talk in person about this if possible (I'm in Berkeley now and will attend PyData and PyCon).
Nice. Most of the Continuum crew (me included) will be attending both conferences. Mark W. will make PyCon only, but it will be a good occasion to discuss this further.
See you,
-- Francesc Alted

On 02/20/2012 10:04 AM, Francesc Alted wrote:
On Feb 20, 2012, at 6:46 PM, Dag Sverre Seljebotn wrote:
On 02/20/2012 09:24 AM, Olivier Delalleau wrote:
Hi Dag,
Would you mind elaborating a bit on that example you mentioned at the end of your email? I don't quite understand what behavior you would like to achieve.
Sure, see below. I think we should continue discussion on numpy-discuss.
I wrote:
You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as
    with lazy:
        arr = A + B + C  # with all of these NumPy arrays
    # compute upon exiting...
More information:
The disadvantage today of using Theano (or numexpr) is that they require using a different API, so that one has to learn and use Theano "from the ground up", rather than just slap it on in an optimization phase.
The alternative would require extensive changes to NumPy, so I guess Theano authors or Francesc would need to push for this.
The alternative would be (with A, B, C ndarray instances):
    with theano.lazy:
        arr = A + B + C
On __enter__, the context manager would hook into NumPy to override its arithmetic operators. Then it would build a Theano symbolic tree instead of performing computations right away.
In addition to providing support for overriding arithmetic operators, slicing etc., it would be necessary for "arr" to be an ndarray instance which is "not yet computed" (data-pointer set to NULL, and store a compute-me callback and some context information).
Finally, the __exit__ would trigger computation. For other operations which need the data pointer (e.g., single element lookup) one could either raise an exception or trigger computation.
This is just a rough sketch. It is not difficult "in principle", but of course there's really a massive amount of work involved to work support for this into the NumPy APIs.
Probably, we're talking a NumPy 3.0 thing, after the current round of refactorings have settled...
Please: Before discussing this further one should figure out if there's manpower available for it; no sense in hashing out a castle in the sky in details.
I see. Mark Wiebe already suggested the same thing some time ago:
https://github.com/numpy/numpy/blob/master/doc/neps/deferred-ufunc-evaluatio...
Thanks, I didn't know about that (though I did really assume this was on Mark's radar already).
Also it would be better to talk in person about this if possible (I'm in Berkeley now and will attend PyData and PyCon).
Nice. Most of the Continuum crew (me included) will be attending both conferences. Mark W. will make PyCon only, but it will be a good occasion to discuss this further.
I certainly don't think I have anything to add to this discussion beyond what Mark wrote up. But it will be nice to meet up anyway.
Dag