[Numpy-discussion] Fwd: GPU Numpy

James Bergstra bergstrj at iro.umontreal.ca
Thu Aug 6 17:29:07 EDT 2009


On Thu, Aug 6, 2009 at 4:57 PM, Sturla Molden <sturla at molden.no> wrote:
>
>> Now linear algebra or FFTs on a GPU would probably be a huge boon,
>> I'll admit - especially if it's in the form of a drop-in replacement
>> for the numpy or scipy versions.
>
>
> NumPy generates temporary arrays for expressions involving ndarrays. This
> extra allocation and copying often takes more time than the computation
> itself. With GPGPUs, we have to bus the data to and from VRAM as well. D. Knuth
> quoted Hoare as saying that "premature optimization is the root of all
> evil." Optimizing computation when the bottleneck is memory is premature.
>
> In order to improve on this, I think we have to add "lazy evaluation" to
> NumPy. That is, an operator should not return a temporary array but a
> symbolic expression. So if we have an expression like
>
>    y = a*x + b
>
> it should not evaluate a*x into a temporary array. Rather, the operators
> would build up a "parse tree" like
>
>    y = add(multiply(a,x),b)
>
> and evaluate the whole expression later on.
[snip]
> Regards,
> Sturla Molden

Hi Sturla,

The plan you describe is a good one, and Theano
(www.pylearn.org/theano) implements almost exactly that.  You should
check it out.  It does not use 'with' syntax at the moment, but it
could provide the backend machinery for your mechanism if you want to
go forward with that.  Theano provides:
- symbolic expression building for a big subset of what numpy can do
(and a few things that it doesn't); see the sketch below
- expression optimization (for faster and more accurate computations)
- dynamic code generation
- caching of compiled functions to disk.
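
Your y = a*x + b example would look roughly like this in Theano (just
a sketch; the dvector types and variable names are illustrative):

    import numpy as np
    import theano
    import theano.tensor as T

    # Symbolic variables: operating on them builds an expression graph,
    # not temporary arrays.  (Names and types here are illustrative.)
    a = T.dvector('a')
    x = T.dvector('x')
    b = T.dvector('b')

    y = a * x + b              # still symbolic, nothing evaluated yet

    # Compilation optimizes the graph and generates code once; the
    # resulting function is what actually touches NumPy arrays.
    f = theano.function([a, x, b], y)

    print(f(np.ones(3), np.arange(3.0), np.zeros(3)))   # -> [ 0.  1.  2.]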

Also, once you have a symbolic expression graph you can do cute stuff
like automatic differentiation.  We're currently working on the bridge
between theano and cuda, so that you can declare certain inputs as
residing on the GPU instead of in host memory and don't have to
transfer things to and from the host as much.
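
For instance, a gradient can be pulled out of the same kind of graph
automatically, something like this (again only a sketch):

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    cost = T.sum(x ** 2)       # symbolic scalar expression
    g = T.grad(cost, x)        # its gradient, derived from the graph

    grad_fn = theano.function([x], g)
    print(grad_fn([1.0, 2.0, 3.0]))   # -> [ 2.  4.  6.]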

James


