[Numpy-discussion] Fwd: GPU Numpy

Charles R Harris charlesr.harris at gmail.com
Thu Aug 6 18:12:11 EDT 2009


On Thu, Aug 6, 2009 at 3:29 PM, James Bergstra <bergstrj at iro.umontreal.ca>wrote:

> On Thu, Aug 6, 2009 at 4:57 PM, Sturla Molden<sturla at molden.no> wrote:
> >
> >> Now linear algebra or FFTs on a GPU would probably be a huge boon,
> >> I'll admit - especially if it's in the form of a drop-in replacement
> >> for the numpy or scipy versions.
> >
> >
> > NumPy generates temporary arrays for expressions involving ndarrays. This
> > extra allocation and copying often takes more time than the computation.
> > With GPGPUs, we have to bus the data to and from VRAM as well. D. Knuth
> > quoted Hoare saying that "premature optimization is the root of all
> > evil." Optimizing computation when the bottleneck is memory is premature.
> >
> > In order to improve on this, I think we have to add "lazy evaluation" to
> > NumPy. That is, an operator should not return a temporary array but a
> > symbolic expression. So if we have an expression like
> >
> >    y = a*x + b
> >
> > it should not evaluate a*x into a temporary array. Rather, the operators
> > would build up a "parse tree" like
> >
> >    y = add(multiply(a,x),b)
> >
> > and evaluate the whole expression later on.
> [snip]
> > Regards,
> > Sturla Molden
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
>
> Hi Sturla,
>
> The plan you describe is a good one, and Theano
> (www.pylearn.org/theano) almost exactly implements it.  You should
> check it out.  It does not use 'with' syntax at the moment, but it
> could provide the backend machinery for your mechanism if you want to
> go forward with that.  Theano provides
> - symbolic expression building for a big subset of what numpy can do
> (and a few things that it doesn't)
> - expression optimization (for faster and more accurate computations)
> - dynamic code generation
> - caching of compiled functions to disk.
>
> Also, when you have a symbolic expression graph you can do cute stuff
> like automatic differentiation.  We're currently working on the bridge
> between theano and cuda so that you can declare certain inputs as
> residing on the GPU instead of in host memory, so you don't have to
> transfer things to and from host memory as much.
>
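The lazy-evaluation scheme Sturla describes can be sketched in a few lines of plain Python. The `Expr`/`Var`/`BinOp` classes below are purely illustrative, not part of NumPy or Theano; a real implementation would fuse the tree into a single kernel rather than evaluate node by node:

```python
import numpy as np

class Expr:
    """Base class for a tiny symbolic expression tree (illustrative only)."""
    def __add__(self, other):
        return BinOp(np.add, self, other)
    def __mul__(self, other):
        return BinOp(np.multiply, self, other)

class Var(Expr):
    """A leaf node wrapping a concrete ndarray."""
    def __init__(self, value):
        self.value = np.asarray(value)
    def eval(self):
        return self.value

class BinOp(Expr):
    """An interior node: op(left, right), evaluated only on demand."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def eval(self):
        # A fused implementation would compile the whole tree here,
        # avoiding the temporary produced for each intermediate node.
        return self.op(self.left.eval(), self.right.eval())

a, x, b = Var(2.0), Var([1.0, 2.0, 3.0]), Var(1.0)
y = a * x + b      # builds add(multiply(a, x), b); nothing is computed yet
print(y.eval())    # [3. 5. 7.]
```

Writing `a * x + b` builds exactly the `add(multiply(a, x), b)` parse tree from Sturla's example; computation is deferred until `eval()` is called on the root.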

So what simple things could numpy implement that would help here? It almost
sounds like numpy would mostly be a Python-facing interface, while the gpu
would execute specialized code written and compiled for specific problems.
Whether the code that gets compiled is expressed using lazy evaluation (a la
Sturla) or in some other way seems like an independent issue. It sounds
like one important thing would be having arrays that reside on the GPU.

Chuck

