[Numpy-discussion] Fwd: GPU Numpy

James Bergstra bergstrj at iro.umontreal.ca
Thu Aug 6 13:41:32 EDT 2009


On Thu, Aug 6, 2009 at 1:19 PM, Charles R
Harris<charlesr.harris at gmail.com> wrote:
> I almost looks like you are reimplementing numpy, in c++ no less. Is there
> any reason why you aren't working with a numpy branch and just adding
> ufuncs?

I don't know how that would work.  The ufuncs need a datatype to work
with, and AFAIK it would break everything if a numpy ndarray pointed
to memory on the GPU.  Could you explain what you mean a little more?

> I'm also curious if you have thoughts about how to use the GPU
> pipelines in parallel.

Current thinking for ufunc-type computations:
1) divide up the tensors into subtensors whose dimensions have
power-of-two sizes (this permits a fast integer -> ndarray coordinate
computation using bit shifting);
2) launch a kernel for each subtensor in its own stream to use the
parallel pipelines;
3) sync and return.
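The power-of-two restriction in step 1 is what makes the flat-index ->
coordinate mapping cheap: division and modulo by each dimension become a
shift and a mask. A minimal Python sketch of the idea (the helper name
and interface are mine, not actual cuda-ndarray code):

```python
# Sketch: map a flat element index to C-order ndarray coordinates when
# every dimension is a power of two, using shifts and masks instead of
# div/mod. Hypothetical helper, not taken from cuda-ndarray.

def coords_from_flat(flat, log2_dims):
    """log2_dims: log2 of each dimension, e.g. (2, 3) for shape (4, 8)."""
    coords = []
    for log2_d in reversed(log2_dims):      # innermost (fastest) axis first
        coords.append(flat & ((1 << log2_d) - 1))  # flat % dim
        flat >>= log2_d                            # flat // dim
    return tuple(reversed(coords))

# Example: shape (4, 8), flat index 13 -> coordinates (1, 5)
```

On a GPU the same shift/mask sequence runs per thread inside the kernel,
which is why avoiding integer division matters.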

This is a pain to do without automatic code generation, though.
Currently we're using macros, but that's not pretty.
C++ has templates, which we don't really use yet but were planning to;
these have some power to generate code.
The 'theano' project (www.pylearn.org/theano), for which cuda-ndarray
was created, has a more powerful code generation mechanism similar to
weave.  This algorithm is used in theano-cuda-ndarray.
scipy.weave could be very useful for generating code for specific
shapes/ndims on demand, if weave could use nvcc.
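To illustrate the kind of shape-specialized generation a weave-style tool
does, here is a rough sketch (entirely hypothetical, not theano's or
weave's actual mechanism): emit C source for an elementwise add with the
loop bound baked in as a literal, so the compiler can unroll and
vectorize for that exact shape.

```python
# Sketch: generate specialized C source for an elementwise add over a
# fixed shape, so the loop bound is a compile-time constant. Hypothetical
# template; real generators handle strides, dtypes, broadcasting, etc.

ADD_TEMPLATE = """\
void add_{name}(const float *a, const float *b, float *out) {{
    for (int i = 0; i < {n}; ++i)  /* literal bound: unrollable */
        out[i] = a[i] + b[i];
}}
"""

def specialize_add(shape):
    """Return C source for an add kernel specialized to `shape`."""
    n = 1
    for d in shape:
        n *= d
    name = "x".join(str(d) for d in shape)
    return ADD_TEMPLATE.format(name=name, n=n)

# specialize_add((4, 8)) emits add_4x8 with a literal bound of 32.
```

The generated string would then be handed to a compiler (nvcc, in the
GPU case) and the resulting function cached per shape/ndim.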

James
