[Numpy-discussion] Fwd: GPU Numpy

Thu Sep 10 00:04:24 EDT 2009

George Dahl wrote:
 > I know that for my work, I can get around an order of a 50-fold speedup
 > over numpy using a python wrapper for a simple GPU matrix class. So I
 > might be dealing with a lot of matrix products where I multiply a fixed
 > 512 by 784 matrix by a 784 by 256 matrix that changes between each matrix
 > product, although to really see the largest gains I use a 4096 by 2048
 > matrix times a bunch of 2048 by 256 matrices.

Matrix multiplication is at the core of 3D graphics, and the raison
d'être for GPUs. That is specifically what they are designed to do.
Matrix multiplication scales as O(n**3) in floating-point operations but
only O(n**2) in memory access, so for large matrices the arithmetic
dominates. That is, GPUs give fast 3D graphics (matrix multiplications)
by speeding up floating-point operations.
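To put rough numbers on the scaling argument, here is a small Python
sketch that counts floating-point operations and memory traffic for the
two product sizes quoted above. It assumes float32 data (4 bytes per
element), counts each multiply-add as 2 flops, and assumes each matrix is
read or written exactly once, which is an idealized lower bound on memory
traffic. `matmul_cost` is a hypothetical helper name.

```python
def matmul_cost(m, k, n, itemsize=4):
    """Return (flops, bytes, flops_per_byte) for an (m x k) @ (k x n) product.

    Assumptions: one multiply plus one add per inner-product term, and each of
    A, B and C touched exactly once in memory (itemsize=4 means float32).
    """
    flops = 2 * m * k * n                           # grows like n**3
    nbytes = itemsize * (m * k + k * n + m * n)     # grows like n**2
    return flops, nbytes, flops / nbytes


# The two cases from the quoted message.
for m, k, n in [(512, 784, 256), (4096, 2048, 256)]:
    flops, nbytes, intensity = matmul_cost(m, k, n)
    print(f"{m}x{k} @ {k}x{n}: {flops / 1e9:.2f} GFLOP, "
          f"{nbytes / 1e6:.2f} MB, {intensity:.0f} FLOP/byte")
```

For square n by n matrices under these assumptions the ratio works out to
n/6 flops per byte, so the bigger the matrices, the more the work is
dominated by arithmetic rather than by memory access.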

GPUs make sense for certain level-3 BLAS calls, but that really belongs
in BLAS, not in NumPy's core. One could, e.g., consider linking with a
BLAS wrapper that directs these special cases to the GPU and the rest to
ATLAS / MKL / netlib BLAS.
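The kind of wrapper suggested here could look roughly like the following
Python sketch. It is not an existing library: `make_dispatching_gemm`,
the `min_flops` threshold and the `gpu_gemm` callable are all
hypothetical, and `reference_gemm` is a plain-Python stand-in for a CPU
BLAS such as ATLAS or MKL.

```python
from typing import Callable, List, Optional, Sequence

Matrix = Sequence[Sequence[float]]
Gemm = Callable[[Matrix, Matrix], List[List[float]]]


def reference_gemm(a: Matrix, b: Matrix) -> List[List[float]]:
    """Plain-Python C = A @ B, standing in for a CPU BLAS (ATLAS, MKL, netlib)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def make_dispatching_gemm(cpu_gemm: Gemm,
                          gpu_gemm: Optional[Gemm] = None,
                          min_flops: float = 1e8) -> Gemm:
    """Build a gemm that routes large products to gpu_gemm, the rest to cpu_gemm.

    Hypothetical design: min_flops is an assumed cut-off below which the cost of
    moving data to the GPU is taken to outweigh the faster arithmetic.
    """
    def gemm(a: Matrix, b: Matrix) -> List[List[float]]:
        m, k, n = len(a), len(b), len(b[0])
        if any(len(row) != k for row in a):
            raise ValueError("inner dimensions do not match")
        flops = 2 * m * k * n  # one multiply and one add per inner-product term
        if gpu_gemm is not None and flops >= min_flops:
            return gpu_gemm(a, b)   # large level-3 call: send to the GPU
        return cpu_gemm(a, b)       # everything else: ordinary CPU BLAS

    return gemm


# Example: no GPU backend available, so everything goes to the CPU path.
gemm = make_dispatching_gemm(reference_gemm)
print(gemm([[1, 2]], [[3], [4]]))  # [[11]]
```

In a real wrapper the threshold would have to account for copying
operands to and from GPU memory, which is why small products are better
left on the CPU.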

Sturla Molden