I've been having some fun working on a (tiny) rpython version of numpy. At the heart of numpy is a multi-dimensional array type, ndarray. For example it (the cpython version) can handle processing image arrays like so:
    import numpy
    from PIL import Image

    im = Image.open('jit.png')
    data = numpy.array(im)
Now data is a 3-dimensional array: data.ndim == 3 and data.shape == (1158, 1173, 3). The last dimension is the color dimension. The cool part of numpy is all the indexing tricks. This is a single pixel:
    >>> data[0,0,:]
    array([224, 255, 224], dtype=uint8)
This is the red channel:
    >>> data[:,:,0]
    array([[224, 224, 224, ..., 224, 224, 224],
           ...
Now we mix red and blue:
    >>> data[:,:,0] + data[:,:,2]
(This actually didn't work so well because it's an array of uint8 and we get overflow, but that can be fixed with a cast to a wider type.) The trick with numpy is that every indexing operation creates a "view" onto the original data, i.e. the memory is aliased. Each ndarray stores a list of "strides" (integers) which specify how many bytes to skip when incrementing an index in a given dimension:
    >>> data.strides
    (3519, 3, 1)
    >>> data[:,:,0].strides
    (3519, 3)
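For a C-contiguous array the strides follow directly from the shape and the item size: the stride of dimension k is the item size times the product of all later dimensions. Here is a small sketch of that rule (mine, not part of rnumpy), reproducing the numbers above:

    # stride[k] = itemsize * product(shape[k+1:]) for a C-contiguous array
    shape, itemsize = (1158, 1173, 3), 1        # the uint8 image from above
    strides, acc = [], itemsize
    for n in reversed(shape):
        strides.append(acc)
        acc *= n
    print(tuple(reversed(strides)))             # (3519, 3, 1)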
Then numpy (at the c level) uses iterators to perform getitem, setitem and other (pointwise) operations. "Broadcasting" allows a smaller array to be used as if it were a larger array, as long as the shape of the smaller array is a suffix of the shape of the larger one.
    green = numpy.array([0, 255, 0], dtype=numpy.uint8)
    data[:] = green

This sets every pixel to green.
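In terms of strides, broadcasting amounts to giving the repeated dimensions a stride of 0, so the same three bytes are revisited for every pixel. A quick way to see this with cpython numpy (assuming a version recent enough to have broadcast_to):

    import numpy

    green = numpy.array([0, 255, 0], dtype=numpy.uint8)
    view = numpy.broadcast_to(green, (1158, 1173, 3))   # no data is copied
    print(view.strides)                                  # (0, 0, 1)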
I made a design choice with rnumpy to specialize on the number of dimensions (ndim) of an array. This means that we can generate efficient code (more about this below), which was just too tempting to ignore. But it also means that:

* it would be impossible to access arbitrary-ndim arrays from a pypy-wrapped rnumpy.
* the shape attribute is read-only. To change the shape of an array, use the reshape method (this does not copy the data):
    >>> numpy.array(range(12)).reshape((3,4))
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
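To see the "does not copy" part in cpython numpy, note that writes through the reshaped array show up in the original, since both share the same memory:

    >>> a = numpy.array(range(12))
    >>> b = a.reshape((3, 4))
    >>> b[0, 0] = 99
    >>> a[0]
    99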
The above mentioned "efficient code" is perhaps a non-issue. Numpy code tends to become memory-bound, not cpu-bound. However, it is an interesting use of the rpython optimiser. In general, iterating over an n-dimensional array requires n nested for loops, or the use of a "coordinate" array. In this implementation (as in the c implementation) there is an iterator object that holds the coordinates (and the corresponding strides). With malloc removal, all this iterator data becomes local variables. It's nifty. (A rough sketch of such an iterator follows this mail.)

I don't really know where this is all going; an obvious goal would be to extend all this with a variable-dimension array type (and write an applevel numpy). One of the original ideas was to somehow produce cache-friendly code by doing lazy operations. Another issue is that it has been mind-bogglingly difficult to directly extend the compiler. I would like to rethink this whole approach; perhaps special methods can help, or some other "low-level to high-level" interaction.

Simon.
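A rough, purely illustrative sketch (plain Python, not the rnumpy code) of such a coordinate/stride iterator: it keeps a coordinate vector and a byte offset and bumps them odometer-style, so one flat loop covers all n dimensions.

    # Hypothetical iterator yielding the byte offset of every element of an
    # array described by (shape, strides), without n nested for loops.
    def iter_offsets(shape, strides):
        ndim = len(shape)
        coords = [0] * ndim
        offset = 0
        total = 1
        for n in shape:
            total *= n
        for _ in range(total):
            yield offset
            for k in range(ndim - 1, -1, -1):   # carry like an odometer
                coords[k] += 1
                offset += strides[k]
                if coords[k] < shape[k]:
                    break
                coords[k] = 0
                offset -= shape[k] * strides[k]

    print(list(iter_offsets((3, 4), (4, 1))))   # [0, 1, 2, ..., 11]

With malloc removal, state like coords and offset can end up as plain local variables, which is the point made above.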
Simon Burton wrote:
> I've been having some fun working on a (tiny) rpython version of numpy.

That's great, this is really one of the most interesting parts of pypy for me.
> The above mentioned "efficient code" is perhaps a non-issue. Numpy code
> tends to become memory-bound, not cpu-bound.

I tend to agree, at least from my limited observation and hacking on numpy: some (quite fundamental) operations are not cache friendly at all. But I thought this was what rpython + pypy could improve quite a lot, actually. For example, one problem with numpy, especially with broadcasting and the ufunc machinery, is deciding on which dimension to compute for maximum performance. See for example http://www.mail-archive.com/numpy-discussion@scipy.org/msg03759.html.
David
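A toy illustration (my own sketch, not from the linked thread) of the kind of operation that cannot be made cache friendly for both operands at once: whichever dimension the inner loop runs over, one of the two inputs of a + a.T is read with a large stride.

    import numpy, timeit

    a = numpy.zeros((4000, 4000))
    # both operands contiguous vs. one operand always strided
    print(timeit.timeit(lambda: a + a, number=10))
    print(timeit.timeit(lambda: a + a.T, number=10))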
participants (2):
- David Cournapeau
- Simon Burton