On Thu, 2005-01-13 at 10:24 -0800, Ralf Juengling wrote:
Hi,
I have an application where I cannot avoid (afaict) looping over one array dimension. So I thought it might help speed up the code to set up the data so that the dimension to loop over is the first dimension. This allows me to write
    for data in a:
        ...  # do something with data
instead of
    for i in range(len(a)):
        data = a[i]
        ...  # do something with data
and would save the indexing operation. To my surprise, it didn't make a difference in speed. A little timing experiment suggests that the first version is actually slightly slower than the second:
    >>> from timeit import Timer
    >>> setup = 'import numarray as na; a = na.arange(2000,shape=(1000,2))'
    >>> Timer('for row in a: pass', setup).timeit(number=1000)
    13.495718955993652
    >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
    12.162748098373413
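The numbers above come from a single run of each statement, and a gap of about 10% can be within run-to-run noise. A minimal sketch of a more careful comparison takes the best of several repeats for each loop style. It assumes Python 3.5+ (for the `globals` argument of `timeit.repeat`) and uses a hypothetical pure-Python `Rows` class as a stand-in for a 1000x2 numarray array, so the absolute times say nothing about numarray itself:

```python
import timeit


class Rows:
    """Hypothetical stand-in for a 1000x2 array; rows are fetched via __getitem__."""

    def __init__(self, n):
        # Same values as arange(2000) reshaped to (n, 2).
        self._data = [(2 * i, 2 * i + 1) for i in range(n)]

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._data[i]


def best_of(stmt, env, repeat=5, number=100):
    # Run the statement `number` times, `repeat` times over, and keep the
    # fastest run: slower runs mostly reflect interference from other processes.
    times = timeit.repeat(stmt, globals=env, repeat=repeat, number=number)
    return min(times)


env = {"a": Rows(1000)}
t_iter = best_of("for row in a: pass", env)
t_index = best_of("for i in range(len(a)): row = a[i]", env)
print(f"iteration: {t_iter:.4f}s  indexing: {t_index:.4f}s")
```

Taking the minimum rather than the mean follows the advice in the `timeit` documentation: the lowest value is a lower bound for how fast the machine can run the snippet, while higher values are usually caused by other processes interfering with the timing.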
I noticed that the array object does not have an __iter__ special method, so apparently no attempt has been made so far to make array iteration fast. Do you think it's possible to speed things up by implementing an __iter__ method?
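For illustration only, here is roughly what adding such a method looks like on a hypothetical pure-Python array-like type (`Array2D` and `Array2DWithIter` are invented names, not numarray's API). The sketch shows the shape of an `__iter__` method, not a speedup: the explicit iterator still has to build one row object per step, just as the generic index-based fallback does.

```python
class Array2D:
    """Hypothetical pure-Python array-like type (not numarray's API)."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, i):
        # Build a fresh row object on every access, as an array type
        # must do when it returns a view or copy of one row.
        return tuple(self._rows[i])


class Array2DWithIter(Array2D):
    def __iter__(self):
        # An explicit iterator: it still creates one row object per step,
        # so it only replaces the generic fallback loop, not the row cost.
        for i in range(len(self)):
            yield self[i]


a = Array2D([[0, 1], [2, 3], [4, 5]])
b = Array2DWithIter([[0, 1], [2, 3], [4, 5]])
print(list(a) == list(b))  # True
```

Both objects iterate to the same rows; any real gain would have to come from making the per-row indexing and object creation cheaper, wherever the loop lives.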
I'm skeptical. My impression is that, for an object without __iter__, the iteration system falls back on calling its getitem() with successive indices (0, 1, 2, ...) until it raises IndexError, all in C without intermediate indexing objects. If numarray is to be sped up, I think the key is to speed up the indexing code and/or the object creation code in numarray's _ndarraymodule.c and _numarraymodule.c. I'd be happy to be proved wrong, but that's my 2 cents.

Regards,
Todd
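The fallback can be observed directly. A small sketch, assuming a current CPython 3 interpreter rather than the Python 2.x of this thread and using a hypothetical `Probe` class, records every special-method call a plain `for` loop makes on an object that defines `__len__` and `__getitem__` but no `__iter__`:

```python
class Probe:
    """Hypothetical sequence that logs the special methods Python calls on it."""

    def __init__(self, n):
        self.n = n
        self.calls = []

    def __len__(self):
        self.calls.append("len")
        return self.n

    def __getitem__(self, i):
        self.calls.append(("getitem", i))
        if i >= self.n:
            # Signals the end of the sequence to the fallback iterator.
            raise IndexError(i)
        return i * 10


p = Probe(3)
items = []
for x in p:  # no __iter__, so Python wraps p in a generic sequence iterator
    items.append(x)

print(items)    # [0, 10, 20]
print(p.calls)  # [('getitem', 0), ('getitem', 1), ('getitem', 2), ('getitem', 3)]
```

In this run `__len__` is never called: the loop ends when `__getitem__` raises `IndexError` for index 3. Every step goes through exactly one getitem call, which is why the cost of indexing and row-object creation, rather than the loop mechanism, dominates.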