[Numpy-discussion] iterating over an array

Thu Jan 13 13:07:12 EST 2005

On Thu, 2005-01-13 at 10:24 -0800, Ralf Juengling wrote:
> Hi,
> 
> I have an application where I cannot avoid (afaict)
> looping over one array dimension. So I thought it
> might speed up the code to set up the data so that
> the dimension to loop over is the first dimension.
> This allows one to write
> 
> for data in a:
>     pass  # do something with data
> 
> instead of 
> 
> for i in range(len(a)):
>     data = a[i]
>     # do something with data
> 
> and would save the indexing operation. To my surprise,
> it didn't make a difference in terms of speed. A
> little timing experiment suggests that the first
> version is actually slightly slower than the second:
> 
> >>> from timeit import Timer
> >>> setup = 'import numarray as na; a = na.arange(2000,shape=(1000,2))'
> 
> >>> Timer('for row in a: pass', setup).timeit(number=1000)
> 13.495718955993652
> 
> >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
> 12.162748098373413
> 
> 
> I noticed that the array object does not have a special
> method __iter__, so apparently, no attempts have been
> made so far to make array iteration fast. Do you think 
> it's possible to speed things up by implementing an 
> __iter__ method? 
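
To re-run the quoted timing experiment today, here is a minimal sketch that assumes NumPy in place of numarray (numarray is no longer maintained, so this is a substitution, not the original setup). `time_loops` and `same_rows` are hypothetical helper names. The sketch times both loop styles with `timeit.Timer` and checks that they visit the same rows. No timings are claimed here; relative results depend on the NumPy version and the machine.

```python
# Sketch: the quoted timing experiment with NumPy standing in for numarray
# (an assumption). numarray's arange(2000, shape=(1000, 2)) is written
# arange(2000).reshape(1000, 2) here.
from timeit import Timer

import numpy as np

SETUP = "import numpy as np; a = np.arange(2000).reshape(1000, 2)"


def time_loops(number=1000):
    """Return (seconds for 'for row in a', seconds for explicit a[i] indexing)."""
    t_iter = Timer("for row in a: pass", SETUP).timeit(number=number)
    t_index = Timer("for i in range(len(a)): row = a[i]", SETUP).timeit(number=number)
    return t_iter, t_index


def same_rows():
    """Check that both loop styles visit identical rows in the same order."""
    a = np.arange(2000).reshape(1000, 2)
    by_iter = list(a)                          # rows via iteration
    by_index = [a[i] for i in range(len(a))]  # rows via explicit indexing
    return len(by_iter) == len(by_index) and all(
        np.array_equal(r, s) for r, s in zip(by_iter, by_index)
    )


if __name__ == "__main__":
    t_iter, t_index = time_loops()
    print(f"for row in a:                 {t_iter:.3f} s")
    print(f"for i in range(len(a)): a[i]  {t_index:.3f} s")
    print("same rows:", same_rows())
```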

I'm skeptical.  My impression is that, for an object without __iter__,
the iteration system falls back to calling its getitem() with successive
indices 0, 1, 2, ... until IndexError is raised, all in C without
intermediate indexing objects.
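
The fallback described above can be observed in pure Python. The sketch below uses a hypothetical `Rows` class that defines `__getitem__` but neither `__iter__` nor `__len__`, and records every index it receives: `iter()` calls it with 0, 1, 2, ... and stops at the first IndexError. A hypothetical subclass with an explicit `__iter__` bypasses `__getitem__` entirely, which is the kind of hook Ralf asked about. This only illustrates the protocol at the Python level; it says nothing about how fast a C-level `__iter__` in numarray would be.

```python
# Sketch of Python's sequence-iteration fallback. Rows and RowsWithIter are
# hypothetical stand-ins for an array's first axis.


class Rows:
    def __init__(self, data):
        self._data = data
        self.calls = []  # records every index passed to __getitem__

    def __getitem__(self, i):
        self.calls.append(i)
        return self._data[i]  # list indexing raises IndexError past the end

    # Deliberately no __iter__ and no __len__: iteration still works.


class RowsWithIter(Rows):
    def __iter__(self):
        # An explicit __iter__ replaces the getitem fallback entirely.
        return iter(self._data)


if __name__ == "__main__":
    r = Rows([(0, 1), (2, 3), (4, 5)])
    print(list(r))   # [(0, 1), (2, 3), (4, 5)]
    print(r.calls)   # [0, 1, 2, 3]: index 3 raised IndexError and ended the loop
    r2 = RowsWithIter([(0, 1), (2, 3)])
    print(list(r2))  # [(0, 1), (2, 3)]
    print(r2.calls)  # []: __getitem__ was never called
```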

If numarray is to be sped up, I think the key is to speed up the
indexing code and/or object creation code in numarray's _ndarraymodule.c
and _numarraymodule.c.  I'd be happy to be proved wrong, but that's my 2
cents.

Regards,
Todd