Mailman 3 iterating over an array - NumPy-Discussion

iterating over an array


Ralf Juengling

13 Jan 2005

2:26 a.m.

Hi,

I have an application where I cannot avoid (afaict) looping over one array dimension. So I thought it might speed up the code to set up the data so that the dimension to loop over is the first dimension. This allows me to write

    for data in a:
        do sth with data

instead of

    for i in range(len(a)):
        data = a[i]
        do sth with data

and would save the indexing operation. To my surprise, it made no difference in speed. A little timing experiment suggests that the first version is actually slightly slower than the second:

    >>> from timeit import Timer
    >>> setup = 'import numarray as na; a = na.arange(2000,shape=(1000,2))'
    >>> Timer('for row in a: pass', setup).timeit(number=1000)
    13.495718955993652
    >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
    12.162748098373413

I noticed that the array object does not have a special method __iter__, so apparently no attempts have been made so far to make array iteration fast. Do you think it's possible to speed things up by implementing an __iter__ method? This is high on my wish list and I would help with implementing it; I'd appreciate any advice.

Thanks,
Ralf
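The two loop forms compared above can be timed side by side with the standard library alone. The sketch below is only an illustration: it uses a plain list of lists as a stand-in for the numarray array (an assumption, so that the script runs without numarray installed), and the helper name `time_loops` is hypothetical. Absolute numbers depend on the machine and interpreter, so none are shown.

```python
from timeit import Timer

# Stand-in data: 1000 rows of 2 items each (plain lists, NOT numarray arrays).
SETUP = "a = [[2 * i, 2 * i + 1] for i in range(1000)]"


def time_loops(number=200):
    """Return the seconds taken by each loop form, each run `number` times."""
    direct = Timer("for row in a: pass", SETUP).timeit(number=number)
    indexed = Timer("for i in range(len(a)): row = a[i]", SETUP).timeit(number=number)
    return {"direct": direct, "indexed": indexed}


if __name__ == "__main__":
    for name, seconds in time_loops().items():
        print(f"{name:8s} {seconds:.4f} s")
```

Swapping the setup string for one that builds a real array reproduces the experiment from the message.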


Todd Miller

13 Jan

5:07 a.m.

On Thu, 2005-01-13 at 10:24 -0800, Ralf Juengling wrote:

> Hi,
>
> I have an application where I cannot avoid (afaict) looping over one array dimension. So I thought it might speed up the code to set up the data so that the dimension to loop over is the first dimension. This allows me to write
>
>     for data in a:
>         do sth with data
>
> instead of
>
>     for i in range(len(a)):
>         data = a[i]
>         do sth with data
>
> and would save the indexing operation. To my surprise, it made no difference in speed. A little timing experiment suggests that the first version is actually slightly slower than the second:
>
>     >>> from timeit import Timer
>     >>> setup = 'import numarray as na; a = na.arange(2000,shape=(1000,2))'
>     >>> Timer('for row in a: pass', setup).timeit(number=1000)
>     13.495718955993652
>     >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
>     12.162748098373413
>
> I noticed that the array object does not have a special method __iter__, so apparently no attempts have been made so far to make array iteration fast. Do you think it's possible to speed things up by implementing an __iter__ method?

I'm skeptical. My impression is that the fallback for the iteration system is to use the object's len() to determine the count and its getitem() to fetch the iteration elements, all in C without intermediate indexing objects.

If numarray is to be sped up, I think the key is to speed up the indexing code and/or object creation code in numarray's _ndarraymodule.c and _numarraymodule.c.

I'd be happy to be proved wrong, but that's my 2 cents.

Regards,
Todd
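As a rough illustration of the fallback described above: in current CPython, an object that defines no __iter__ is iterated by calling __getitem__ with 0, 1, 2, ... until it raises IndexError. In this toy run the length is never consulted, which differs slightly from the impression given above; either way, each step goes through the same item-fetch path as explicit indexing, which is why an __iter__ method alone would not remove that cost. The class `Rows` and the helper `iterate` are hypothetical names, and the behaviour shown is that of today's Python, not necessarily the 2005 interpreter.

```python
class Rows:
    """Sequence with __len__ and __getitem__ but no __iter__."""

    def __init__(self, items):
        self._items = list(items)
        self.len_calls = 0        # how often __len__ was invoked
        self.getitem_calls = []   # every index passed to __getitem__

    def __len__(self):
        self.len_calls += 1
        return len(self._items)

    def __getitem__(self, index):
        self.getitem_calls.append(index)
        return self._items[index]  # raises IndexError past the end


def iterate(seq):
    """Collect items with a plain for loop.

    list(seq) is avoided on purpose because it may ask for a length hint.
    """
    out = []
    for item in seq:
        out.append(item)
    return out


if __name__ == "__main__":
    rows = Rows(["a", "b", "c"])
    print(iterate(rows))
    print("indices requested:", rows.getitem_calls)
    print("__len__ calls:", rows.len_calls)
```

The final index in `getitem_calls` is the one that raised IndexError and ended the loop.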

Chris Barker

7:03 a.m.

Ralf Juengling wrote:

>     for data in a:
>         do sth with data
>
> instead of
>
>     for i in range(len(a)):
>         data = a[i]
>         do sth with data
>
> Do you think it's possible to speed things up by implementing an __iter__ method?

Frankly, I seriously doubt it would make much difference: for it to matter, the indexing operation would have to take a time comparable to your

    do sth with data

That is unlikely. By the way, here is a test with Python lists:

    >>> setup = 'import numarray as na; a = [[i*2,i*2+1] for i in range(1000)]'
    >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
    0.37136483192443848

Much faster than the numarray examples (~30 s on my machine). I suspect the real delay here is that each indexing operation has to create a new array (even if they do use the same data). Lists just return the item.

Also, it's been discussed that numarray's generic indexing is much slower than Numeric's. This has made a huge difference when passing arrays into wxPython, for instance. Perhaps that's relevant? Here's a test with Numeric vs. numarray:

    >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000,2)'
    >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
    1.97064208984375
    >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000,2)'
    >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
    27.220904111862183

Yup! That's it. numarray's indexing is SLOW. So it's not an iterator issue. Look in the archives of this list for discussion of why numarray's generic indexing is slow. A search for "wxPython indexing" will probably turn it up.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov
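The suspicion that each row access has to build a new array object, while a list just hands back an item it already holds, can be pictured with a small stdlib-only model. This is an assumed toy model, not numarray's implementation: `Grid` and `RowView` are hypothetical classes that mimic an (N, 2) array returning a fresh view object from every index operation.

```python
class RowView:
    """Tiny stand-in for a (2,) sub-array that shares its parent's data."""

    __slots__ = ("data", "offset")

    def __init__(self, data, offset):
        self.data = data
        self.offset = offset

    def __getitem__(self, j):
        if not 0 <= j < 2:
            raise IndexError(j)
        return self.data[self.offset + j]


class Grid:
    """Stand-in for an (N, 2) array stored as one flat list."""

    def __init__(self, n):
        self.n = n
        self.data = list(range(2 * n))

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        # A fresh view object is allocated on every access.
        return RowView(self.data, 2 * i)


if __name__ == "__main__":
    grid = Grid(1000)
    nested = [[2 * i, 2 * i + 1] for i in range(1000)]
    print("grid row is a new object each time:", grid[3] is not grid[3])
    print("list row is the stored object:", nested[3] is nested[3])
```

In this model the per-row allocation is the only extra work, so timing a loop over `Grid` against a loop over the nested list isolates that cost.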

Francesc Altet

4:29 p.m.

On Thursday 13 January 2005 23:57, Chris Barker wrote:

>     >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000,2)'
>     >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
>     1.97064208984375
>     >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000,2)'
>     >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
>     27.220904111862183
>
> Yup! That's it. numarray's indexing is SLOW. So it's not an iterator issue. Look in the archives of this list for discussion of why numarray's generic indexing is slow. A search for "wxPython indexing" will probably turn it up.

Well, if you really want to compare generic indexing speed, you can't mix array object creation into the process, as your example seems to do. A pure indexing access test would look like:

    # With Python lists
    >>> setup = 'import numarray as na; a = [i*2 for i in range(2000)]'
    >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
    0.48835396766662598

    # With Numeric
    >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000*2,)'
    >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
    0.65753912925720215

    # With numarray
    >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000*2,)'
    >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
    0.89093804359436035

That shows that numarray indexing is slower than Numeric's, but not by much (roughly 35-40% in this test). The real problem with numarray (for Ralf's example) is, as is already known, array creation time.

Cheers,

-- 
Francesc Altet, Carabos Coop. V.
http://www.carabos.com/
Who is your data daddy? PyTables

Chris Barker

14 Jan

3:54 a.m.

Francesc Altet wrote:

> That shows that numarray indexing is slower than Numeric, but not by a large extent (just a 40%). The real problem with numarray (for Ralf's example) is, as is already known, array creation time.

Thanks for clearing this up.

The case I care about (at the moment) is in wxPython's "PointListHelper". It converts whatever Python sequence you give it into a wxList of wxPoints. The sequence you give it needs to look something like a list of (x,y) tuples. An Nx2 Numeric or numarray array works just fine, but both are slower than a list of tuples, and numarray is MUCH slower.

This appears to be exactly analogous to the OP's example of extracting a bunch of (2,) arrays from the (N,2) array. Then the two numbers must be extracted from the (2,) array and converted to a wxPoint. It seems the creation of all those (2,) numarrays is what's taking the time.

A) Is there work under way to speed this up?

B) The real solution, at least for wxPython, is to make "PointListHelper" understand numarrays, so that it can go straight from the array->data pointer to the wxList of wxPoints.

One of these days I'll get around to working on that!

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov
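Option B amounts to reading the coordinates out of one contiguous buffer in a single pass, rather than pulling N small (2,) arrays out of the (N,2) array. A minimal Python sketch of that idea follows. It is not wxPython's PointListHelper: `points_from_flat` is a hypothetical name, the standard-library `array` type stands in for the array's data buffer, and plain tuples stand in for wxPoints.

```python
from array import array


def points_from_flat(flat):
    """Turn a flat x0, y0, x1, y1, ... buffer into a list of (x, y) tuples."""
    if len(flat) % 2:
        raise ValueError("expected an even number of coordinates")
    # Two strided slices plus zip: the x and y columns are read in bulk,
    # and no per-point row object is created and indexed along the way.
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    coords = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    print(points_from_flat(coords))
```

A C-level helper would do the same walk directly over the data pointer; the Python version only shows the access pattern.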
