Slow element-by-element access?

A colleague showed me a program using Numeric with Python 2.5 which ran much faster than the same program using numpy with Python 2.7. I distilled this down to a simple test case, characterized by a "for" loop in which he does an element-by-element calculation involving arrays:
from numpy import arange # or from Numeric import arange from time import clock # Numeric 0.24 seconds; 15 times as fast as numpy # numpy 3.6 seconds N = 100000 a = arange(N) b = arange(N) t = clock() for i in range(1,N-1): pass tpass = clock()-t t = clock() for i in range(1,N-1): b[i] = a[i]-t*(a[i+1]-2*a[i]+a[i-1])+a[i]*a[i]*t t = clock()-t print t-tpass
His calculation b[i] = a[i]-t*(a[i+1]-2*a[i]+a[i-1])+a[i]*a[i]*t is 15 times faster with Numeric than with numpy.
It is of course the case that he should have done a single array calculation rather than use a "for" loop, and I've explained that to him. The array calculation runs at about the same speed in numpy as in Numeric.
Nevertheless, I'm surprised that the element-by-element calculation is so very slow, and surely there are situations where individual element access is important in a numpy calculation. Is this a known issue? Does it matter? I was unable to find any discussion of this.
Bruce Sherwood

On Sun, Nov 21, 2010 at 11:17 PM, Bruce Sherwood basherwo@ncsu.edu wrote:
A colleague showed me a program using Numeric with Python 2.5 which ran much faster than the same program using numpy with Python 2.7. I distilled this down to a simple test case, characterized by a "for" loop in which he does an element-by-element calculation involving arrays:
from numpy import arange # or from Numeric import arange from time import clock # Numeric 0.24 seconds; 15 times as fast as numpy # numpy 3.6 seconds N = 100000 a = arange(N) b = arange(N) t = clock() for i in range(1,N-1): pass tpass = clock()-t t = clock() for i in range(1,N-1): b[i] = a[i]-t*(a[i+1]-2*a[i]+a[i-1])+a[i]*a[i]*t t = clock()-t print t-tpass
His calculation b[i] = a[i]-t*(a[i+1]-2*a[i]+a[i-1])+a[i]*a[i]*t is 15 times faster with Numeric than with numpy.
It is of course the case that he should have done a single array calculation rather than use a "for" loop, and I've explained that to him. The array calculation runs at about the same speed in numpy as in Numeric.
Nevertheless, I'm surprised that the element-by-element calculation is so very slow, and surely there are situations where individual element access is important in a numpy calculation. Is this a known issue? Does it matter? I was unable to find any discussion of this.
Yes, indexing is known to be slow, although I don't recall the precise reason for that. Something to do with way integers are handled or some such. There was some discussion on the list many years ago...
Chuck

Sun, 21 Nov 2010 23:26:37 -0700, Charles R Harris wrote: [clip]
Yes, indexing is known to be slow, although I don't recall the precise reason for that. Something to do with way integers are handled or some such. There was some discussion on the list many years ago...
It could be useful if someone spent time on profiling the call overhead for integer indexing. From what I remember from that part of the code, I'm fairly sure it can be optimized.

On Mon, Nov 22, 2010 at 2:30 AM, Pauli Virtanen pav@iki.fi wrote:
Sun, 21 Nov 2010 23:26:37 -0700, Charles R Harris wrote: [clip]
Yes, indexing is known to be slow, although I don't recall the precise reason for that. Something to do with way integers are handled or some such. There was some discussion on the list many years ago...
It could be useful if someone spent time on profiling the call overhead for integer indexing. From what I remember from that part of the code, I'm fairly sure it can be optimized.
I was thinking the same. Profiling the code could be very useful.
Chuck

Basically, indexing in Python is a little slower, the number of things indexing can do is more varied, and more to the point, the objects returned from arrays are NumPy scalars (which have math which is not optimized).
If you do element-by-element indexing, it's generally best to use Python lists (this has always been true, even with Numeric). The item method on NumPy arrays will speed up these kind of loops:
from numpy import arange from time import clock
N = 100000 a = arange(N) b = arange(N) al = a.tolist() bl = b.tolist() t = clock() for i in range(1, N-1): pass tpass = clock() - t t = clock() ai = a.item for i in range(1, N-1): val = ai(i) - t*(ai(i+1) - 2*ai(i) + ai(i-1)) + ai(i)*ai(i)*t b.itemset(i, val % (2**31)) tfast = clock() - t t = clock() for i in range(1, N-1): val = a[i] - t*(a[i+1] - 2*a[i] + a[i-1]) + a[i]*a[i]*t b[i] = val % (2**31) tslow = clock() - t
t = clock() for i in range(1, N-1): val = al[i] - t*(al[i+1] - 2*al[i] + al[i-1]) + al[i]*al[i]*t bl[i] = val % (2**31) tlist = clock() - t
print tpass, tfast, tslow, tlist
On my system, the list version is the fastest, while the itemset method is about 10x faster than the full indexing approach. The item method not only does faster indexing, but it also returns Python scalars rather than NumPy scalars.
-Travis
participants (4)
-
Bruce Sherwood
-
Charles R Harris
-
Pauli Virtanen
-
Travis Oliphant