Hi,

Sorry if this is a double post - unsure if it made it the first time:

Here are some timings that puzzle me a bit. Sum the two rows of a 2xn matrix, where n is some large number:

python -m timeit -s "from numpy import array,sum,reshape; x = array([1.5]*1000); x = reshape(x,(2,500))" "x.sum(0)"
10000 loops, best of 3: 36.2 usec per loop

python -m timeit -s "from numpy import array,sum,reshape; x = array([1.5]*1000); x = reshape(x,(2,500))" "x[0] + x[1]"
100000 loops, best of 3: 5.35 usec per loop

The two calculations are completely equivalent (at least from the point of view of the numerical results). But using the built-in sum() is almost 7 times slower.

Just for reference, using Numeric gives:

python -m timeit -s "from Numeric import array,sum,reshape; x = array([1.5]*1000); x = reshape(x,(2,500))" "sum(x)"
100000 loops, best of 3: 7.08 usec per loop

Any suggestions as to why this might be, or am I doing something wrong here?

Thanks

// Mads

PS. Thanks to everybody who responded to my previous posting on problems and questions regarding rint() and related coding issues. Your replies were very helpful!
Mads Ipsen wrote:
> Hi,
> Sorry if this is a double post - unsure if it made it the first time:
> Here are some timings that puzzle me a bit. Sum the two rows of a 2xn matrix, where n is some large number
>
> python -m timeit -s "from numpy import array,sum,reshape; x = array([1.5]*1000); x = reshape(x,(2,500))" "x.sum(0)"
> 10000 loops, best of 3: 36.2 usec per loop
>
> python -m timeit -s "from numpy import array,sum,reshape; x = array([1.5]*1000); x = reshape(x,(2,500))" "x[0] + x[1]"
> 100000 loops, best of 3: 5.35 usec per loop
This is probably reasonable. There is overhead in the looping construct: basically, the first element is copied into the output and then a function is called, which in this case has a loop of size 1 to compute the sum. This is then repeated 500 times. So you have 500 C function-pointer calls in the first case. In the second case you have basically a single call to the same function, where the 500-element loop is done.

I'm a little surprised that Numeric is so much faster for this case, as you show later. The sum code is actually add.reduce, which uses a generic reduction concept in ufuncs. It has overhead over what you might do using some less general approach.

If anyone can figure out how to make the NOBUFFER section in GenericReduce faster in ufuncobject.c, it will be greatly welcomed. Speed improvements are always welcome.

-Travis
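The difference between the two loop orders described above can be sketched with a toy model in pure Python. This is only an illustration of the call pattern, not NumPy's actual C code: the names add_inner_loop, reduce_axis0_per_column and add_rows are hypothetical, and the call counter stands in for the cost of a C function-pointer call.

```python
# Toy model of the two loop orders; hypothetical names, not NumPy internals.

def add_inner_loop(out, a, b, n, counter):
    """Stand-in for a ufunc inner loop: out[i] = a[i] + b[i] for i < n."""
    counter["calls"] += 1
    for i in range(n):
        out[i] = a[i] + b[i]


def reduce_axis0_per_column(rows, counter):
    """Reduce along axis 0 one output element at a time (many size-1 loops)."""
    ncols = len(rows[0])
    out = [0.0] * ncols
    for j in range(ncols):
        acc = [rows[0][j]]  # copy the first element into the output
        for row in rows[1:]:
            add_inner_loop(acc, acc, [row[j]], 1, counter)  # loop of size 1
        out[j] = acc[0]
    return out


def add_rows(rows, counter):
    """Add whole rows at once (one inner-loop call per additional row)."""
    out = list(rows[0])
    for row in rows[1:]:
        add_inner_loop(out, out, row, len(row), counter)
    return out


rows = [[1.5] * 500, [1.5] * 500]  # same data as the 2x500 example
per_column_calls = {"calls": 0}
whole_row_calls = {"calls": 0}

per_column = reduce_axis0_per_column(rows, per_column_calls)
whole_rows = add_rows(rows, whole_row_calls)

print(per_column == whole_rows)                            # True
print(per_column_calls["calls"], whole_row_calls["calls"])  # 500 1
```

Both strategies give the same result; the model only shows that the first one pays the per-call overhead 500 times while the second pays it once.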
On 3/7/06, Travis Oliphant <oliphant@ee.byu.edu> wrote:
> ... I'm a little surprised that Numeric is so much faster for this case as you show later.

Here is another puzzle:

python -m timeit -s "from Numeric import zeros,sum; x = zeros((2,500),'f')" "sum(x,0)"
100000 loops, best of 3: 6.05 usec per loop
python -m timeit -s "from Numeric import zeros,sum; x = zeros((500,2),'f')" "sum(x,1)"
10000 loops, best of 3: 26.8 usec per loop

python -m timeit -s "from numpy import zeros; x = zeros((2,500),'f')" "x.sum(0)"
10000 loops, best of 3: 22.8 usec per loop
python -m timeit -s "from numpy import zeros; x = zeros((500,2),'f')" "x.sum(1)"
10000 loops, best of 3: 23.6 usec per loop

Numpy and Numeric perform very similarly when reducing along the second axis.

> If anyone can figure out how to make the NOBUFFER section in GenericReduce faster in ufuncobject.c it will be greatly welcomed.

Here is my $.02: <http://projects.scipy.org/scipy/numpy/wiki/PossibleOptimizationAreas/ReduceDiscussion>

BTW, I've tried to take loop->... redirections out of the loop: no effect with gcc -O3.
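For anyone who wants to re-run the axis comparison above on a current installation, here is a minimal sketch using the standard timeit module. It assumes a modern NumPy imported as np (Numeric is not covered), and the helper best_usec and the variable names wide and tall are made up for illustration. The numbers will depend on the machine and NumPy version, so no expected timings are given.

```python
import timeit

import numpy as np

# Same shapes as in the timings above; float32 corresponds to the 'f' typecode.
wide = np.zeros((2, 500), dtype=np.float32)
tall = np.zeros((500, 2), dtype=np.float32)

cases = {
    "wide.sum(0)": lambda: wide.sum(0),
    "tall.sum(1)": lambda: tall.sum(1),
    "wide[0] + wide[1]": lambda: wide[0] + wide[1],
    "tall[:, 0] + tall[:, 1]": lambda: tall[:, 0] + tall[:, 1],
}


def best_usec(func, number=2000, repeat=3):
    """Best-of-`repeat` time per call in microseconds, like `python -m timeit`."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e6


timings = {label: best_usec(func) for label, func in cases.items()}
for label, usec in timings.items():
    print(f"{label:26s} {usec:8.2f} usec per loop")
```

The explicit row and column additions are included so the reduction along each axis can be compared against the hand-written equivalent on the same data.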
participants (4): Alexander Belopolsky, Mads Ipsen, Sasha, Travis Oliphant