Mads Ipsen wrote:
Hi,
Sorry if this is a double post - unsure if it made it the first time:
Here are some timings that puzzle me a bit. Sum the two rows of a 2xn matrix, where n is some large number:
python -m timeit -s "from numpy import array,sum,reshape; x = array([1.5]*1000); x = reshape(x,(2,500))" "x.sum(0)" 10000 loops, best of 3: 36.2 usec per loop
python -m timeit -s "from numpy import array,sum,reshape; x = array([1.5]*1000); x = reshape(x,(2,500))" "x[0] + x[1]" 100000 loops, best of 3: 5.35 usec per loop
This is probably reasonable. There is overhead in the looping construct: basically, the first element is copied into the output and then a function is called, which in this case has a loop of size 1 to compute the sum. This is then repeated 500 times, so you have 500 C-function-pointer calls in the first case. In the second case you have essentially a single call to the same function, where the 500-element loop is done all at once.

I'm a little surprised that Numeric is so much faster for this case, as you show later. The sum code is actually add.reduce..., which uses a generic reduction concept in ufuncs. It has overhead over what you might do using some less general approach. If anyone can figure out how to make the NOBUFFER section of GenericReduce in ufuncobject.c faster, it will be greatly welcomed. Speed improvements are always welcome.

-Travis
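To make the mechanism concrete, here is a rough Python model of the reduction path described above. The real implementation is the C NOBUFFER loop in ufuncobject.c, so this is only an illustrative sketch: the first row is copied into the output, and then the inner add loop is invoked once per output element with a reduction length of 1.

    import numpy as np

    def reduce_axis0_model(x):
        # Copy the first element along the reduction axis into the output.
        out = x[0].copy()
        # One "C-function-pointer call" per output element; each call runs
        # an inner loop over the remaining reduction size (here just 1).
        for j in range(x.shape[1]):
            for i in range(1, x.shape[0]):
                out[j] = out[j] + x[i, j]
        return out

    x = np.reshape(np.array([1.5] * 1000), (2, 500))
    assert np.allclose(reduce_axis0_model(x), x.sum(0))
    assert np.allclose(np.add.reduce(x, 0), x.sum(0))  # sum(0) is add.reduce

By contrast, x[0] + x[1] makes a single call to the add inner loop with a 500-element loop, which is why it avoids the per-element call overhead.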