[Numpy-discussion] strange sin/cos performance

Tue Aug 4 09:39:15 EDT 2009

Bruce Southey wrote:
> Hi,
> Can you try these from the command line:
> python -m timeit -n 100 -s "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float32)"
> python -m timeit -n 100 -s "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float32); b=np.sin(a)"
> python -m timeit -n 100 -s "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float32); np.sin(a)"
> python -m timeit -n 100 -s "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float32)" "np.sin(a)"
> 
> The first should be similar for different dtypes because it is just 
> array creation. The second extends that by storing the sin into another 
> array. I am not sure how to interpret the third, but at the Python prompt 
> it would print the result to the screen. The last causes Python to handle 
> two arguments, which is slow using float32 but not for float64 and 
> float128, suggesting a compiler issue such as not using SSE or similar.

Results:

$ python -m timeit -n 100 -s "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float32)"
100 loops, best of 3: 0.0811 usec per loop

$ python -m timeit -n 100 -s "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float32); b=np.sin(a)"
100 loops, best of 3: 0.11 usec per loop

$ python -m timeit -n 100 -s "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float32); np.sin(a)"
100 loops, best of 3: 0.11 usec per loop

$ python -m timeit -n 100 -s "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float32)" "np.sin(a)"
100 loops, best of 3: 112 msec per loop

$ python -m timeit -n 100 -s "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float64)" "np.sin(a)"
100 loops, best of 3: 13.2 msec per loop
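Dividing the two "np.sin(a)" timings by the array length makes the gap easier to read per element. This is a minimal sketch that assumes the same array definition as the commands above; the times are the ones reported here, copied in by hand rather than re-measured.

```python
import numpy as np

# Same array as in the commands above: float32 samples with step 2*3.14159/1000.
a = np.arange(0.0, 1000, (2 * 3.14159) / 1000, dtype=np.float32)
n = a.size

# Best-of-3 times reported above for the timed "np.sin(a)" statement, in seconds.
reported = {"float32": 112e-3, "float64": 13.2e-3}

for dtype, seconds in reported.items():
    per_element_ns = seconds / n * 1e9
    print(f"{dtype}: {n} elements, about {per_element_ns:.0f} ns per element")

# How many times slower the float32 run was than the float64 run.
ratio = reported["float32"] / reported["float64"]
print(f"float32 / float64 time ratio: {ratio:.1f}")
```

With roughly 159,000 elements, the float32 figure works out to several hundred nanoseconds per sin call, versus under a hundred for float64.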

I think the second and third are effectively the same; both create an 
array containing the result.  The second assigns that array to a variable, 
while the third does not, so it should get garbage collected.

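The fourth one is the only one that actually runs the sin in the timing 
loop.  In the other three, np.sin (where present) runs once as part of the 
-s setup code, and the timed statement defaults to "pass", which is why 
they report fractions of a microsecond.  I don't understand what you mean 
by causing Python to handle two arguments.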
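The setup-versus-statement distinction can be reproduced with the timeit module directly. This is a sketch, not a re-run of the exact commands: it mirrors "-n 100" and "best of 3" by passing number=100 and repeat=3 to Timer.repeat, and the variable names are illustrative only.

```python
import timeit

setup = "import numpy as np; a = np.arange(0.0, 1000, (2*3.14159) / 1000, dtype=np.float32)"

# Like the second command: np.sin runs once inside the setup string,
# and the timed statement is the default "pass".
t_setup_only = timeit.Timer(stmt="pass", setup=setup + "; b = np.sin(a)")

# Like the fourth command: np.sin(a) is the timed statement, run on every loop.
t_in_loop = timeit.Timer(stmt="np.sin(a)", setup=setup)

# number=100 mirrors "-n 100"; repeat=3 mirrors "best of 3".
best_setup_only = min(t_setup_only.repeat(repeat=3, number=100)) / 100
best_in_loop = min(t_in_loop.repeat(repeat=3, number=100)) / 100

print(f"sin in setup only: {best_setup_only * 1e6:.3f} usec per loop")
print(f"sin in timed stmt: {best_in_loop * 1e3:.3f} msec per loop")
```

Only the second timer actually measures np.sin; the first measures an empty statement, matching the sub-microsecond results above.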

The fifth run I added uses float64 to compare (and reproduces the problem).
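A dtype comparison like the fifth run can also be scripted so that all precisions are timed in one process. This is a minimal sketch assuming np.longdouble as a stand-in for float128, since it is 128-bit storage on many Linux builds but may be 64-bit on other platforms; best_sin_time is a hypothetical helper name, not part of NumPy.

```python
import timeit

import numpy as np

STEP = (2 * 3.14159) / 1000

def best_sin_time(dtype, number=100, repeat=3):
    """Hypothetical helper: best-of-`repeat` seconds per np.sin call in `dtype`."""
    a = np.arange(0.0, 1000, STEP, dtype=dtype)
    times = timeit.repeat(lambda: np.sin(a), number=number, repeat=repeat)
    return min(times) / number

# Explicit labels avoid name clashes if longdouble is 64-bit on this platform.
dtypes = [("float32", np.float32), ("float64", np.float64), ("longdouble", np.longdouble)]
timings = {label: best_sin_time(dt) for label, dt in dtypes}

for label, seconds in timings.items():
    print(f"{label:>10}: {seconds * 1e3:.2f} msec per call")
```

If float32 comes out several times slower than float64 here too, that points at the float32 sin path in the build rather than at how the timeit commands were written.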

Andrew