numpy performance and random numbers
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sat Dec 19 19:23:46 EST 2009
On Sat, 19 Dec 2009 09:02:38 -0800, Carl Johan Rehn wrote:
> Well, in Matlab I used "tic; for i = 1:1000, randn(100, 10000), end;
> toc" and in IPython I used a similar construct but with "time" instead
> of tic/toc.
I don't know if this will make any significant difference, but for the
record that is not the optimal way to time a function in Python due to
the overhead of creating the loop sequence.
In Python, the typical for-loop construct is something like this:
for i in range(1, 1000):
but range isn't a funny way of writing "loop over these values", it
actually creates a list of integers, which has a measurable cost. In
Python 2.x you can reduce that cost by using xrange instead of range, but
it's even better to pre-compute the list outside of the timing code. Even
better still is to use a loop sequence like [None]*1000 (precomputed
outside of the timing code naturally!) to minimize the cost of memory
accesses.
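As a rough sketch of the hand-rolled approach described above (the names
time_loop and work are made up for illustration, and work is only a cheap
stand-in for whatever call you actually want to time), the loop sequence is
built before the clock starts so that only the calls themselves are measured:

```python
import time

def time_loop(func, n=1000):
    # Build the loop sequence BEFORE starting the clock, so the cost of
    # creating it is not included in the measurement.
    seq = [None] * n
    start = time.time()
    for _ in seq:
        func()
    return time.time() - start

def work():
    # Hypothetical stand-in for the real call being timed
    # (e.g. randn(100, 10000) in the original question).
    return sum(range(100))

elapsed = time_loop(work, 1000)
print("1000 calls took %.6f seconds" % elapsed)
```

This still includes the per-iteration overhead of the for-loop itself, which
is one more reason to prefer timeit.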
The best way to perform timings of small code snippets in Python is using
the timeit module, which does all these things for you. Something like
this:
from timeit import Timer
t = Timer('randn(100, 10000)', 'from numpy import randn')
print(min(t.repeat()))
This prints the best time, in seconds, taken by one million calls to randn
(one million is the default; pass a smaller number argument to repeat() if
the statement is slow). Because of the nature of modern operating systems,
any one individual timing can be seriously impacted by other processes, so
repeat() does three independent timings by default and min() picks the
lowest. timeit also automatically picks the best timer to use according to
your operating system (either clock, which is best on Windows, or time,
which is better on Posix systems).
--
Steven