[Tutor] List comprehension question
Steven D'Aprano
steve at pearwood.info
Wed Nov 10 11:35:42 CET 2010
Richard D. Moores wrote:
> On Tue, Nov 9, 2010 at 12:54, Steven D'Aprano <steve at pearwood.info> wrote:
>> Richard D. Moores wrote:
>>
>>> See <http://tutoree7.pastebin.com/R82876Eg> for a speed test with n =
>>> 100,000 and 100,000 loops
>> As a general rule, you shouldn't try to roll your own speed tests. There are
>> various subtleties that can throw your results right out. Timing small code
>> snippets on modern multi-tasking computers is fraught with difficulties.
>
> Yes, but I thought that if I ran the tests the way I did, several
> times and then reversing the order of the functions, and seeing that
> the results were very similar, I could be pretty sure of my findings.
> Could you point out where taking even that amount of care could lead
> me astray?
Well, the biggest problem is that you're re-inventing the wheel. But in
truth, what you did was not terrible, particularly if you're using
Python 3 where range() is quite lightweight rather than a heavy
operation that creates a big list.
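To see the difference for yourself (a quick illustration; the exact
size reported will vary by platform):

# Python 2:
#   range(1000000)   -> a list of a million ints, built up front
#   xrange(1000000)  -> a cheap lazy object
# Python 3:
#   range(1000000)   -> a cheap lazy object, like 2.x's xrange
import sys
print(sys.getsizeof(range(1000000)))  # small and constant in Python 3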
There are various traps when timing code:
* There are two functions commonly used for getting the time:
time.time() and time.clock(). On Windows, the best one to use is
clock(), but on other operating systems you should probably use time().
The timeit module automatically chooses the right one for you.
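If you do roll your own timing loop, at least borrow the right clock
from timeit (a minimal sketch):

import timeit

# timeit.default_timer is already bound to the appropriate clock for
# the platform, so you never need to choose between time() and clock().
timer = timeit.default_timer

start = timer()
total = sum(range(100000))  # some stand-in work to measure
print(timer() - start)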
* Comparing apples and oranges. There's not a lot that timeit can do
about this, since it requires that you *understand* the code you are
timing, but it helps by allowing you to split initialisation away from
the code being timed.
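For example, something like this (a sketch using timeit's setup
argument):

import timeit

# The setup string runs once, outside the timing, so building the
# list is not counted -- only the indexing is.
t = timeit.timeit("data[500]",
                  setup="data = list(range(1000))",
                  number=1000000)
print(t)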
* Timing too much. One trap is to do this:
t = time.time()
for _ in range(1000000):
    do_something()
t = time.time() - t
This is especially bad in Python 2.x!
The problem is that you are including the time it takes to create a
list of one million integers in your timing code. That's like starting
the stopwatch at a race while the athletes are still in the changing
rooms getting dressed, yet you'd be amazed how often people do it! If
each call to do_something() takes a long time, the extra overhead will
be negligible; but if do_something() is a fast snippet, building the
list might take nearly as long as running the code you're actually
trying to time!
timeit ensures that as little overhead as possible is included in the
execution being timed.
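For the curious, timeit's own loop uses a trick along these lines (a
sketch, not the exact source):

import time
from itertools import repeat

def do_something():
    pass  # stand-in for whatever you're really timing

# itertools.repeat hands out the same object a million times without
# ever building a list, so almost nothing but do_something() is timed.
it = repeat(None, 1000000)
t = time.time()
for _ in it:
    do_something()
print(time.time() - t)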
* Forgetting that computers are multi-tasking machines these days. At
every instant, the operating system is giving tiny time-slices to your
program and potentially dozens of others. If you're busy loading a
dozen web pages, playing a DVD, compiling a large program and trying to
time a code snippet, it will be obvious that everything is going to run
slowly. But if you time something *once*, what's not obvious is that
just by chance you might get a number that is much slower than normal,
because the OS *just happened* to run your anti-virus during the test.
timeit fights against that by running your code in a loop and working
out the average time per loop, *and* then repeating the whole loop
multiple times (three by default) and picking the lowest figure: the
higher figures are slower because of external interference, not because
of your code.
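That strategy looks something like this in code (a sketch):

import timeit

# number=100000 runs the snippet inside each loop; repeat=3 repeats
# the whole loop. Take the minimum: the best run is the least
# disturbed one.
results = timeit.repeat("sum(range(100))", repeat=3, number=100000)
print(min(results))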
* Forgetting about CPU cache effects. Modern CPUs do all sorts of
amazing wizardry to make code run fast; occasionally it backfires, but
more often than not it helps. The first time you run a piece of code,
chances are it will be slower than normal, because the hardware caches
will have other stuff in them. But the second time, the code and data
will already be lined up in the caches and pipelines, ready to go. This
is why, if you time something a dozen times, the *first* run can be two
or three times slower than the rest.
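You can sometimes see this for yourself (a sketch; the exact numbers
will depend on your machine):

import timeit

timer = timeit.default_timer

def work():
    return sum(i * i for i in range(100000))

# Time the same call a dozen times; don't be surprised if the first
# figure is noticeably the worst while the rest cluster together.
for i in range(12):
    start = timer()
    work()
    print(i, timer() - start)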
I've probably forgotten some... but anyway, the important thing is
that there are lots of subtleties to timing code, and Tim Peters has
already thought of them so you don't have to :)
--
Steven