Numpy slow at vector cross product?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Nov 22 00:14:14 EST 2016
On Tuesday 22 November 2016 14:00, Steve D'Aprano wrote:
> Running a whole lot of loops can, sometimes, mitigate some of that
> variation, but not always. Even when running in a loop, you can easily get
> variation of 10% or more just at random.
I think that needs to be emphasised: there's a lot of random noise in these
measurements.
For big, heavyweight functions that do a lot of work, the noise is generally a
tiny proportion of the total time, and you can safely ignore it. (At least for
CPU-bound tasks; for I/O-bound tasks, the noise in I/O can be very high.)
For really tiny operations, the noise *may* be small, depending on the
operation. But small is not insignificant. Consider a simple operation like
addition:
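    # Python 3.5
    import statistics
    from timeit import Timer

    t = Timer("x + 1", setup="x = 0")
    # ten trials, of one million loops each (number defaults to 1000000)
    results = t.repeat(repeat=10)
    best = min(results)
    average = statistics.mean(results)
    # strictly, this is the relative standard deviation (stdev / mean),
    # not the standard error of the mean
    std_error = statistics.stdev(results) / average
    print("Best:", best)
    print("Average:", average)
    print("Std error:", std_error)

    Best: 0.09761243686079979
    Average: 0.0988507878035307
    Std error: 0.02260956789268462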
So this suggests that on my machine, which wasn't doing any expensive virus
scans or streaming video at the time, the random noise in something as simple
as integer addition is around two percent.
So that's your baseline: even simple operations repeated millions of times
will show random noise of a few percent.
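One way to make that noise estimate explicit is sketched below. It assumes you want two different numbers: the relative standard deviation (stdev / mean), which describes how much a single trial wanders, and the relative standard error of the mean (stdev / sqrt(n) / mean), which describes how precisely the average of n trials is known. The helper name `summarise` is hypothetical, not part of the standard library.

```python
import math
import statistics
from timeit import Timer


def summarise(results):
    """Hypothetical helper: summarise a list of per-trial timings (seconds)."""
    n = len(results)
    mean = statistics.mean(results)
    stdev = statistics.stdev(results)  # sample standard deviation, needs n >= 2
    return {
        "best": min(results),
        "mean": mean,
        # spread of a single trial, as a fraction of the mean
        "rel_stdev": stdev / mean,
        # uncertainty in the mean itself, as a fraction of the mean
        "rel_std_error": stdev / math.sqrt(n) / mean,
    }


if __name__ == "__main__":
    t = Timer("x + 1", setup="x = 0")
    # ten trials of one million loops each (number defaults to 1000000)
    stats = summarise(t.repeat(repeat=10))
    for name, value in stats.items():
        print("{}: {:.4g}".format(name, value))
```

The standard error shrinks as you add trials, while the relative standard deviation does not, which is why more trials give a more trustworthy average even though each individual trial stays just as noisy.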
Consequently, if you're doing one trial (one loop of, say, a million
operations):
    import time

    x = 0
    start = time.time()
    for i in range(1000000):
        x + 1
    elapsed = time.time() - start
and compare the time taken with another trial, and the difference is of the
order of a few percentage points, then you have *no* reason to believe the
result is real. You ought to repeat your test multiple times -- the more the
better.
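A rough sketch of the repeat-and-compare discipline for two candidate statements follows. The `compare` helper and its rule of thumb (call a difference real only if it exceeds twice the larger relative standard deviation of the two sets of trials) are assumptions made for illustration; they are not a formal significance test.

```python
import statistics
from timeit import Timer


def compare(stmt_a, stmt_b, setup="pass", repeat=10, number=100000):
    """Hypothetical helper: roughly compare two statements' timings.

    Returns (relative difference of best times, noise estimate, verdict).
    The verdict uses an assumed rule of thumb, not a statistical test.
    """
    a = Timer(stmt_a, setup=setup).repeat(repeat=repeat, number=number)
    b = Timer(stmt_b, setup=setup).repeat(repeat=repeat, number=number)
    # noise: the larger of the two relative standard deviations
    noise = max(statistics.stdev(a) / statistics.mean(a),
                statistics.stdev(b) / statistics.mean(b))
    best_a, best_b = min(a), min(b)
    # positive means stmt_b is slower than stmt_a
    rel_diff = (best_b - best_a) / best_a
    return rel_diff, noise, abs(rel_diff) > 2 * noise


if __name__ == "__main__":
    print(compare("x + 1", "x - 1", setup="x = 0"))
```

Comparing best-of-N times rather than means follows the advice in the timeit documentation: the lowest value is a lower bound on how fast the machine can run the snippet, and higher values are typically caused by other processes interfering rather than by the code itself.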
timeit makes it easy to repeat your tests. It automatically picks the best
timer for your platform and avoids serious gotchas from using the wrong timer.
When called from the command line, it will automatically select the best number
of loops to ensure reliable timing, without wasting time doing more loops than
needed.
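Both conveniences can be seen from Python itself. This sketch assumes Python 3.6 or later, because `Timer.autorange()`, which the command-line interface uses to choose the loop count in current versions, was added in 3.6, after this post was written.

```python
import time
import timeit

# Since Python 3.3, timeit's default timer is time.perf_counter, a
# high-resolution monotonic clock, rather than time.time().
print(timeit.default_timer is time.perf_counter)

t = timeit.Timer("x + 1", setup="x = 0")
# Timer.autorange() (Python 3.6+) tries loop counts 1, 2, 5, 10, 20, 50, ...
# until a single trial takes at least 0.2 seconds.
number, elapsed = t.autorange()
print("{} loops took {:.3f} s".format(number, elapsed))

# Then repeat the trial several times with that loop count.
results = t.repeat(repeat=5, number=number)
print("best of 5: {:.3g} s per loop".format(min(results) / number))
```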
timeit isn't magic. It's not doing anything that you or I couldn't do by hand,
if we knew we should be doing it, and if we could be bothered to run multiple
trials and gather statistics and keep a close eye on the deviation between
measurements. But who wants to do that by hand?
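To make that concrete, here is roughly what doing it by hand looks like. `time_by_hand` is a hypothetical helper written for illustration; it borrows two of timeit's habits (a high-resolution clock, and garbage collection switched off while timing, which timeit does by default), but it is not how timeit is implemented.

```python
import gc
import statistics
import time


def time_by_hand(func, loops=100000, trials=10):
    """Hypothetical helper: run `trials` trials of `loops` calls each.

    Returns (best trial time, mean trial time, relative standard deviation).
    """
    results = []
    gc_was_enabled = gc.isenabled()
    gc.disable()  # avoid collector pauses landing inside a trial
    try:
        for _trial in range(trials):
            start = time.perf_counter()
            for _ in range(loops):
                func()
            results.append(time.perf_counter() - start)
    finally:
        # restore the collector only if it was on to begin with
        if gc_was_enabled:
            gc.enable()
    mean = statistics.mean(results)
    return min(results), mean, statistics.stdev(results) / mean


if __name__ == "__main__":
    x = 0
    print(time_by_hand(lambda: x + 1))
```

One difference worth knowing: this version calls a function on every loop, so call overhead is included in the measurement, whereas timeit compiles a string statement directly into its timing loop.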
--
Steven
299792.458 km/s — not just a good idea, it’s the law!