[Numpy-discussion] Adding a 2D with a 1D array...

Dag Sverre Seljebotn dagss at student.matnat.uio.no
Thu Sep 10 07:45:10 EDT 2009


Francesc Alted wrote:
> On Wednesday 09 September 2009 20:17:20, Dag Sverre Seljebotn wrote:
> > Ruben Salvador wrote:
> > > Your results are what I expected... but: this code is called from
> > > my main program, and what I have in there (output array already
> > > created for both cases) is:
> > >
> > > print "lambd", lambd
> > > print "np.shape(a)", np.shape(a)
> > > print "np.shape(r)", np.shape(r)
> > > print "np.shape(offspr)", np.shape(offspr)
> > > t = clock()
> > > for i in range(lambd):
> > >     offspr[i] = r[i] + a[i]
> > > t1 = clock() - t
> > > print "For loop time ==> %.8f seconds" % t1
> > > t2 = clock()
> > > offspr = r + a[:,None]
> > > t3 = clock() - t2
> > > print "Pythonic time ==> %.8f seconds" % t3
> > >
> > > The results I obtain are:
> > >
> > > lambd 80000
> > > np.shape(a) (80000,)
> > > np.shape(r) (80000, 26)
> > > np.shape(offspr) (80000, 26)
> > > For loop time ==> 0.34528804 seconds
> > > Pythonic time ==> 0.35956192 seconds
> > >
> > > Maybe I'm not measuring properly, so how should I do it?
> >
> > Like Luca said, you are not including the creation time of offspr in
> > the for-loop version. A fairer comparison would be
> >
> > offspr[...] = r + a[:, None]
> >
> > Even fairer (one less temporary copy):
> >
> > offspr[...] = r
> > offspr += a[:, None]
> >
> > Of course, see how the trend is for larger N as well.
> >
> > Also, your timings are a bit crude (though this depends on how many
> > times you ran your script to check :-)). To get better measurements,
> > use the timeit module or (easier) IPython and the %timeit command.
> 
> Oh well, the art of benchmarking :)
> 
> The timeit module normally gives you less jitter in the timings,
> because it loops over the same operation repeatedly and reports an
> average. However, this has the drawback of filling your cache with the
> datasets (or part of them), so in the end your timeit measurements do
> not include the time to move the data from main memory into the CPU
> caches, and that may not be what you want to measure.
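
For concreteness, here is roughly the kind of timeit measurement I'd
suggest (just a sketch, not something from Ruben's code; the shapes
match his output above and the random data is made up):

import timeit

setup = """
import numpy as np
lambd = 80000
a = np.random.rand(lambd)
r = np.random.rand(lambd, 26)
offspr = np.empty((lambd, 26))
"""

loop_stmt = """
for i in range(lambd):
    offspr[i] = r[i] + a[i]
"""

bcast_stmt = """
offspr[...] = r
offspr += a[:, None]
"""

# Take the best (min) of a few repetitions per statement, so that
# jitter from other processes inflates the mean but not the result.
number = 5
for name, stmt in [("for loop", loop_stmt), ("broadcast", bcast_stmt)]:
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=number))
    print("%s time ==> %.8f seconds" % (name, best / number))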

Do you see any issues with this approach: add a flag to timeit that
provides two modes:

a) Do an initial warm-up run that is never included in the timings (in
fact, since it takes the "min" and not the "mean", I think this is
effectively the current behaviour).

b) Between every run, do something else that should flush the cache
(e.g. just do another big dummy calculation); see the sketch below.
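
Something like this is what I have in mind for (b); a rough, untested
sketch (the 64 MB buffer is just an assumption about what exceeds the
last-level cache, and the helper names are made up):

import time
import numpy as np

# Buffer assumed bigger than the CPU caches (8M float64 ~= 64 MB).
_trash = np.zeros(8 * 1024 * 1024)

def flush_cache():
    # A big dummy calculation over memory the benchmark does not use,
    # which should evict the benchmark's data from the caches.
    _trash += 1.0
    return _trash.sum()

def time_cold(func, repeat=5):
    # Minimum wall-clock time over `repeat` cold-cache runs.
    times = []
    for _ in range(repeat):
        flush_cache()
        t0 = time.perf_counter()
        func()
        times.append(time.perf_counter() - t0)
    return min(times)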

(Also, a guard in timeit against CPU frequency scaling errors would be
great :-) Like simply printing a warning if frequency scaling is
detected; a sketch of such a check follows.)
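
The detection could be as simple as this Linux-only sketch (the sysfs
cpufreq path is the usual one but is not guaranteed to exist, and other
platforms would need their own checks):

import os
import warnings

def warn_if_freq_scaling(cpu=0):
    path = "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor" % cpu
    if not os.path.exists(path):
        return  # no cpufreq interface exposed; nothing to check
    with open(path) as f:
        governor = f.read().strip()
    if governor != "performance":
        warnings.warn("CPU frequency scaling may be active (governor=%r);"
                      " timings could be unreliable." % governor)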

-- 
Dag Sverre


