[Numpy-discussion] A random.normal function with stdev as array

Tim Hochberg tim.hochberg at cox.net
Wed Apr 5 09:58:08 EDT 2006


Eric Emsellem wrote:

> Hi,
> this is illuminating in fact. These are things I would not have 
> thought about.
>
> I am trying at the moment to understand why two versions of my program 
> have a difference of about 10% (here it is 2sec for 1000 points, so 
> you can imagine for 10^6...) although the code is nearly the same.
>
> I have loops such as:
>
> ####################
> bigarray = array of Nbig points
> for i in range(N) :
>  bigarray = bigarray + calculation
> ####################


If you tell us more about the calculation, we could probably help more. This 
sounds like you want to vectorize the inner loop, but you may in fact 
have already done that. There's nothing wrong with looping in Python as 
long as you amortize the loop overhead over a large number of 
operations. Hence the advice to vectorize your *inner* loop, not to 
vectorize all loops. Attempting the latter can lead to impenetrable 
code, usually doesn't help significantly, and sometimes slows things down 
as you overflow the cache with big matrices.
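
For concreteness, here's a rough sketch of what I mean (the sizes and the
sin() calculation are made up, just to show the shape of the thing):

import numpy

n_outer = 1000       # outer loop stays in Python
n_points = 100000    # inner dimension handled by numpy all at once

x = numpy.arange(n_points, dtype=float)
bigarray = numpy.zeros(n_points)
for i in range(n_outer):
    # The calculation is a whole-array operation, so the Python loop
    # overhead is spread over n_points element operations per pass.
    bigarray += numpy.sin(x * (i + 1) * 1e-5)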

>
> I thought to do it by:
> ####################
> bigarray = numpy.sum(array([calculation for i in range(N)]))
> ####################
> not sure this is good...

I suspect not, but timeit is your friend....
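
Something along these lines would settle it (a sketch only -- the sizes and
the sin() calculation are stand-ins for yours, and I pass axis=0 to
numpy.sum so that the stacked arrays are summed elementwise rather than
collapsed to a single number):

import timeit

setup = ("import numpy; n_outer, n_points = 100, 10000; "
         "x = numpy.arange(n_points, dtype=float)")

loop_version = """
bigarray = numpy.zeros(n_points)
for i in range(n_outer):
    bigarray += numpy.sin(x * i)      # dummy calculation, accumulated in place
"""

sum_version = """
bigarray = numpy.sum(numpy.array([numpy.sin(x * i) for i in range(n_outer)]),
                     axis=0)
"""

print(timeit.Timer(loop_version, setup).timeit(10))
print(timeit.Timer(sum_version, setup).timeit(10))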

>
> And you are basically saying that
>
>  bigarray = bigarray + calculation
>
> is better than
>
>  bigarray +=  calculation
>
> or is it strictly equivalent? (in terms of CPU...)

Actually the reverse. "bigarray += calculation" should be better in 
terms of both speed and memory usage. In this case it's also clearer, so 
it's an improvement all around. They both do the same number of adds, 
but the first allocates more memory and pushes more data back and forth 
between main memory and the cache.
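
Spelled out (a little sketch; calculation here is just a placeholder array):

import numpy

bigarray = numpy.zeros(1000000)
calculation = numpy.ones(1000000)

# Builds a brand-new array holding the sum, then rebinds the name bigarray
# to it; the old array becomes garbage to be collected later.
bigarray = bigarray + calculation

# Adds into bigarray's existing memory; no new array is allocated.
bigarray += calculation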

The point I was making about += versus + was that I wouldn't in general 
recommend:

   a = some_func()
   a += something_else

over:

   a = some_func() + something_else

because it's less clear. In cases where you really do need the 
speed, it's fine, but most of the time that's not the case. In your 
case, the speedup is fairly minor, I believe because random.normal is 
fairly expensive. If you instead compare these two ways of computing a 
cube, you'll see a much larger difference (37%).

>>> setup = "import numpy; stddev=numpy.arange(1e6,dtype=float)%3"
>>> timeit.Timer('stddev * stddev * stddev', setup).timeit(20)
1.206557537340359
>>> timeit.Timer('result = stddev*stddev; result *= stddev', setup).timeit(20)
0.88055493086403658

However, if you work with smaller matrices, the effect almost disappears 
(5%):

>>> setup = "import numpy; stddev=numpy.arange(1e4,dtype=float)%3"
>>> timeit.Timer('result = stddev*stddev; result *= stddev', setup).timeit(2000)
0.10166515576702295
>>> timeit.Timer('stddev * stddev * stddev', setup).timeit(2000)
0.10613667379493563

I believe that's because the speedup is nearly all due to reducing the 
amount of data you move around. In the second case everything fits in 
the cache, so the effect is minor. In the first you are pushing data 
back and forth to main memory, so the effect is fairly large. On my 
machine these sorts of effects kick in somewhere between 10,000 and 
100,000 elements.
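
If you want to see where the crossover lands on your own machine, a quick
(hypothetical) sweep over array sizes with the same two expressions will
show it:

import timeit

for n in (1000, 10000, 100000, 1000000):
    setup = "import numpy; stddev = numpy.arange(%d, dtype=float) %% 3" % n
    reps = max(1, int(2e7 / n))   # keep the total amount of work roughly constant
    with_temps = timeit.Timer('stddev * stddev * stddev', setup).timeit(reps)
    in_place = timeit.Timer('result = stddev*stddev; result *= stddev',
                            setup).timeit(reps)
    print("%8d elements: %.3f s with temporaries, %.3f s in place"
          % (n, with_temps, in_place))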


>
> thanks for the help, and sorry for the dumb questions


Not a problem. These are all legitimate questions that you can't really 
be expected to know without a fair amount of experience with numpy or 
its predecessors. It would be cool if someone added a page to the wiki 
on the topic so we could start collecting and organizing this 
information. For all I know there's one already there, though -- I 
should probably check.

-tim

>
> Eric
>
> Tim Hochberg wrote:
>
>> Eric Emsellem wrote:
>>
>>>
>>>>
>>>>
>>>> Since stdev essentially scales the output of random, wouldn't the 
>>>> following be equivalent to the above?
>>>>
>>>> result = numpy.random.normal(0, 1, stddev.shape)
>>>> result *= stddev
>>>>
>>> yes indeed, this is a good option where in fact I could do
>>>
>>> result = stddev * numpy.random.normal(0, 1, stddev.shape)
>>>
>>> in one line.
>>> thanks for the tip 
>>
>>
>> Indeed you can. However, keep in mind that the one line version is 
>> equivalent to:
>>
>>    temp = numpy.random.normal(0, 1, stddev.shape)
>>    result = stddev * temp
>>
>> That is, it creates an extra temporary variable only to throw it 
>> away. The two line version I posted above avoids that temporary and 
>> thus should be both faster and  less memory hungry. It's always good 
>> to check these things however:
>>
>> >>> setup = "import numpy; stddev=numpy.arange(1e6,dtype=float)%3"
>> >>> timeit.Timer('stddev * numpy.random.normal(0, 1, stddev.shape)', setup).timeit(20)
>> 3.4527201082819232
>> >>> timeit.Timer('result = numpy.random.normal(0, 1, stddev.shape); result*=stddev', setup).timeit(20)
>> 3.1093337281693607
>>
>> So, yes, the two line method is marginally faster (about 10%). Most 
>> of the time you shouldn't care about this: the one line version is 
>> clearer and most of the code you write isn't a bottleneck. Starting 
>> out writing this as the two line version is premature optimization. I 
>> used it here since the question was about optimization.
>>
>> I see Robert Kern just posted my list. If you want to put this in 
>> terms of that list, then:
>>
>> 0. Think about your algorithm
>>    => Recognize that stddev is a scale parameter
>> 1. Vectorize your inner loop.
>>    => This is a no brainer after 0 resulting in the one line version
>> 2. Eliminate temporaries
>>    => This results in the two line version.
>> ...
>>
>> Also key here is recognizing when to stop. Step 0 is always 
>> appropriate and step 1 is almost always good, resulting in code that 
>> is both clearer and faster. However, once you get to step 2 and 
>> beyond you tend to trade clarity for speed and memory usage. Not 
>> always: sometimes *= and friends are clearer, but often clarity 
>> suffers, particularly if you start resorting to three-argument 
>> ufuncs. So, my advice is to stop optimizing as soon as your code is 
>> fast enough.
>>
>>
>>> (of course this is not strictly equivalent depending on the random 
>>> generator, but that will be fine for my purpose)
>>
>>
>> I'll have to take your word for it -- after the normal distribution 
>> my knowledge in the area peters out rapidly.
>>
>> Regards,
>>
>> -tim
>>
>>
>


More information about the NumPy-Discussion mailing list