fancy indexing/broadcasting question

A quick question for the group. I'm working with some code to generate some arrays of random numbers. The random numbers, however, need to meet certain criteria. So for the moment, I have things that look like this (code is just an abstraction): import numpy normal=numpy.random.normal RNDarray = normal(25,15,(50,50)) tmp1 = (RNDarray < 0) | (RNDarray > 25) while tmp1.any(): print tmp1.size, tmp1.shape, tmp1[tmp1].size RNDarray[tmp1] = normal(5,3, size = RNDarray[tmp1].size) tmp1 = (RNDarray < 0) | (RNDarray > 25) This code works well. However, it might be considered inefficient because, for each iteration of the while loop, all values get reevaluated even if they have previously met the criteria encapsulated in tmp1. It would be better if, for each cycle of the while loop, only those elements that have previously not met criteria get reevaluated. I tried a few things that haven't worked. An example is provided below (changed code marked with *): RNDarray = normal(25,15,(50,50)) tmp1 = (RNDarray < 0) | (RNDarray > 25) while tmp1.any(): print tmp1.size, tmp1.shape, tmp1[tmp1].size RNDarray[tmp1] = normal(5,3, size = RNDarray[tmp1].size) * tmp1 = (RNDarray[tmp1] < 0) | (RNDarray[tmp1] > 25) I see why this doesn't work properly. When tmp1 is reevaluated in the while loop, it returns an array with a different shape. Does anyone have any suggestions for improvement here? Please let me know if any of this requires clarification. Thanks, -Mark

Sorry...here's a minor correction to the code. #1st part import numpy normal=numpy.random.normal RNDarray = normal(25,15,(50,50)) tmp1 = (RNDarray < 0) | (RNDarray > 25) while tmp1.any(): print tmp1.size, tmp1.shape, tmp1[tmp1].size RNDarray[tmp1] = normal(25,15, size = RNDarray[tmp1].size) tmp1 = (RNDarray < 0) | (RNDarray > 25) #2nd part import numpy normal=numpy.random.normal RNDarray = normal(25,15,(50,50)) tmp1 = (RNDarray < 0) | (RNDarray > 25) while tmp1.any(): print tmp1.size, tmp1.shape, tmp1[tmp1].size RNDarray[tmp1] = normal(25,15, size = RNDarray[tmp1].size) * tmp1 = (RNDarray[tmp1] < 0) | (RNDarray[tmp1] > 25) Mark.Miller wrote:
A quick question for the group. I'm working with some code to generate some arrays of random numbers. The random numbers, however, need to meet certain criteria. So for the moment, I have things that look like this (code is just an abstraction):
import numpy normal=numpy.random.normal
RNDarray = normal(25,15,(50,50)) tmp1 = (RNDarray < 0) | (RNDarray > 25) while tmp1.any(): print tmp1.size, tmp1.shape, tmp1[tmp1].size RNDarray[tmp1] = normal(5,3, size = RNDarray[tmp1].size) tmp1 = (RNDarray < 0) | (RNDarray > 25)
This code works well. However, it might be considered inefficient because, for each iteration of the while loop, all values get reevaluated even if they have previously met the criteria encapsulated in tmp1. It would be better if, for each cycle of the while loop, only those elements that have previously not met criteria get reevaluated.
I tried a few things that haven't worked. An example is provided below (changed code marked with *):
RNDarray = normal(25,15,(50,50)) tmp1 = (RNDarray < 0) | (RNDarray > 25) while tmp1.any(): print tmp1.size, tmp1.shape, tmp1[tmp1].size RNDarray[tmp1] = normal(5,3, size = RNDarray[tmp1].size) * tmp1 = (RNDarray[tmp1] < 0) | (RNDarray[tmp1] > 25)
I see why this doesn't work properly. When tmp1 is reevaluated in the while loop, it returns an array with a different shape.
Does anyone have any suggestions for improvement here? Please let me know if any of this requires clarification.
Thanks,
-Mark _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
-- Dr. Mark P. Miller Department of Biology 5305 Old Main Hill Utah State University Logan, UT 84322-5305 USA
<><><><><><><><><><><><><><><>< http://www.biology.usu.edu/people/facultyinfo.asp?username=mpmbio http://www.marksgeneticsoftware.net

Mark.Miller wrote:
Sorry...here's a minor correction to the code.
#1st part import numpy normal=numpy.random.normal
RNDarray = normal(25,15,(50,50)) tmp1 = (RNDarray < 0) | (RNDarray > 25) while tmp1.any(): print tmp1.size, tmp1.shape, tmp1[tmp1].size RNDarray[tmp1] = normal(25,15, size = RNDarray[tmp1].size) tmp1 = (RNDarray < 0) | (RNDarray > 25)
#2nd part import numpy normal=numpy.random.normal
RNDarray = normal(25,15,(50,50)) tmp1 = (RNDarray < 0) | (RNDarray > 25) while tmp1.any(): print tmp1.size, tmp1.shape, tmp1[tmp1].size RNDarray[tmp1] = normal(25,15, size = RNDarray[tmp1].size) * tmp1 = (RNDarray[tmp1] < 0) | (RNDarray[tmp1] > 25)
The reason is that tmp1 is no longer a mask into RNDarray, but into RNDarray[tmp1] (the old tmp1). For something as small as (50, 50) and simple criteria (no looping), the first version will probably be faster than any attempt to optimize it. However, if you do have larger arrays or slower criteria, you can reduce the size of the re-evaluated array pretty simply. I'm still not sure it will be faster, but here it is: import numpy normal = numpy.random.normal RNDarray = normal(25, 15, (50, 50)) badmask = (RNDarray < 0) | (RNDarray > 25) nbad = badmask.sum() while nbad > 0: new = normal(25, 15, size=nbad) RNDarray[badmask] = new newbad = (new < 0) | (new > 25) badmask[badmask] = newbad nbad = newbad.sum() -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

That seems to do the trick. Runtimes are reduced by 15-20% in my code. Robert Kern wrote:
The reason is that tmp1 is no longer a mask into RNDarray, but into RNDarray[tmp1] (the old tmp1). For something as small as (50, 50) and simple criteria (no looping), the first version will probably be faster than any attempt to optimize it.
However, if you do have larger arrays or slower criteria, you can reduce the size of the re-evaluated array pretty simply. I'm still not sure it will be faster, but here it is:
import numpy normal = numpy.random.normal
RNDarray = normal(25, 15, (50, 50)) badmask = (RNDarray < 0) | (RNDarray > 25) nbad = badmask.sum() while nbad > 0: new = normal(25, 15, size=nbad) RNDarray[badmask] = new newbad = (new < 0) | (new > 25) badmask[badmask] = newbad nbad = newbad.sum()

On 07/07/07, Mark.Miller <mpmusu@cc.usu.edu> wrote:
A quick question for the group. I'm working with some code to generate some arrays of random numbers. The random numbers, however, need to meet certain criteria. So for the moment, I have things that look like this (code is just an abstraction):
import numpy normal=numpy.random.normal
RNDarray = normal(25,15,(50,50)) tmp1 = (RNDarray < 0) | (RNDarray > 25) while tmp1.any(): print tmp1.size, tmp1.shape, tmp1[tmp1].size RNDarray[tmp1] = normal(5,3, size = RNDarray[tmp1].size) tmp1 = (RNDarray < 0) | (RNDarray > 25)
This code works well. However, it might be considered inefficient because, for each iteration of the while loop, all values get reevaluated even if they have previously met the criteria encapsulated in tmp1. It would be better if, for each cycle of the while loop, only those elements that have previously not met criteria get reevaluated.
You can write a quick recursive function to do it: In [33]: def gen_condition(n): ....: A = normal(size=n) ....: c = abs(A)>1 ....: subn = sum(c) ....: if subn>0: ....: A[c] = gen_condition(subn) ....: return A ....: Probably not ideal if it's going to take many tries, but it's pretty clear. Anne
participants (3)
-
Anne Archibald
-
Mark.Miller
-
Robert Kern