[SciPy-User] re[SciPy-user] moving for loops...
josef.pktd at gmail.com
josef.pktd at gmail.com
Sat May 22 08:59:50 EDT 2010
On Sat, May 22, 2010 at 6:21 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>
> Sounds like I am stuck with the loop as I need to do the comparison for each
> pixel of the world and then I have a basemap function call which I guess
> slows it down further...hmm
I don't see much that could be done differently, after a brief look.
stats.pearsonr could be replaced by an array version using directly
the formula for correlation even with nans. wilcoxon looks slow, and I
never tried or seen a faster version.
just a reminder, the p-values are for a single test, when you have
many of them, then they don't have the right size/confidence level for
an overall or joint test. (some packages report a Bonferroni
correction in this case)
Josef
>
> i.e.
>
> def compareSnowData(jules_var):
> # Extract the 11 years of snow data and return
> outrows = 180
> outcols = 360
> numyears = 11
> nummonths = 12
>
> # Read various files
> fname="world_valid_jules_pts.ascii"
> (numpts, land_pts_index, latitude, longitude, rows, cols) =
> jo.read_land_points_ascii(fname, 1.0)
>
> fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra"
> jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
> timesteps=132, numvars=26)
> fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra"
> jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
> timesteps=132, numvars=26)
>
> # grab some space
> data1_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32)
> data2_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32)
> pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) *
> np.nan
>
> # extract the data
> data1_snow = jules_data1[:,jules_var,:,0]
> data2_snow = jules_data2[:,jules_var,:,0]
> data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow)
> data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow)
> #for month in xrange(numyears * nummonths):
> # for i in xrange(numpts):
> # data1 = jules_data1[month,jules_var,land_pts_index[i],0]
> # data2 = jules_data2[month,jules_var,land_pts_index[i],0]
> # if data1 >= 0.0:
> # data1_snow[month,i] = data1
> # else:
> # data1_snow[month,i] = np.nan
> # if data2 > 0.0:
> # data2_snow[month,i] = data2
> # else:
> # data2_snow[month,i] = np.nan
>
> # exclude any months from *both* arrays where we have dodgy data, else
> we
> # can't do the correlations correctly!!
> data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow)
> data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow)
>
> # put data on a regular grid...
> print 'regridding landpts...'
> for i in xrange(numpts):
> # exclude the NaN, note masking them doesn't work in the stats func
> x = data1_snow[:,i]
> x = x[np.isfinite(x)]
> y = data2_snow[:,i]
> y = y[np.isfinite(y)]
>
> # r^2
> # exclude v.small arrays, i.e. we need just less over 4 years of
> data
> if len(x) and len(y) > 50:
> pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
> (stats.pearsonr(x, y)[0])**2
>
> # wilcox signed rank test
> # make sure we have enough samples to do the test
> d = x - y
> d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero
> differences
> count = len(d)
> if count > 10:
> z, pval = stats.wilcoxon(x, y)
> # only map out sign different data
> if pval < 0.05:
> wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
> np.mean(x - y)
>
> return (pearsonsr_snow, wilcoxStats_snow)
>
>
> josef.pktd wrote:
>>
>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>
>>> Also I then need to remap the 2D array I make onto another grid (the
>>> world in
>>> this case). Which again I had am doing with a loop (note numpts is a lot
>>> bigger than my example above).
>>>
>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
>>> for i in xrange(numpts):
>>> # exclude the NaN, note masking them doesn't work in the stats
>>> func
>>> x = data1_snow[:,i]
>>> x = x[np.isfinite(x)]
>>> y = data2_snow[:,i]
>>> y = y[np.isfinite(y)]
>>>
>>> # wilcox signed rank test
>>> # make sure we have enough samples to do the test
>>> d = x - y
>>> d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all non-zero
>>> differences
>>> count = len(d)
>>> if count > 10:
>>> z, pval = stats.wilcoxon(x, y)
>>> # only map out sign different data
>>> if pval < 0.05:
>>> wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
>>> np.mean(x - y)
>>>
>>> Now I think I can push the data in one move into the wilcoxStats_snow
>>> array
>>> by removing the index,
>>> but I can't see how I will get the individual x and y pts for each array
>>> member correctly without the loop, this was my attempt which of course
>>> doesn't work!
>>>
>>> x = data1_snow[:,:]
>>> x = x[np.isfinite(x)]
>>> y = data2_snow[:,:]
>>> y = y[np.isfinite(y)]
>>>
>>> # r^2
>>> # exclude v.small arrays, i.e. we need just less over 4 years of data
>>> if len(x) and len(y) > 50:
>>> pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x,
>>> y)[0])**2
>>
>>
>> If you want to do pairwise comparisons with stats.wilcoxon, then you
>> might be stuck with the loop, since wilcoxon takes only two 1d arrays
>> at a time (if I read the help correctly).
>>
>> Also the presence of nans might force the use a loop. stats.mstats has
>> masked array versions, but I didn't see wilcoxon in the list. (Even
>> when vectorized operations would work with regular arrays, nan or
>> masked array versions still have to loop in many cases.)
>>
>> If you have many columns with count <= 10, so that wilcoxon is not
>> calculated then it might be worth to use only array operations up to
>> that point. If wilcoxon is calculated most of the time, then it's not
>> worth thinking too hard about this.
>>
>> Josef
>>
>>
>>>
>>> thanks.
>>>
>>>
>>>
>>>
>>> mdekauwe wrote:
>>>>
>>>> Yes as Zachary said index is only 0 to 15237, so both methods work.
>>>>
>>>> I don't quite get what you mean about slicing with axis > 3. Is there a
>>>> link you can recommend I should read? Does that mean given I have 4dims
>>>> that Josef's suggestion would be more advised in this case?
>>
>> There were several discussions on the mailing lists (fancy slicing and
>> indexing). Your case is safe, but if you run in future into funny
>> shapes, you can look up the details.
>> when in doubt, I use np.arange(...)
>>
>> Josef
>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>> josef.pktd wrote:
>>>>>
>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>>
>>>>>> Thanks that works...
>>>>>>
>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was the
>>>>>> step
>>>>>> I
>>>>>> was struggling with, so this forms a 2D array which replaces the the
>>>>>> two
>>>>>> for
>>>>>> loops? Do I have that right?
>>>>>
>>>>> Yes, but as Zachary showed, if you need the full index in a dimension,
>>>>> then you can use slicing. It might be faster.
>>>>> And a warning, mixing slices and index arrays with 3 or more
>>>>> dimensions can have some surprise switching of axes.
>>>>>
>>>>> Josef
>>>>>
>>>>>>
>>>>>> A lot quicker...!
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> josef.pktd wrote:
>>>>>>>
>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D
>>>>>>>> array,
>>>>>>>> but
>>>>>>>> avoid my current usage of the for loops for speed, as in reality the
>>>>>>>> arrays
>>>>>>>> sizes are quite big. Could someone also try and explain the solution
>>>>>>>> as
>>>>>>>> well
>>>>>>>> if they have a spare moment as I am still finding it quite difficult
>>>>>>>> to
>>>>>>>> get
>>>>>>>> over the habit of using loops (C convert for my sins). I get that
>>>>>>>> one
>>>>>>>> could
>>>>>>>> precompute the indices's i and j i.e.
>>>>>>>>
>>>>>>>> i = np.arange(tsteps)
>>>>>>>> j = np.arange(numpts)
>>>>>>>>
>>>>>>>> but just can't get my head round how i then use them...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> import numpy as np
>>>>>>>>
>>>>>>>> numpts=10
>>>>>>>> tsteps = 12
>>>>>>>> vari = 22
>>>>>>>>
>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1))
>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32)
>>>>>>>> index = np.arange(numpts)
>>>>>>>>
>>>>>>>> for i in xrange(tsteps):
>>>>>>>> for j in xrange(numpts):
>>>>>>>> new_data[i,j] = data[i,5,index[j],0]
>>>>>>>
>>>>>>> The index arrays need to be broadcastable against each other.
>>>>>>>
>>>>>>> I think this should do it
>>>>>>>
>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0]
>>>>>>>
>>>>>>> Josef
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html
>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> SciPy-User mailing list
>>>>>>>> SciPy-User at scipy.org
>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> SciPy-User mailing list
>>>>>>> SciPy-User at scipy.org
>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html
>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>
>>>>>> _______________________________________________
>>>>>> SciPy-User mailing list
>>>>>> SciPy-User at scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>
>>>>> _______________________________________________
>>>>> SciPy-User mailing list
>>>>> SciPy-User at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html
>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>
> --
> View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html
> Sent from the Scipy-User mailing list archive at Nabble.com.
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
More information about the SciPy-User
mailing list