[SciPy-User] Re: [SciPy-user] removing for loops...

josef.pktd at gmail.com
Sat May 22 08:59:50 EDT 2010


On Sat, May 22, 2010 at 6:21 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>
> Sounds like I am stuck with the loop as I need to do the comparison for each
> pixel of the world and then I have a basemap function call which I guess
> slows it down further...hmm

After a brief look, I don't see much that could be done differently.

stats.pearsonr could be replaced by an array version that applies the
formula for correlation directly, even with nans. wilcoxon looks slow,
and I have never tried or seen a faster version.
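
An untested sketch of what such an array version could look like, reusing
the data1_snow/data2_snow names from your function and the same >50 length
cutoff (the denominator will warn/return nan for constant or empty columns):

mask = np.isfinite(data1_snow) & np.isfinite(data2_snow)
n = np.maximum(mask.sum(axis=0), 1)              # valid months per column
x = np.where(mask, data1_snow, 0.0)
y = np.where(mask, data2_snow, 0.0)
dx = np.where(mask, x - x.sum(axis=0) / n, 0.0)  # deviations from the per-column mean
dy = np.where(mask, y - y.sum(axis=0) / n, 0.0)
r = (dx * dy).sum(axis=0) / np.sqrt((dx * dx).sum(axis=0) * (dy * dy).sum(axis=0))
rsq = np.where(mask.sum(axis=0) > 50, r * r, np.nan)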

Just a reminder: the p-values are for a single test. When you have many
of them, they don't have the right size/confidence level for an overall
or joint test. (Some packages report a Bonferroni correction in this
case.)
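
For example, if you collected the p-values into an array first (say a 1d
array pvals with nan where the test was skipped -- that array doesn't
exist in the code above, it's only for illustration), a Bonferroni
adjustment is roughly:

num_tests = np.isfinite(pvals).sum()              # number of tests actually run
pvals_adj = np.minimum(pvals * num_tests, 1.0)    # adjusted p-values, capped at 1
significant = pvals_adj < 0.05                    # nan compares as False, so skipped tests drop out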

Josef


>
> i.e.
>
> def compareSnowData(jules_var):
>    # Extract the 11 years of snow data and return
>    outrows = 180
>    outcols = 360
>    numyears = 11
>    nummonths = 12
>
>    # Read various files
>    fname="world_valid_jules_pts.ascii"
>    (numpts, land_pts_index, latitude, longitude, rows, cols) = \
>        jo.read_land_points_ascii(fname, 1.0)
>
>    fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra"
>    jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
>                       timesteps=132, numvars=26)
>    fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra"
>    jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1, \
>                       timesteps=132, numvars=26)
>
>    # grab some space
>    data1_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32)
>    data2_snow = np.zeros((nummonths * numyears, numpts), dtype=np.float32)
>    pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
>    wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
>
>    # extract the data
>    data1_snow = jules_data1[:,jules_var,:,0]
>    data2_snow = jules_data2[:,jules_var,:,0]
>    data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow)
>    data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow)
>    #for month in xrange(numyears * nummonths):
>    #    for i in xrange(numpts):
>    #        data1 = jules_data1[month,jules_var,land_pts_index[i],0]
>    #        data2 = jules_data2[month,jules_var,land_pts_index[i],0]
>    #        if data1 >= 0.0:
>    #            data1_snow[month,i] = data1
>    #        else:
>    #            data1_snow[month,i] = np.nan
>    #        if data2 > 0.0:
>    #            data2_snow[month,i] = data2
>    #        else:
>    #            data2_snow[month,i] = np.nan
>
>    # exclude any months from *both* arrays where we have dodgy data,
>    # else we can't do the correlations correctly!!
>    data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow)
>    data2_snow = np.where(np.isnan(data1_snow), np.nan, data2_snow)
>
>    # put data on a regular grid...
>    print 'regridding landpts...'
>    for i in xrange(numpts):
>        # exclude the NaN, note masking them doesn't work in the stats func
>        x = data1_snow[:,i]
>        x = x[np.isfinite(x)]
>        y = data2_snow[:,i]
>        y = y[np.isfinite(y)]
>
>        # r^2
>        # exclude v.small arrays, i.e. we need just over 4 years of data
>        if len(x) > 50 and len(y) > 50:
>            pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] = \
>                (stats.pearsonr(x, y)[0])**2
>
>        # wilcox signed rank test
>        # make sure we have enough samples to do the test
>        d = x - y
>        d = np.compress(np.not_equal(d, 0), d, axis=-1)  # keep all non-zero differences
>        count = len(d)
>        if count > 10:
>            z, pval = stats.wilcoxon(x, y)
>            # only map out sign different data
>            if pval < 0.05:
>                wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = \
>                    np.mean(x - y)
>
>    return (pearsonsr_snow, wilcoxStats_snow)
>
>
> josef.pktd wrote:
>>
>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>
>>> Also I then need to remap the 2D array I make onto another grid (the
>>> world in this case), which again I am doing with a loop (note numpts is
>>> a lot bigger than my example above).
>>>
>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) * np.nan
>>> for i in xrange(numpts):
>>>        # exclude the NaN, note masking them doesn't work in the stats func
>>>        x = data1_snow[:,i]
>>>        x = x[np.isfinite(x)]
>>>        y = data2_snow[:,i]
>>>        y = y[np.isfinite(y)]
>>>
>>>        # wilcox signed rank test
>>>        # make sure we have enough samples to do the test
>>>        d = x - y
>>>        d = np.compress(np.not_equal(d, 0), d, axis=-1)  # keep all non-zero differences
>>>        count = len(d)
>>>        if count > 10:
>>>            z, pval = stats.wilcoxon(x, y)
>>>            # only map out sign different data
>>>            if pval < 0.05:
>>>                wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] = \
>>>                    np.mean(x - y)
>>>
>>> Now I think I can push the data into the wilcoxStats_snow array in one
>>> move by removing the index, but I can't see how I will get the
>>> individual x and y pts for each array member correctly without the
>>> loop. This was my attempt, which of course doesn't work!
>>>
>>> x = data1_snow[:,:]
>>> x = x[np.isfinite(x)]
>>> y = data2_snow[:,:]
>>> y = y[np.isfinite(y)]
>>>
>>> # r^2
>>> # exclude v.small arrays, i.e. we need just over 4 years of data
>>> if len(x) > 50 and len(y) > 50:
>>>     pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x, y)[0])**2
>>
>>
>> If you want to do pairwise comparisons with stats.wilcoxon, then you
>> might be stuck with the loop, since wilcoxon takes only two 1d arrays
>> at a time (if I read the help correctly).
>>
>> Also the presence of nans might force the use of a loop. stats.mstats has
>> masked array versions, but I didn't see wilcoxon in the list. (Even
>> when vectorized operations would work with regular arrays, nan or
>> masked array versions still have to loop in many cases.)
>>
>> If you have many columns with count <= 10, so that wilcoxon is not
>> calculated, then it might be worth using only array operations up to
>> that point. If wilcoxon is calculated most of the time, then it's not
>> worth thinking too hard about this.
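>>
>> An untested sketch of that pre-filtering idea, reusing the
>> data1_snow/data2_snow names (the zero-difference ties would still need
>> to be dropped inside the loop, as in the original code):
>>
>> valid = np.isfinite(data1_snow) & np.isfinite(data2_snow)
>> counts = valid.sum(axis=0)
>> for i in np.where(counts > 10)[0]:       # only visit columns with enough data
>>     x = data1_snow[:, i][valid[:, i]]
>>     y = data2_snow[:, i][valid[:, i]]
>>     z, pval = stats.wilcoxon(x, y)
>>     # ... map pval / np.mean(x - y) back onto the grid as before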
>>
>> Josef
>>
>>
>>>
>>> thanks.
>>>
>>>
>>>
>>>
>>> mdekauwe wrote:
>>>>
>>>> Yes as Zachary said index is only 0 to 15237, so both methods work.
>>>>
>>>> I don't quite get what you mean about slicing with axis > 3. Is there a
>>>> link you can recommend I should read? Does that mean that, given I have
>>>> 4 dims, Josef's suggestion would be more advisable in this case?
>>
>> There were several discussions on the mailing lists (fancy slicing and
>> indexing). Your case is safe, but if you run into funny shapes in the
>> future, you can look up the details.
>> When in doubt, I use np.arange(...).
>>
>> Josef
>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>> josef.pktd wrote:
>>>>>
>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>>
>>>>>> Thanks that works...
>>>>>>
>>>>>> So the way to do it is with np.arange(tsteps)[:,None]; that was the
>>>>>> step I was struggling with. So this forms a 2D array which replaces
>>>>>> the two for loops? Do I have that right?
>>>>>
>>>>> Yes, but as Zachary showed, if you need the full index in a dimension,
>>>>> then you can use slicing. It might be faster.
>>>>> And a warning: mixing slices and index arrays with 3 or more
>>>>> dimensions can cause some surprising switching of axes.
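>>>>>
>>>>> A small made-up example of that surprise (shapes only, untested):
>>>>>
>>>>> a = np.zeros((10, 20, 30, 40))
>>>>> idx = np.arange(3)
>>>>> a[:, idx, :, 0].shape   # (3, 10, 30) -- the index dimension jumps to the front
>>>>> a[:, idx, 0, :].shape   # (10, 3, 40) -- here it stays where you would expect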
>>>>>
>>>>> Josef
>>>>>
>>>>>>
>>>>>> A lot quicker...!
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> josef.pktd wrote:
>>>>>>>
>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D
>>>>>>>> array, but avoid my current usage of for loops for speed, as in
>>>>>>>> reality the array sizes are quite big. Could someone also explain
>>>>>>>> the solution if they have a spare moment, as I am still finding it
>>>>>>>> quite difficult to get over the habit of using loops (C convert,
>>>>>>>> for my sins). I get that one could precompute the indices i and j,
>>>>>>>> i.e.
>>>>>>>>
>>>>>>>> i = np.arange(tsteps)
>>>>>>>> j = np.arange(numpts)
>>>>>>>>
>>>>>>>> but just can't get my head round how i then use them...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> import numpy as np
>>>>>>>>
>>>>>>>> numpts=10
>>>>>>>> tsteps = 12
>>>>>>>> vari = 22
>>>>>>>>
>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1))
>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32)
>>>>>>>> index = np.arange(numpts)
>>>>>>>>
>>>>>>>> for i in xrange(tsteps):
>>>>>>>>    for j in xrange(numpts):
>>>>>>>>        new_data[i,j] = data[i,5,index[j],0]
>>>>>>>
>>>>>>> The index arrays need to be broadcastable against each other.
>>>>>>>
>>>>>>> I think this should do it
>>>>>>>
>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts), 0]
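>>>>>>>
>>>>>>> As a quick (untested) sanity check against the loop result:
>>>>>>>
>>>>>>> new_data2 = data[np.arange(tsteps)[:, None], 5, index, 0]
>>>>>>> print np.allclose(new_data, new_data2)   # should print True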
>>>>>>>
>>>>>>> Josef
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>


