[SciPy-User] re[SciPy-user] moving for loops...

Fri May 28 16:05:17 EDT 2010

On Fri, May 28, 2010 at 3:53 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>
> Ok thanks...I'll take a look.
>
> Back to my loops issue. What if instead this time I wanted to take an
> average so every march in 11 years, is there a quicker way to go about doing
> that than my current method?
>
> nummonths = 12
> numyears = 11
>
> for month in xrange(nummonths):
>    for i in xrange(numpts):
>        for ym in xrange(month, numyears * nummonths, nummonths):
>            data[month, i] += array[ym, VAR, land_pts_index[i], 0]


x[start:end:12,:] gives you every 12th row of an array x

something like this should work to get rid of the inner loop, or you
could directly put
range(month, numyears * nummonths, nummonths) into the array instead
of ym and sum()

Josef


>
> so for each point in the array for a given month i am jumping through and
> getting the next years month and so on, summing it.
>
> Thanks...
>
>
> josef.pktd wrote:
>>
>> On Wed, May 26, 2010 at 5:03 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>
>>> Could you possibly if you have time explain further your comment re the
>>> p-values, your suggesting I am misusing them?
>>
>> Depends on your use and interpretation
>>
>> test statistics, p-values are random variables, if you look at several
>> tests at the same time, some p-values will be large just by chance.
>> If, for example you just look at the largest test statistic, then the
>> distribution for the max of several test statistics is not the same as
>> the distribution for a single test statistic
>>
>> http://en.wikipedia.org/wiki/Multiple_comparisons
>> http://www.itl.nist.gov/div898/handbook/prc/section4/prc47.htm
>>
>> we also just had a related discussion for ANOVA post-hoc tests on the
>> pystatsmodels group.
>>
>> Josef
>>>
>>> Thanks.
>>>
>>>
>>> josef.pktd wrote:
>>>>
>>>> On Sat, May 22, 2010 at 6:21 AM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>
>>>>> Sounds like I am stuck with the loop as I need to do the comparison for
>>>>> each
>>>>> pixel of the world and then I have a basemap function call which I
>>>>> guess
>>>>> slows it down further...hmm
>>>>
>>>> I don't see much that could be done differently, after a brief look.
>>>>
>>>> stats.pearsonr could be replaced by an array version using directly
>>>> the formula for correlation even with nans. wilcoxon looks slow, and I
>>>> never tried or seen a faster version.
>>>>
>>>> just a reminder, the p-values are for a single test, when you have
>>>> many of them, then they don't have the right size/confidence level for
>>>> an overall or joint test. (some packages report a Bonferroni
>>>> correction in this case)
>>>>
>>>> Josef
>>>>
>>>>
>>>>>
>>>>> i.e.
>>>>>
>>>>> def compareSnowData(jules_var):
>>>>>    # Extract the 11 years of snow data and return
>>>>>    outrows = 180
>>>>>    outcols = 360
>>>>>    numyears = 11
>>>>>    nummonths = 12
>>>>>
>>>>>    # Read various files
>>>>>    fname="world_valid_jules_pts.ascii"
>>>>>    (numpts, land_pts_index, latitude, longitude, rows, cols) =
>>>>> jo.read_land_points_ascii(fname, 1.0)
>>>>>
>>>>>    fname = "globalSnowRun_1985_96.GSWP2.nsmax0.mon.gra"
>>>>>    jules_data1 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1,
>>>>> \
>>>>>                       timesteps=132, numvars=26)
>>>>>    fname = "globalSnowRun_1985_96.GSWP2.nsmax3.mon.gra"
>>>>>    jules_data2 = jo.readJulesOutBinary(fname, numrows=15238, numcols=1,
>>>>> \
>>>>>                       timesteps=132, numvars=26)
>>>>>
>>>>>    # grab some space
>>>>>    data1_snow = np.zeros((nummonths * numyears, numpts),
>>>>> dtype=np.float32)
>>>>>    data2_snow = np.zeros((nummonths * numyears, numpts),
>>>>> dtype=np.float32)
>>>>>    pearsonsr_snow = np.ones((outrows, outcols), dtype=np.float32) *
>>>>> np.nan
>>>>>    wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) *
>>>>> np.nan
>>>>>
>>>>>    # extract the data
>>>>>    data1_snow = jules_data1[:,jules_var,:,0]
>>>>>    data2_snow = jules_data2[:,jules_var,:,0]
>>>>>    data1_snow = np.where(data1_snow < 0.0, np.nan, data1_snow)
>>>>>    data2_snow = np.where(data2_snow < 0.0, np.nan, data2_snow)
>>>>>    #for month in xrange(numyears * nummonths):
>>>>>    #    for i in xrange(numpts):
>>>>>    #        data1 = jules_data1[month,jules_var,land_pts_index[i],0]
>>>>>    #        data2 = jules_data2[month,jules_var,land_pts_index[i],0]
>>>>>    #        if data1 >= 0.0:
>>>>>    #            data1_snow[month,i] = data1
>>>>>    #        else:
>>>>>    #            data1_snow[month,i] = np.nan
>>>>>    #        if data2 > 0.0:
>>>>>    #            data2_snow[month,i] = data2
>>>>>    #        else:
>>>>>    #            data2_snow[month,i] = np.nan
>>>>>
>>>>>    # exclude any months from *both* arrays where we have dodgy data,
>>>>> else
>>>>> we
>>>>>    # can't do the correlations correctly!!
>>>>>    data1_snow = np.where(np.isnan(data2_snow), np.nan, data1_snow)
>>>>>    data2_snow = np.where(np.isnan(data1_snow), np.nan, data1_snow)
>>>>>
>>>>>    # put data on a regular grid...
>>>>>    print 'regridding landpts...'
>>>>>    for i in xrange(numpts):
>>>>>        # exclude the NaN, note masking them doesn't work in the stats
>>>>> func
>>>>>        x = data1_snow[:,i]
>>>>>        x = x[np.isfinite(x)]
>>>>>        y = data2_snow[:,i]
>>>>>        y = y[np.isfinite(y)]
>>>>>
>>>>>        # r^2
>>>>>        # exclude v.small arrays, i.e. we need just less over 4 years of
>>>>> data
>>>>>        if len(x) and len(y) > 50:
>>>>>            pearsonsr_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
>>>>> (stats.pearsonr(x, y)[0])**2
>>>>>
>>>>>        # wilcox signed rank test
>>>>>        # make sure we have enough samples to do the test
>>>>>        d = x - y
>>>>>        d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all
>>>>> non-zero
>>>>> differences
>>>>>        count = len(d)
>>>>>        if count > 10:
>>>>>            z, pval = stats.wilcoxon(x, y)
>>>>>            # only map out sign different data
>>>>>            if pval < 0.05:
>>>>>                wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
>>>>> np.mean(x - y)
>>>>>
>>>>>    return (pearsonsr_snow, wilcoxStats_snow)
>>>>>
>>>>>
>>>>> josef.pktd wrote:
>>>>>>
>>>>>> On Fri, May 21, 2010 at 10:14 PM, mdekauwe <mdekauwe at gmail.com> wrote:
>>>>>>>
>>>>>>> Also I then need to remap the 2D array I make onto another grid (the
>>>>>>> world in
>>>>>>> this case). Which again I had am doing with a loop (note numpts is a
>>>>>>> lot
>>>>>>> bigger than my example above).
>>>>>>>
>>>>>>> wilcoxStats_snow = np.ones((outrows, outcols), dtype=np.float32) *
>>>>>>> np.nan
>>>>>>> for i in xrange(numpts):
>>>>>>>        # exclude the NaN, note masking them doesn't work in the stats
>>>>>>> func
>>>>>>>        x = data1_snow[:,i]
>>>>>>>        x = x[np.isfinite(x)]
>>>>>>>        y = data2_snow[:,i]
>>>>>>>        y = y[np.isfinite(y)]
>>>>>>>
>>>>>>>        # wilcox signed rank test
>>>>>>>        # make sure we have enough samples to do the test
>>>>>>>        d = x - y
>>>>>>>        d = np.compress(np.not_equal(d,0), d ,axis=-1) # Keep all
>>>>>>> non-zero
>>>>>>> differences
>>>>>>>        count = len(d)
>>>>>>>        if count > 10:
>>>>>>>            z, pval = stats.wilcoxon(x, y)
>>>>>>>            # only map out sign different data
>>>>>>>            if pval < 0.05:
>>>>>>>                wilcoxStats_snow[((180-1)-(rows[i]-1)),cols[i]-1] =
>>>>>>> np.mean(x - y)
>>>>>>>
>>>>>>> Now I think I can push the data in one move into the wilcoxStats_snow
>>>>>>> array
>>>>>>> by removing the index,
>>>>>>> but I can't see how I will get the individual x and y pts for each
>>>>>>> array
>>>>>>> member correctly without the loop, this was my attempt which of
>>>>>>> course
>>>>>>> doesn't work!
>>>>>>>
>>>>>>> x = data1_snow[:,:]
>>>>>>> x = x[np.isfinite(x)]
>>>>>>> y = data2_snow[:,:]
>>>>>>> y = y[np.isfinite(y)]
>>>>>>>
>>>>>>> # r^2
>>>>>>> # exclude v.small arrays, i.e. we need just less over 4 years of data
>>>>>>> if len(x) and len(y) > 50:
>>>>>>>    pearsonsr_snow[((180-1)-(rows-1)),cols-1] = (stats.pearsonr(x,
>>>>>>> y)[0])**2
>>>>>>
>>>>>>
>>>>>> If you want to do pairwise comparisons with stats.wilcoxon, then you
>>>>>> might be stuck with the loop, since wilcoxon takes only two 1d arrays
>>>>>> at a time (if I read the help correctly).
>>>>>>
>>>>>> Also the presence of nans might force the use a loop. stats.mstats has
>>>>>> masked array versions, but I didn't see wilcoxon in the list. (Even
>>>>>> when vectorized operations would work with regular arrays, nan or
>>>>>> masked array versions still have to loop in many cases.)
>>>>>>
>>>>>> If you have many columns with count <= 10, so that wilcoxon is not
>>>>>> calculated then it might be worth to use only array operations up to
>>>>>> that point. If wilcoxon is calculated most of the time, then it's not
>>>>>> worth thinking too hard about this.
>>>>>>
>>>>>> Josef
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> mdekauwe wrote:
>>>>>>>>
>>>>>>>> Yes as Zachary said index is only 0 to 15237, so both methods work.
>>>>>>>>
>>>>>>>> I don't quite get what you mean about slicing with axis > 3. Is
>>>>>>>> there
>>>>>>>> a
>>>>>>>> link you can recommend I should read? Does that mean given I have
>>>>>>>> 4dims
>>>>>>>> that Josef's suggestion would be more advised in this case?
>>>>>>
>>>>>> There were several discussions on the mailing lists (fancy slicing and
>>>>>> indexing). Your case is safe, but if you run in future into funny
>>>>>> shapes, you can look up the details.
>>>>>> when in doubt, I use np.arange(...)
>>>>>>
>>>>>> Josef
>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> josef.pktd wrote:
>>>>>>>>>
>>>>>>>>> On Fri, May 21, 2010 at 10:55 AM, mdekauwe <mdekauwe at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks that works...
>>>>>>>>>>
>>>>>>>>>> So the way to do it is with np.arange(tsteps)[:,None], that was
>>>>>>>>>> the
>>>>>>>>>> step
>>>>>>>>>> I
>>>>>>>>>> was struggling with, so this forms a 2D array which replaces the
>>>>>>>>>> the
>>>>>>>>>> two
>>>>>>>>>> for
>>>>>>>>>> loops? Do I have that right?
>>>>>>>>>
>>>>>>>>> Yes, but as Zachary showed, if you need the full index in a
>>>>>>>>> dimension,
>>>>>>>>> then you can use slicing. It might be faster.
>>>>>>>>> And a warning, mixing slices and index arrays with 3 or more
>>>>>>>>> dimensions can have some surprise switching of axes.
>>>>>>>>>
>>>>>>>>> Josef
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> A lot quicker...!
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> josef.pktd wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 21, 2010 at 8:59 AM, mdekauwe <mdekauwe at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to extract data from a 4D array and store it in a 2D
>>>>>>>>>>>> array,
>>>>>>>>>>>> but
>>>>>>>>>>>> avoid my current usage of the for loops for speed, as in reality
>>>>>>>>>>>> the
>>>>>>>>>>>> arrays
>>>>>>>>>>>> sizes are quite big. Could someone also try and explain the
>>>>>>>>>>>> solution
>>>>>>>>>>>> as
>>>>>>>>>>>> well
>>>>>>>>>>>> if they have a spare moment as I am still finding it quite
>>>>>>>>>>>> difficult
>>>>>>>>>>>> to
>>>>>>>>>>>> get
>>>>>>>>>>>> over the habit of using loops (C convert for my sins). I get
>>>>>>>>>>>> that
>>>>>>>>>>>> one
>>>>>>>>>>>> could
>>>>>>>>>>>> precompute the indices's i and j i.e.
>>>>>>>>>>>>
>>>>>>>>>>>> i = np.arange(tsteps)
>>>>>>>>>>>> j = np.arange(numpts)
>>>>>>>>>>>>
>>>>>>>>>>>> but just can't get my head round how i then use them...
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>> import numpy as np
>>>>>>>>>>>>
>>>>>>>>>>>> numpts=10
>>>>>>>>>>>> tsteps = 12
>>>>>>>>>>>> vari = 22
>>>>>>>>>>>>
>>>>>>>>>>>> data = np.random.random((tsteps, vari, numpts, 1))
>>>>>>>>>>>> new_data = np.zeros((tsteps, numpts), dtype=np.float32)
>>>>>>>>>>>> index = np.arange(numpts)
>>>>>>>>>>>>
>>>>>>>>>>>> for i in xrange(tsteps):
>>>>>>>>>>>>    for j in xrange(numpts):
>>>>>>>>>>>>        new_data[i,j] = data[i,5,index[j],0]
>>>>>>>>>>>
>>>>>>>>>>> The index arrays need to be broadcastable against each other.
>>>>>>>>>>>
>>>>>>>>>>> I think this should do it
>>>>>>>>>>>
>>>>>>>>>>> new_data = data[np.arange(tsteps)[:,None], 5, np.arange(numpts),
>>>>>>>>>>> 0]
>>>>>>>>>>>
>>>>>>>>>>> Josef
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> View this message in context:
>>>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28633477.html
>>>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> SciPy-User mailing list
>>>>>>>>>>>> SciPy-User at scipy.org
>>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> SciPy-User mailing list
>>>>>>>>>>> SciPy-User at scipy.org
>>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> View this message in context:
>>>>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28634924.html
>>>>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> SciPy-User mailing list
>>>>>>>>>> SciPy-User at scipy.org
>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> SciPy-User mailing list
>>>>>>>>> SciPy-User at scipy.org
>>>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28640656.html
>>>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> SciPy-User mailing list
>>>>>>> SciPy-User at scipy.org
>>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>>
>>>>>> _______________________________________________
>>>>>> SciPy-User mailing list
>>>>>> SciPy-User at scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/removing-for-loops...-tp28633477p28642434.html
>>>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>>>
>>>>> _______________________________________________
>>>>> SciPy-User mailing list
>>>>> SciPy-User at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>
>>>> _______________________________________________
>>>> SciPy-User mailing list
>>>> SciPy-User at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/removing-for-loops...-tp28633477p28686356.html
>>> Sent from the Scipy-User mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>
> --
> View this message in context: http://old.nabble.com/removing-for-loops...-tp28633477p28711249.html
> Sent from the Scipy-User mailing list archive at Nabble.com.
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>