[Numpy-discussion] Getting the indexes of the myarray.min()

Bruce Southey southey at uiuc.edu
Thu May 13 14:22:02 EDT 2004


Hi,
Raymond D. Hettinger is writing a general statistics module, 'statistics.py:
A collection of functions for summarizing data', that is somewhere in Python's
CVS (I cannot find the exact reference, but it appeared in a fairly recent
Python thread). He uses a one-pass algorithm from Knuth for the variance that
has good numerical stability.
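
As a rough illustration of the stability point (a sketch only, not the code
from Hettinger's module; the function names and the test data are made up
here), the textbook sum-of-squares formula can be compared with the
running-mean update that Knuth describes and credits to Welford:

```python
def naive_variance(values):
    """Textbook single-pass formula: (sum(x^2) - sum(x)^2 / n) / (n - 1)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(values):
    """One-pass update of a running mean and sum of squared deviations."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n         # updated mean
        m2 += delta * (x - mean)  # uses both the old and the new mean
    return m2 / (n - 1)


# Hypothetical data: a small spread sitting on top of a large offset.
# The exact sample variance is 30.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
print(naive_variance(data))    # subtracts two huge, nearly equal numbers
print(welford_variance(data))  # only ever works with small deviations
```

The naive formula subtracts two numbers of order 1e18 to recover a result of
order 10, so most of the significant digits cancel. The running update never
forms those large intermediate sums.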

Below is a rather rough version, adapted from my own use case (masked arrays),
which uses Knuth's algorithm for the variance. It lacks documentation and
features such as dimension checking (it assumes a 2-D array with at least two
rows, so that the column variances can be computed).

Regards
Bruce Southey

import numarray

def SummaryStats(Matrix):
        mshape=Matrix.getshape()
        nrows=mshape[0]
        ncols=mshape[1]
        #print nrows, ncols
        # Create matrices to hold statistics
        N_obs =numarray.zeros(ncols, type='Float64')
        Sum   =numarray.zeros(ncols, type='Float64')
        Var   =numarray.zeros(ncols, type='Float64')
        Min   =numarray.zeros(ncols, type='Float64')
        Max   =numarray.zeros(ncols, type='Float64')
        Mean  =numarray.zeros(ncols, type='Float64')
        AdjM  =numarray.zeros(ncols, type='Float64')
        NewM  =numarray.zeros(ncols, type='Float64')
        DifM  =numarray.zeros(ncols, type='Float64')

        for row in range(nrows):
                for col in range(ncols):
                        t_value=Matrix[row,col]
                        N_obs[col] = N_obs[col] + 1
                        Sum[col] = Sum[col] + t_value
                        if t_value > Max[col]:
                                Max[col]=t_value
                        if t_value < Min[col]:
                                Min[col]=t_value
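                        if N_obs[col]==1:
                                # First observation in this column: seed the mean,
                                # and also Min and Max, which start at zero and
                                # would otherwise be wrong for all-positive or
                                # all-negative data
                                Mean[col]=t_value
                                Min[col]=t_value
                                Max[col]=t_value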
                        AdjM[col]=(t_value-Mean[col])/(N_obs[col])-DifM[col]
                        NewM[col]=Mean[col]+AdjM[col]
                        DifM[col]=(NewM[col]-Mean[col])-AdjM[col]
                        Var[col] = Var[col] + (t_value-Mean[col])*(t_value-NewM[col])
                        Mean[col] = NewM[col]
        print 'N_obs\n', N_obs
        print 'Sum\n', Sum
        print 'Mean\n', Mean
        print 'Var\n', Var/(nrows-1)

if __name__ == '__main__':
        MValues=numarray.array([[1,2,1],[3,2,2],[5,1,1],[4,3,2]])
        SummaryStats(MValues)
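
For comparison, the same running update can be sketched in plain Python for a
single column, returning named results along the lines Russell asks for in the
message quoted below. This is only a sketch under assumptions: the Summary name
and its fields are hypothetical, not a numarray API, and ties for the min and
max locations resolve to the first one encountered.

```python
from collections import namedtuple
from math import sqrt

# Hypothetical result type; the field names are illustrative only.
Summary = namedtuple("Summary", "n min argmin max argmax mean std")


def summarize(values):
    """Min, max, their first locations, mean and sample std dev in one pass."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    lo = hi = None
    lo_at = hi_at = None
    for i, x in enumerate(values):
        n += 1
        # Strict comparisons keep the first location when values tie.
        if lo is None or x < lo:
            lo, lo_at = x, i
        if hi is None or x > hi:
            hi, hi_at = x, i
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        raise ValueError("summarize() needs at least one value")
    # The sample standard deviation is undefined for a single value.
    std = sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return Summary(n, lo, lo_at, hi, hi_at, mean, std)


# First column of the 4x3 test matrix used above.
s = summarize([1, 3, 5, 4])
print(s.min, s.argmin, s.max, s.argmax, s.mean, s.std)
```

Accessing the results by name (s.mean, s.argmax) avoids depending on the
position of each value in the returned tuple.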



---- Original message ----
>Date: Thu, 13 May 2004 15:42:30 -0400
>From: "Perry Greenfield" <perry at stsci.edu>  
>Subject: RE: [Numpy-discussion] Getting the indexes of the myarray.min()  
>To: "Russell E Owen" <rowen at u.washington.edu>, "numarray" <numpy-discussion at lists.sourceforge.net>
>
>> Russell E Owen wrote:
>> 
>> At 9:27 AM -0400 2004-05-13, Perry Greenfield wrote:
>> >... One has to trade off the number of such functions
>> >against the speed savings. Another example is getting max and min values
>> >for an array. I've long thought that this is so often done they could
>> >be done in one pass. There isn't a function that does this yet though.
>> 
>> Statistics is another area where multiple return values could be of 
>> interest -- one may want the mean and std dev, and making two passes 
>> is wasteful (since some of the same info needs to be computed both 
>> times).
>> 
>> A do-all function that computes min, min location, max, max location, 
>> mean and std dev all at once would be nice (especially if the 
>> returned values were accessed by name, rather than just being a tuple 
>> of values, so they could be referenced safely and readably).
>> 
>> -- Russell
>> 
>We will definitely add something like this for 1.0 or 1.1.
>(but probably for min and max location, it will just be
>for the first encountered).
>
>Perry
>
>



