[Numpy-discussion] Data standardizing
wesmckinn at gmail.com
Wed Apr 13 21:39:49 EDT 2011
On Wed, Apr 13, 2011 at 9:50 AM, Jonathan Rocher <jrocher at enthought.com> wrote:
> I assume you have this data in a txt file, correct? You can load up all of
> it in a numpy array using
> import numpy as np
> data = np.loadtxt("climat_file.txt", skiprows = 1)
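> Then you can compute the mean you want by taking it over the matching rows
> of the data array. For example, if you want to compute the mean of your data
> in Jan for 1950-1970 (say including 1970), select the rows by the year
> column, since rows are indexed from 0 and not by year:
> mean1950_1970 = data[(data[:,0] >= 1950) & (data[:,0] <= 1970), 1].mean()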
> Then the std deviation you want could be computed using
> my_std = np.sqrt(np.mean((data[:,1]-mean1950_1970)**2))
> Hope this helps,
> On Tue, Apr 12, 2011 at 1:48 PM, Climate Research <climateforu at gmail.com> wrote:
>> I am completely new to Python and NumPy. I am using Python to do
>> statistical calculations on climate data.
>> I have a data set in the following format:
>> Year  Jan   Feb   Mar  Apr  ...  Dec
>> 1900  1000  1001  ...
>> 1901  1011  1012  ...
>> 1902  1009  1007  ...
>> ...
>> 2010  1008  1002  ...
>> I want to standardize each of these values using the corresponding
>> standard deviation of its monthly data column.
>> I have found the standard deviation for each column, but now I need
>> to find the standard deviation about a prescribed mean value.
>> That is, when I am finding the standard deviation for the January data
>> column, the mean should be calculated only from the January data for, say,
>> 1950-1970. With this mean I want to calculate the SD for the entire column.
>> Any help will be appreciated.
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
> Jonathan Rocher, PhD
> Scientific software developer
> Enthought, Inc.
> jrocher at enthought.com
To standardize the data over each column you'll want to do:
(data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
Note the broadcasting behavior of the (matrix - vector) operation--see
NumPy documentation for more details. The ddof=1 is there to give you
the (unbiased) sample standard deviation.
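As a rough sketch of both variants discussed in this thread (plain column-wise standardization, and the version where the mean comes only from 1950-1970), something like the following should work. The synthetic array and the names years, monthly, in_baseline, baseline_mean and baseline_sd are made up for illustration and are not from the original data file. The baseline SD here follows the population form (divide by N) from the earlier reply rather than ddof=1.

```python
import numpy as np

# Hypothetical stand-in for the climate file: one row per year,
# 12 monthly columns (synthetic numbers, not real data).
rng = np.random.default_rng(0)
years = np.arange(1900, 2011)
monthly = 1000.0 + rng.normal(0.0, 5.0, size=(years.size, 12))

# Plain column-wise standardization: (matrix - vector) broadcasts the
# length-12 vector of column means across every row.
z_full = (monthly - monthly.mean(axis=0)) / monthly.std(axis=0, ddof=1)

# Baseline variant: mean taken over 1950-1970 only (inclusive),
# spread measured over the whole column around that baseline mean.
in_baseline = (years >= 1950) & (years <= 1970)
baseline_mean = monthly[in_baseline].mean(axis=0)
baseline_sd = np.sqrt(np.mean((monthly - baseline_mean) ** 2, axis=0))
z_baseline = (monthly - baseline_mean) / baseline_sd
```

Selecting the rows with a boolean mask on the year values avoids having to work out which row offset corresponds to 1950.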
If you're looking for data structures to carry around your metadata
(dates and month labels), look to pandas (my project:
http://pandas.sourceforge.net/) or larry
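For what it's worth, here is a minimal sketch of the same baseline calculation with labeled data. It assumes a present-day pandas (the .loc indexer shown here is from later releases than the one available when this was posted), and the synthetic DataFrame and the names df, months, baseline_mean and baseline_sd are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical labeled version of the data: years as the index,
# month names as the columns (synthetic numbers, not real data).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
years = np.arange(1900, 2011)
rng = np.random.default_rng(0)
df = pd.DataFrame(1000.0 + rng.normal(0.0, 5.0, size=(years.size, 12)),
                  index=years, columns=months)

# Label-based slicing with .loc includes both endpoints (1950 and 1970).
baseline_mean = df.loc[1950:1970].mean()

# SD of each full column around its baseline mean (population form).
# DataFrame - Series aligns on the column labels.
baseline_sd = np.sqrt(((df - baseline_mean) ** 2).mean())
z = (df - baseline_mean) / baseline_sd
```

The year and month labels travel with the values, so a lookup such as z.loc[1950, "Jan"] needs no manual offset arithmetic.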