Data standardizing
Hi,

I am completely new to Python and NumPy. I am using Python to do statistical calculations on climate data. I have a data set in the following format:

Year  Jan   Feb   Mar  Apr  ...  Dec
1900  1000  1001  ...
1901  1011  1012  ...
1902  1009  1007  ...
...
2010  1008  1002  ...

I want to standardize each of these values using the standard deviation of the corresponding monthly column. I have already found the standard deviation of each column, but now I need the standard deviation about a prescribed mean. That is, when I compute the standard deviation for the January column, the mean should be calculated only from the January data for a reference period, say 1950-1970. With this mean I want to calculate the SD for the entire column. Any help will be appreciated.
Hi,

I assume you have this data in a txt file, correct? You can load all of it into a NumPy array using

    import numpy as np
    data = np.loadtxt("climat_file.txt", skiprows=1)

Then you can compute the mean you want by taking it over a slice of the data array. The rows are indexed by position, not by year, so select the years with a mask on the first column. For example, to compute the mean of your January data for 1950-1970 (say including 1970):

    years = data[:, 0]
    ref = (years >= 1950) & (years <= 1970)
    mean1950_1970 = data[ref, 1].mean()

Then the standard deviation you want can be computed using

    my_std = np.sqrt(np.mean((data[:, 1] - mean1950_1970)**2))

Hope this helps,
Jonathan
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
 Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jrocher@enthought.com 15125361057 http://www.enthought.com
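As a rough illustration of the approach described in this reply, the sketch below wraps the two steps (baseline mean, then deviation of the whole column about that mean) in one function. The data are synthetic stand-ins for the climate file, and the function name `std_about_baseline` is hypothetical, chosen here only for the example.

```python
import numpy as np

# Synthetic stand-in for the climate file (made-up numbers):
# column 0 is the year, columns 1..12 are the monthly values.
rng = np.random.default_rng(0)
years = np.arange(1900, 2011)
monthly = 1000.0 + 10.0 * rng.standard_normal((years.size, 12))
data = np.column_stack([years, monthly])


def std_about_baseline(data, start, end, col):
    """RMS deviation of column `col` over all years, measured about the
    mean of that column for the years start..end (inclusive).

    Hypothetical helper for illustration; not part of NumPy.
    """
    yrs = data[:, 0]
    ref = (yrs >= start) & (yrs <= end)      # boolean mask on the year column
    baseline_mean = data[ref, col].mean()    # mean over the reference period only
    return np.sqrt(np.mean((data[:, col] - baseline_mean) ** 2))


# January is column 1 in this layout.
jan_std = std_about_baseline(data, 1950, 1970, col=1)
```

The value obtained this way is never smaller than the ordinary `data[:, 1].std()`, because the full-column mean is the centre that minimizes the mean squared deviation.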
To standardize the data over each column you'll want to do:

    (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

Note the broadcasting behavior of the (matrix - vector) operation; see the NumPy documentation for more details. The ddof=1 is there to give you the (unbiased) sample standard deviation.

<shameless plug>
If you're looking for data structures to carry around your metadata (dates and month labels), look to pandas (my project: http://pandas.sourceforge.net/) or larry (http://larry.sourceforge.net/).
</shameless plug>

- Wes
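A minimal sketch of the column-wise standardization above, using made-up data in the layout from the original question. It assumes the first column holds the year and leaves that column out before standardizing. The second half is one possible way to combine the broadcasting idea with the 1950-1970 baseline mean asked about earlier in the thread; the variable names are illustrative only.

```python
import numpy as np

# Made-up data in the thread's layout: year, then 12 monthly columns.
rng = np.random.default_rng(1)
years = np.arange(1900, 2011)
values = 1000.0 + 5.0 * rng.standard_normal((years.size, 12))
data = np.column_stack([years, values])

months = data[:, 1:]   # shape (111, 12); the year column is left out

# Plain column-wise standardization: the (12,) vectors of means and
# standard deviations broadcast across the 111 rows.
z = (months - months.mean(axis=0)) / months.std(axis=0, ddof=1)

# Variant for the original question: centre every column on its
# 1950-1970 mean and scale by the RMS deviation about that mean.
ref = (years >= 1950) & (years <= 1970)
base_mean = months[ref].mean(axis=0)                          # shape (12,)
base_std = np.sqrt(np.mean((months - base_mean) ** 2, axis=0))
z_base = (months - base_mean) / base_std
```

In the first result every column has mean 0 and sample standard deviation 1. In the second, each column averages to 0 over the reference years only, so the other years show up as anomalies relative to that baseline.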
On Apr 12, 2011, at 8:48 PM, Climate Research wrote:
Hi, I am completely new to Python and NumPy. I am using Python to do statistical calculations on climate data.
Check scikits.timeseries and scikits.hydroclimpy as well; they have routines for that very purpose.
participants (4)

Climate Research

Jonathan Rocher

Pierre GM

Wes McKinney