Data standardizing
Hi,

I am completely new to Python and NumPy. I am using Python for statistical calculations on climate data. I have a data set in the following format:

    Year  Jan   Feb   Mar  Apr ... Dec
    1900  1000  1001  ...
    1901  1011  1012  ...
    1902  1009  1007  ...
    ...
    2010  1008  1002  ...

I want to standardize each of these values using the standard deviation of the corresponding monthly column. I have already found the standard deviation of each column, but now I need the standard deviation about a prescribed mean. That is, when I compute the standard deviation for the January column, the mean should be calculated only from the January data for a fixed period, say 1950-1970. With this mean I want to calculate the SD for the entire column.

Any help will be appreciated.
Hi,

I assume you have this data in a txt file, correct? You can load all of it into a NumPy array using

    import numpy as np
    data = np.loadtxt("climat_file.txt", skiprows=1)

Then you can compute the mean you want by taking it over a subset of the rows of the data array. Rows are indexed by position, not by year, so select them with a mask on the year column. For example, if you want the mean of your January data for 1950-1970 (say, including 1970):

    years = data[:, 0]
    base = (years >= 1950) & (years <= 1970)
    mean1950_1970 = data[base, 1].mean()

Then the standard deviation you want could be computed using

    my_std = np.sqrt(np.mean((data[:, 1] - mean1950_1970)**2))

Hope this helps,
Jonathan

On Tue, Apr 12, 2011 at 1:48 PM, Climate Research <climateforu@gmail.com> wrote:
[...]
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Jonathan Rocher, PhD
Scientific software developer
Enthought, Inc.
jrocher@enthought.com
1-512-536-1057
http://www.enthought.com
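The base-period idea in the reply above extends to every monthly column at once. What follows is only a sketch under stated assumptions: the in-memory sample stands in for the real text file, its values are invented, it has just two month columns, and its base period is shortened to 1950-1951 so the example stays small. The names `months`, `base`, `base_mean`, `sd_about_base`, and `standardized` are illustrative and do not come from the thread.

```python
import io
import numpy as np

# Hypothetical stand-in for the text file: a header row, then Year, Jan, Feb.
# The values are invented for illustration only.
sample = io.StringIO(
    "Year Jan Feb\n"
    "1949 1000 1001\n"
    "1950 1011 1012\n"
    "1951 1009 1007\n"
    "1952 1008 1002\n"
)
data = np.loadtxt(sample, skiprows=1)

years = data[:, 0]
months = data[:, 1:]  # every monthly column, without the year column

# Rows in the base period (1950-1951 here; 1950-1970 in the question).
base = (years >= 1950) & (years <= 1951)
base_mean = months[base].mean(axis=0)  # one mean per month

# Root-mean-square deviation of each whole column about its base-period mean.
sd_about_base = np.sqrt(np.mean((months - base_mean) ** 2, axis=0))

# One possible reading of "standardize": deviation from the base-period mean,
# expressed in units of the spread computed above.
standardized = (months - base_mean) / sd_about_base
```

Selecting rows with a boolean mask on the year column avoids relying on the position of a given year in the file, so the same code should still work if the record starts in a different year.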
On Wed, Apr 13, 2011 at 9:50 AM, Jonathan Rocher <jrocher@enthought.com> wrote:
[...]
To standardize the data over each column you'll want to do:

    (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

Note the broadcasting behavior of the (matrix - vector) operation; see the NumPy documentation for more details. The ddof=1 is there to give you the sample standard deviation (the one based on the unbiased variance estimate).

<shameless plug>
If you're looking for data structures to carry around your metadata (dates and month labels), look to pandas (my project: http://pandas.sourceforge.net/) or larry (http://larry.sourceforge.net/).
</shameless plug>

- Wes
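To make the broadcasting step concrete, here is a small sketch on invented numbers. It assumes the year column has already been dropped, so that only the monthly values are standardized; the array `months` and the names `col_mean`, `col_std`, and `z` are illustrative and not part of the thread.

```python
import numpy as np

# Invented values: three years (rows) by two months (columns).
months = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

col_mean = months.mean(axis=0)        # shape (2,): one mean per column
col_std = months.std(axis=0, ddof=1)  # shape (2,): sample standard deviation

# A (3, 2) array minus a (2,) vector: NumPy broadcasts the vector across rows,
# so each column is shifted by its own mean and scaled by its own spread.
z = (months - col_mean) / col_std
print(z)
```

With these numbers each column of `z` comes out as -1, 0, 1, which is a quick way to check that the operation ran column by column rather than over the whole array.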
participants (4)

Climate Research

Jonathan Rocher

Pierre GM

Wes McKinney