[Tutor] summary stats grouped by month year

Andre' Walker-Loud walksloud at gmail.com
Tue May 8 08:41:41 CEST 2012


Hello anonymous questioner,

first comment - you may want to look into hdf5 data structures

http://www.hdfgroup.org/HDF5/

and the python tools to play with them

pytables - http://www.pytables.org/moin
h5py - http://code.google.com/p/h5py/

I have personally used pytables more - but not for any good reason.  If you happen to have the Enthought Python Distribution, both come with the package, as well as an installation of hdf5.

hdf5 is a very nice binary file format for storing large amounts of data together with descriptive meta-data.  Also, numpy plays very nicely with hdf5.  Given all your questions here, I suspect you would benefit from learning about these tools and experimenting with them.

Now to your specific question.

> I would like to calculate summary statistics of rainfall based on year and month.
> I have the data in a text file (although could put in any format if it helps) extending over approx 40 years:
> YEAR MONTH    MeanRain
> 1972 Jan    12.7083199
> 1972 Feb    14.17007142
> 1972 Mar    14.5659302
> 1972 Apr    1.508517302
> 1972 May    2.780009889
> 1972 Jun    1.609619287
> 1972 Jul    0.138150181
> 1972 Aug    0.214346148
> 1972 Sep    1.322102228
> 
> I would like to be able to calculate the total rain annually:
> 
> YEAR   Annualrainfall
> 1972    400
> 1973    300
> 1974    350
> ....
> 2011     400
> 
> and also the monthly mean rainfall for all years:
> 
> YEAR  MonthlyMeanRain
> Jan      13
> Feb      15
> Mar       8
> .....
> Dec       13
> 
> 
> Is this something I can easily do?

Yes - this should be very easy.  Imagine importing all this data into a numpy array, with one row per year and one column per month:

===
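import numpy as np

# your_data is the path to your text file
# [1:] skips the "YEAR MONTH MeanRain" header line
data = open(your_data).readlines()[1:]
years = []
for line in data:
	if line.split()[0] not in years:
		years.append(line.split()[0])
months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']

# one row per year, one column per month
# note: the y * 12 + m indexing assumes every year has all 12 months, in order
rain_fall = np.zeros([len(years),len(months)])
for y,year in enumerate(years):
	for m,month in enumerate(months):
		rain_fall[y,m] = float(data[ y * 12 + m].split()[2])

# to get average per year - average over months - axis=1
print(np.mean(rain_fall,axis=1))

# to get the annual total you asked for - sum over months - axis=1
print(np.sum(rain_fall,axis=1))

# to get average per month - average over years - axis=0
print(np.mean(rain_fall,axis=0))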

===
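
If the axis convention is unfamiliar, a toy array may help.  The numbers below are made up purely for illustration (2 years as rows, 3 months as columns); they are not rainfall data from the question.

```python
import numpy as np

# Made-up numbers for illustration only: 2 years (rows) x 3 months (columns)
toy = np.array([[1.0, 2.0, 3.0],
                [3.0, 4.0, 5.0]])

per_year_mean = toy.mean(axis=1)    # collapse the month axis: one value per year
per_month_mean = toy.mean(axis=0)   # collapse the year axis: one value per month
per_year_total = toy.sum(axis=1)    # same axis as the yearly mean, but summed

print(per_year_mean)    # two values, one per row
print(per_month_mean)   # three values, one per column
print(per_year_total)   # two values, one per row
```

The axis you name is the one that disappears, so axis=1 leaves one number per year and axis=0 leaves one number per month.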

Now imagine doing this with dictionaries, so that you can request the average for the year 1972 or for the month of March.  That is why I used the enumerate function above to walk the indices - you can build the dictionaries in the same loops.

years = {'1972':0, '1973':1, ....}
months = {'Jan':0,'Feb':1,...'Dec':11}

Then you can use these dictionaries to look up the array indices when storing or reading data.  Note that the month key must match the abbreviation used in the file ('Mar', not 'March'):

print(rain_fall[years['1984'], months['Mar']])
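
Here is a minimal sketch of the dictionary approach put together.  It uses a few rows copied from the question in place of the real file, and it makes one choice that is my assumption rather than anything stated above: missing months are stored as NaN so that gaps are not counted as zero rainfall.  The names `lines`, `records` and `month_names` are only illustrative.

```python
import numpy as np

# A few rows copied from the question; the real file covers about 40 years.
lines = """YEAR MONTH    MeanRain
1972 Jan    12.7083199
1972 Feb    14.17007142
1972 Mar    14.5659302
""".splitlines()

month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
months = dict((name, i) for i, name in enumerate(month_names))

# Parse once, skipping the header, and build the year lookup as we go.
records = [line.split() for line in lines[1:] if line.strip()]
years = {}
for year, _, _ in records:
    if year not in years:
        years[year] = len(years)

# Assumption: NaN marks months with no data, so gaps are not treated as 0.
rain_fall = np.full((len(years), len(months)), np.nan)
for year, month, value in records:
    rain_fall[years[year], months[month]] = float(value)

# Look up a single value by year and month name.
print(rain_fall[years['1972'], months['Mar']])

# Annual total for one year, ignoring months that are missing.
print(np.nansum(rain_fall[years['1972']]))
```

Because each value is placed by its own year and month, this version does not rely on every year having twelve rows in order.  np.nanmean(rain_fall, axis=0) would give the per-month mean over the years that have data.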


Andre





> I have started by simply importing the text file but data is not represented as time so that is probably my first problem and then I am not sure how to group them by month/year. 
> 
> textfile=r"textfile.txt"
> f=np.genfromtxt(textfile,skip_header=1)
> 
> Any feedback will be greatly appreciated.
> 


