[Tutor] summary stats grouped by month year
Andre' Walker-Loud
walksloud at gmail.com
Tue May 8 08:41:41 CEST 2012
Hello anonymous questioner,
First comment: you may want to look into HDF5 data structures
http://www.hdfgroup.org/HDF5/
and the Python tools for working with them:
PyTables - http://www.pytables.org/moin
h5py - http://code.google.com/p/h5py/
I have personally used PyTables more, but not for any particular reason. If you happen to have the Enthought Python Distribution, both packages come with it, as well as an installation of HDF5.
HDF5 is a very nice binary file format for storing large amounts of data along with descriptive metadata. NumPy also works very well with HDF5. Given all your questions here, I suspect you would benefit from learning about these tools and experimenting with them.
Now to your specific question.
> I would like to calculate summary statistics of rainfall based on year and month.
> I have the data in a text file (although could put in any format if it helps) extending over approx 40 years:
> YEAR MONTH MeanRain
> 1972 Jan 12.7083199
> 1972 Feb 14.17007142
> 1972 Mar 14.5659302
> 1972 Apr 1.508517302
> 1972 May 2.780009889
> 1972 Jun 1.609619287
> 1972 Jul 0.138150181
> 1972 Aug 0.214346148
> 1972 Sep 1.322102228
>
> I would like to be able to calculate the total rain annually:
>
> YEAR Annualrainfall
> 1972 400
> 1973 300
> 1974 350
> ....
> 2011 400
>
> and also the monthly mean rainfall for all years:
>
> MONTH MonthlyMeanRain
> Jan 13
> Feb 15
> Mar 8
> .....
> Dec 13
>
>
> Is this something I can easily do?
Yes, this should be very easy. Imagine importing all this data into a NumPy array:
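===
import numpy as np

# read every line, dropping the 'YEAR MONTH MeanRain' header and blank lines
with open('textfile.txt') as f:
    data = [line for line in f.readlines()[1:] if line.strip()]
years = []
for line in data:
    year = line.split()[0]
    if year not in years:
        years.append(year)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# one row per year, one column per month
rain_fall = np.zeros([len(years), len(months)])
for y, year in enumerate(years):
    for m, month in enumerate(months):
        # assumes exactly 12 lines per year, in calendar order
        rain_fall[y, m] = float(data[y * 12 + m].split()[2])
# total rain per year - sum over months - axis=1
print(np.sum(rain_fall, axis=1))
# mean rain per month - average over years - axis=0
print(np.mean(rain_fall, axis=0))
===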
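If the file might be missing some months (the excerpt in the question stops at Sep 1972), indexing with `y * 12 + m` will silently pick up the wrong rows. Below is a minimal sketch that avoids that assumption. It reads the file with `np.genfromtxt`, as in the questioner's attempt quoted below, but passes `dtype=None` so the YEAR and MONTH columns keep their labels instead of turning into `nan`, and `names=True` so the header supplies the field names. The helper name `summarise` is hypothetical, not something from the original post.

```python
import io
import numpy as np

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def summarise(source):
    """Return (years, annual_totals, monthly_means) for a YEAR/MONTH/MeanRain table.

    `source` is a filename or an open text file (hypothetical helper name).
    """
    # dtype=None: infer a type per column, so MONTH stays text instead of nan;
    # names=True: take the field names from the header line.
    table = np.genfromtxt(source, dtype=None, names=True, encoding='utf-8')
    table = np.atleast_1d(table)  # a single data row would give a 0-d array

    years = np.unique(table['YEAR'])  # sorted, one entry per year
    # total rain per year: add up every row belonging to that year
    annual_totals = np.array([table['MeanRain'][table['YEAR'] == y].sum()
                              for y in years])
    # mean rain per calendar month, over whichever years contain that month
    monthly_means = {m: table['MeanRain'][table['MONTH'] == m].mean()
                     for m in MONTHS if np.any(table['MONTH'] == m)}
    return years, annual_totals, monthly_means


# Usage with three rows copied from the question; pass 'textfile.txt' instead.
sample = io.StringIO("YEAR MONTH MeanRain\n"
                     "1972 Jan 12.7083199\n"
                     "1972 Feb 14.17007142\n"
                     "1972 Mar 14.5659302\n")
years, annual_totals, monthly_means = summarise(sample)
print(years, annual_totals)
print(monthly_means)
```

Grouping by boolean masks is slower than the reshaped array for very large files, but it does not care about row order or gaps.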
Now imagine doing this with dictionaries, so that you can request the value for the year 1972 or for the month of March. That is why I used the enumerate function above to walk the indices: you can build the dictionaries at the same time.
years = {'1972':0, '1973':1, ....}
months = {'Jan':0,'Feb':1,...'Dec':11}
Then you can read and store values in the array through these dictionaries, for example:
print(rain_fall[years['1984'], months['Mar']])
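The dictionary idea above can be sketched by filling the label-to-index dictionaries from the same data that fills the array, so each row is placed by its own YEAR and MONTH labels rather than by its position in the file. This is a minimal sketch, assuming the `textfile.txt` layout quoted in the question; the function name `load_rain` is hypothetical, and months absent from the file are left as `nan`.

```python
import numpy as np

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def load_rain(lines):
    """Build a (year x month) array plus label -> index dictionaries.

    `lines` is an iterable of 'YEAR MONTH MeanRain' text lines, header first
    (hypothetical helper name).
    """
    rows = [line.split() for line in lines]
    rows = [r for r in rows[1:] if r]  # drop the header line and blank lines

    # label -> index dictionaries built with enumerate;
    # dict.fromkeys keeps the years in first-seen order without duplicates
    years = {year: y for y, year in enumerate(dict.fromkeys(r[0] for r in rows))}
    months = {month: m for m, month in enumerate(MONTHS)}

    # nan marks any month that does not appear in the file
    rain_fall = np.full((len(years), len(months)), np.nan)
    for year, month, value in rows:
        rain_fall[years[year], months[month]] = float(value)
    return rain_fall, years, months


# A few rows copied from the question; in practice pass open('textfile.txt')
sample = ["YEAR MONTH MeanRain",
          "1972 Jan 12.7083199",
          "1972 Mar 14.5659302"]
rain_fall, years, months = load_rain(sample)
print(rain_fall[years['1972'], months['Mar']])  # value for March 1972
print(np.nansum(rain_fall, axis=1))  # total per year, skipping missing months
```

Using `nan` for gaps means `np.nansum` and `np.nanmean` give per-year totals and per-month means that simply ignore months the file does not contain.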
Andre
> I have started by simply importing the text file but data is not represented as time so that is probably my first problem and then I am not sure how to group them by month/year.
>
> textfile=r"textfile.txt"
> f=np.genfromtxt(textfile,skip_header=1)
>
> Any feedback will be greatly appreciated.
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor