[Tutor] summary stats grouped by month year

Wed May 9 02:48:15 CEST 2012

Excellent, thank you so much. I don't understand all the steps at this
stage so I will need some time to go through it carefully but it works
perfectly.
Thanks again!

On Tue, May 8, 2012 at 4:41 PM, Andre' Walker-Loud <walksloud at gmail.com> wrote:

> Hello anonymous questioner,
>
> first comment - you may want to look into hdf5 data structures
>
> http://www.hdfgroup.org/HDF5/
>
> and the python tools to play with them
>
> pytables - http://www.pytables.org/moin
> h5py - http://code.google.com/p/h5py/
>
> I have personally used pytables more - but not for any good reason.  If
> you happen to have the Enthought python distribution - these come with the
> package, as well as an installation of hdf5
>
> hdf5 is a very nice file format for storing large amounts of data (binary)
> with descriptive meta-data.  Also, numpy plays very nice with hdf5.  Given
> all your questions here, I suspect you would benefit from learning about
> these and learning to play with them.
>
> Now to your specific question.
>
> > I would like to calculate summary statistics of rainfall based on year
> > and month.
> > I have the data in a text file (although could put in any format if it
> > helps) extending over approx 40 years:
> > YEAR MONTH    MeanRain
> > 1972 Jan    12.7083199
> > 1972 Feb    14.17007142
> > 1972 Mar    14.5659302
> > 1972 Apr    1.508517302
> > 1972 May    2.780009889
> > 1972 Jun    1.609619287
> > 1972 Jul    0.138150181
> > 1972 Aug    0.214346148
> > 1972 Sep    1.322102228
> >
> > I would like to be able to calculate the total rain annually:
> >
> > YEAR   Annualrainfall
> > 1972    400
> > 1973    300
> > 1974    350
> > ....
> > 2011     400
> >
> > and also the monthly mean rainfall for all years:
> >
> > MONTH  MonthlyMeanRain
> > Jan      13
> > Feb      15
> > Mar       8
> > .....
> > Dec       13
> >
> >
> > Is this something I can easily do?
>
> Yes - this should be very easy.  Imagine importing all this data into a
> numpy array
>
> ===
> import numpy as np
>
> # your_data is the path to the text file; [1:] skips the header line
> data = open(your_data).readlines()[1:]
> years = []
> for line in data:
>     year = line.split()[0]
>     if year not in years:
>         years.append(year)
> # list all 12 abbreviations exactly as they appear in the file
> months = ['Jan','Feb',....,'Dec']
>
> # assumes every year has all 12 months, in calendar order
> rain_fall = np.zeros([len(years),len(months)])
> for y,year in enumerate(years):
>     for m,month in enumerate(months):
>         rain_fall[y,m] = float(data[y * 12 + m].split()[2])
>
> # to get average per year - average over months - axis=1
> print(np.mean(rain_fall,axis=1))
>
> # to get average per month - average over years - axis=0
> print(np.mean(rain_fall,axis=0))
>
> ===
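
The question asked for annual totals rather than annual means, so the year axis wants a sum instead of a mean. A minimal sketch of the two reductions, assuming complete years of 12 months each; the numbers are made up purely so the result is easy to check by hand, and the names annual_total and monthly_mean are only illustrative:

```python
import numpy as np

# Made-up values for illustration only: 2 years (rows) x 12 months (columns).
rain_fall = np.arange(24, dtype=float).reshape(2, 12)

# Collapse the month axis (axis=1): one total per year.
annual_total = rain_fall.sum(axis=1)

# Collapse the year axis (axis=0): one mean per calendar month.
monthly_mean = rain_fall.mean(axis=0)

print(annual_total)
print(monthly_mean)
```

The axis argument names the dimension that disappears, so axis=1 leaves one number per row (year) and axis=0 leaves one number per column (month).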
>
> now you should imagine doing this by setting up dictionaries, so that you
> can request an average for year 1972 or for month March.  That is why I
> used the enumerate function before to walk the indices - so that you can
> imagine building the dictionary simultaneously.
>
> years = {'1972':0, '1973':1, ....}
> months = {'Jan':0,'Feb':1,...'Dec':11}
>
> then you can access and store the data to the array using these
> dictionaries.
>
> print(rain_fall[years['1984'], months['Mar']])
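
The dictionary idea can also be taken one step further, so that the grouping does not depend on every year having exactly 12 rows. A rough sketch using only the standard library; the rows are the first three data lines from the question, held in a list here instead of being read from the file, and the names annual_total, monthly_values and monthly_mean are just illustrative:

```python
from collections import defaultdict

# First three data rows from the question; a real script would read the file.
lines = [
    "YEAR MONTH    MeanRain",
    "1972 Jan    12.7083199",
    "1972 Feb    14.17007142",
    "1972 Mar    14.5659302",
]

annual_total = defaultdict(float)    # year -> summed rainfall
monthly_values = defaultdict(list)   # month -> values from all years

for line in lines[1:]:               # skip the header row
    year, month, rain = line.split()
    annual_total[year] += float(rain)
    monthly_values[month].append(float(rain))

# Mean of each month over however many years contributed a value.
monthly_mean = {m: sum(v) / len(v) for m, v in monthly_values.items()}

print(annual_total["1972"])
print(monthly_mean["Mar"])
```

Because each row is filed under its own year and month keys, missing months or a partial final year do not shift the other values, which the y * 12 + m indexing would not tolerate.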
>
>
> Andre
>
> > I have started by simply importing the text file but data is not
> > represented as time so that is probably my first problem and then I am not
> > sure how to group them by month/year.
> >
> > textfile=r"textfile.txt"
> > f=np.genfromtxt(textfile,skip_header=1)
> >
> > Any feedback will be greatly appreciated.
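
With the default float dtype, the genfromtxt call above turns the MONTH column into nan, which is likely part of the trouble. A small sketch of one way around it, assuming a NumPy version whose genfromtxt accepts the encoding argument; the in-memory text stands in for textfile.txt and holds the first three rows from the question:

```python
import io
import numpy as np

# Stand-in for the real file: header plus the first three rows from the question.
text = """YEAR MONTH    MeanRain
1972 Jan    12.7083199
1972 Feb    14.17007142
1972 Mar    14.5659302
"""

# names=True takes the column names from the header line;
# dtype=None lets each column get its own type (int, string, float).
f = np.genfromtxt(io.StringIO(text), names=True, dtype=None, encoding="utf-8")

# Boolean mask selecting the rows of one year, then a sum over those rows.
mask = f["YEAR"] == 1972
total_1972 = f["MeanRain"][mask].sum()

print(f["MONTH"])
print(total_1972)
```

The same masking works on the MONTH column, for example f["MeanRain"][f["MONTH"] == "Jan"].mean() for the January mean over all years.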
> >