[Tutor] numpy.mean across multiple netcdf files
Andre' Walker-Loud
walksloud at gmail.com
Thu Aug 11 08:48:15 CEST 2011
Hello Anonymous Questioner,
First - you will probably be told this isn't the correct email list for this question - this is a general python tutorial list, while your question is numpy specific, so if this doesn't help, you should probably look to another email list.
There are a couple general comments to make regarding your question. Basically, you want to load all the data into one big numpy array. Currently, it seems your data array has 3 indices, from this line
> TSFC=ncfile.variables['T_SFC'][4::24,:,:]
you can make a larger array with one more index, where the new index runs over files. So instead of looping over the files as you do, you can use the loop to build the larger array.
I am sure there is a more elegant way to do this, but here is something that is along the lines of what you want
> big_array = []
> for ncfile in files:
> big_array.append(ncfile.variables['T_SFC'][4::24,:,:])
if all of your data you are taking from each file is the same size for all the files, then you can declare the whole thing an array
> big_array = N.array(big_array)
and then you can take averages very easily by specifying which index, or indices you want to average over. If life is not so kind, and your files are of different sizes, then you have to be more careful. But to give a precise answer to your problem would require knowledge of what your data files actually look like.
Also, a bit of caution with averaging (in case you are unfamiliar). Say you have 5 data sets, and each one has a different amount of data in it. If you first take the average of each set, you can not simply take the average of those sets to produce the global average. You have to weight each set by the amount of data you averaged. Say the length of the sets are [N0,N1,N2,N3,N4] with average values [a0,a1,a2,a3,a4], then to produce the global average, you need to take
avg = (1 / (N0+N1+N2+N3+N4) ) * (N0*a0 + N1*a1 + N2*a2 + N3*a3 + N4*a4)
a few lines of algebra can demonstrate this produces the average you would get by combining all the data in all the sets and taking one big average.
Regards,
Andre
On Aug 10, 2011, at 6:57 PM, questions anon wrote:
> I have many ncfiles each containing one month of hourly temperature data.
> I have worked out how to loop through a number of ncfiles and calculate the mean for each file at a particular time and even plot this and output as a *.png.
> What I am unsure of is how to calculate the mean at a particular time for all files.
> Do I somehow output the mean for each folder and then calculate the mean of all the means? Or is there some way to loop through and calculate it directly?
> Below is the code I have working that calculates the mean for each file. Any help will be greatly appreciated!!
>
> from netCDF4 import Dataset
> import numpy as N
>
> MainFolder=r"D:/temp_samples/"
> print MainFolder
> for (path, dirs, files) in os.walk(MainFolder):
> for dir in dirs:
> print dir
> path=path+'/'
>
> for ncfile in files:
> if ncfile[-3:]=='.nc':
> print "dealing with ncfiles:", ncfile
> ncfile=os.path.join(path,ncfile)
> ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
> TSFC=ncfile.variables['T_SFC'][4::24,:,:] #this chooses a particular time for each day for the whole month
> LAT=ncfile.variables['latitude'][:]
> LON=ncfile.variables['longitude'][:]
> TIME=ncfile.variables['time'][:]
> fillvalue=ncfile.variables['T_SFC']._FillValue
> ncfile.close()
>
> #calculate summary stats
> TSFCmean=N.mean(TSFC, axis=0)
> print "The TSFCmean array is:", TSFCmean
>
> # there is a lot more code that follows this to plot the mean for each ncfile but I have cut that out as I think it is irrelevant to my question
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
More information about the Tutor
mailing list