[Tutor] numpy.mean across multiple netcdf files
questions anon
questions.anon at gmail.com
Wed Aug 17 01:04:05 CEST 2011
Thanks Andre I had a go at following your advice but it didn't seem to work
(it kept focusing on the last loop and not combining them all together) so I
have posted a note on scipy user group instead (code below).
Also thanks for the advice regarding averaging!
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as N
from mpl_toolkits.basemap import Basemap
import os
MainFolder=r"E:/temp_samples/"
for (path, dirs, files) in os.walk(MainFolder):
for dir in dirs:
print dir
for ncfile in files:
if ncfile[-3:]=='.nc':
ncfile=os.path.join(path,ncfile)
ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
TSFC=ncfile.variables['T_SFC'][4::24]
LAT=ncfile.variables['latitude'][:]
LON=ncfile.variables['longitude'][:]
TIME=ncfile.variables['time'][:]
fillvalue=ncfile.variables['T_SFC']._FillValue
ncfile.close()
#calculate summary stats
big_array=[]
for i in TSFC:
big_array.append(i)
big_array=N.array(big_array)
Mean=N.mean(big_array, axis=0)
#plot output summary stats
map = Basemap(projection='merc',llcrnrlat=-40,urcrnrlat=-33,
llcrnrlon=139.0,urcrnrlon=151.0,lat_ts=0,resolution='i')
map.drawcoastlines()
map.drawstates()
x,y=map(*N.meshgrid(LON,LAT))
plt.title('Total Mean at 3pm')
ticks=[-5,0,5,10,15,20,25,30,35,40,45,50]
CS = map.contourf(x,y,Mean,ticks, cmap=plt.cm.jet)
l,b,w,h =0.1,0.1,0.8,0.8
cax = plt.axes([l+w+0.025, b, 0.025, h])
plt.colorbar(CS,cax=cax, drawedges=True)
plt.show()
On Thu, Aug 11, 2011 at 4:48 PM, Andre' Walker-Loud <walksloud at gmail.com>wrote:
> Hello Anonymous Questioner,
>
> First - you will probably be told this isn't the correct email list for
> this question - this is a general python tutorial list, while your question
> is numpy specific, so if this doesn't help, you should probably look to
> another email list.
>
> There are a couple general comments to make regarding your question.
> Basically, you want to load all the data into one big numpy array.
> Currently, it seems your data array has 3 indices, from this line
>
> > TSFC=ncfile.variables['T_SFC'][4::24,:,:]
>
> you can make a larger array with one more index, where the new index runs
> over files. So instead of looping over the files as you do, you can use the
> loop to build the larger array.
>
> I am sure there is a more elegant way to do this, but here is something
> that is along the lines of what you want
>
> > big_array = []
> > for ncfile in files:
> > big_array.append(ncfile.variables['T_SFC'][4::24,:,:])
>
> if all of your data you are taking from each file is the same size for all
> the files, then you can declare the whole thing an array
>
> > big_array = N.array(big_array)
>
> and then you can take averages very easily by specifying which index, or
> indices you want to average over. If life is not so kind, and your files
> are of different sizes, then you have to be more careful. But to give a
> precise answer to your problem would require knowledge of what your data
> files actually look like.
>
>
> Also, a bit of caution with averaging (in case you are unfamiliar). Say
> you have 5 data sets, and each one has a different amount of data in it. If
> you first take the average of each set, you can not simply take the average
> of those sets to produce the global average. You have to weight each set by
> the amount of data you averaged. Say the length of the sets are
> [N0,N1,N2,N3,N4] with average values [a0,a1,a2,a3,a4], then to produce the
> global average, you need to take
>
> avg = (1 / (N0+N1+N2+N3+N4) ) * (N0*a0 + N1*a1 + N2*a2 + N3*a3 + N4*a4)
>
> a few lines of algebra can demonstrate this produces the average you would
> get by combining all the data in all the sets and taking one big average.
>
>
> Regards,
>
> Andre
>
>
>
> On Aug 10, 2011, at 6:57 PM, questions anon wrote:
>
> > I have many ncfiles each containing one month of hourly temperature data.
> > I have worked out how to loop through a number of ncfiles and calculate
> the mean for each file at a particular time and even plot this and output as
> a *.png.
> > What I am unsure of is how to calculate the mean at a particular time for
> all files.
> > Do I somehow output the mean for each folder and then calculate the mean
> of all the means? Or is there some way to loop through and calculate it
> directly?
> > Below is the code I have working that calculates the mean for each file.
> Any help will be greatly appreciated!!
> >
> > from netCDF4 import Dataset
> > import numpy as N
> >
> > MainFolder=r"D:/temp_samples/"
> > print MainFolder
> > for (path, dirs, files) in os.walk(MainFolder):
> > for dir in dirs:
> > print dir
> > path=path+'/'
> >
> > for ncfile in files:
> > if ncfile[-3:]=='.nc':
> > print "dealing with ncfiles:", ncfile
> > ncfile=os.path.join(path,ncfile)
> > ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
> > TSFC=ncfile.variables['T_SFC'][4::24,:,:] #this chooses a
> particular time for each day for the whole month
> > LAT=ncfile.variables['latitude'][:]
> > LON=ncfile.variables['longitude'][:]
> > TIME=ncfile.variables['time'][:]
> > fillvalue=ncfile.variables['T_SFC']._FillValue
> > ncfile.close()
> >
> > #calculate summary stats
> > TSFCmean=N.mean(TSFC, axis=0)
> > print "The TSFCmean array is:", TSFCmean
> >
> > # there is a lot more code that follows this to plot the mean for each
> ncfile but I have cut that out as I think it is irrelevant to my question
> >
> > _______________________________________________
> > Tutor maillist - Tutor at python.org
> > To unsubscribe or change subscription options:
> > http://mail.python.org/mailman/listinfo/tutor
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20110817/01b840bb/attachment-0001.html>
More information about the Tutor
mailing list