[Tutor] numpy.mean across multiple netcdf files

Wed Aug 17 01:04:05 CEST 2011

Thanks Andre I had a go at following your advice but it didn't seem to work
(it kept focusing on the last loop and not combining them all together) so I
have posted a note on scipy user group instead (code below).
Also thanks for the advice regarding averaging!

from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as N
from mpl_toolkits.basemap import Basemap
import os

MainFolder=r"E:/temp_samples/"
for (path, dirs, files) in os.walk(MainFolder):
    for dir in dirs:
        print dir
    for ncfile in files:
        if ncfile[-3:]=='.nc':
            ncfile=os.path.join(path,ncfile)
            ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
            TSFC=ncfile.variables['T_SFC'][4::24]
            LAT=ncfile.variables['latitude'][:]
            LON=ncfile.variables['longitude'][:]
            TIME=ncfile.variables['time'][:]
            fillvalue=ncfile.variables['T_SFC']._FillValue
            ncfile.close()

#calculate summary stats
            big_array=[]
            for i in TSFC:
                big_array.append(i)
                big_array=N.array(big_array)
                Mean=N.mean(big_array, axis=0)

#plot output summary stats
            map = Basemap(projection='merc',llcrnrlat=-40,urcrnrlat=-33,

llcrnrlon=139.0,urcrnrlon=151.0,lat_ts=0,resolution='i')
            map.drawcoastlines()
            map.drawstates()
            x,y=map(*N.meshgrid(LON,LAT))
            plt.title('Total Mean at 3pm')
            ticks=[-5,0,5,10,15,20,25,30,35,40,45,50]
            CS = map.contourf(x,y,Mean,ticks, cmap=plt.cm.jet)
            l,b,w,h =0.1,0.1,0.8,0.8
            cax = plt.axes([l+w+0.025, b, 0.025, h])
            plt.colorbar(CS,cax=cax, drawedges=True)
            plt.show()

On Thu, Aug 11, 2011 at 4:48 PM, Andre' Walker-Loud <walksloud at gmail.com>wrote:

> Hello Anonymous Questioner,
>
> First - you will probably be told this isn't the correct email list for
> this question - this is a general python tutorial list, while your question
> is numpy specific, so if this doesn't help, you should probably look to
> another email list.
>
> There are a couple general comments to make regarding your question.
>  Basically, you want to load all the data into one big numpy array.
>  Currently, it seems your data array has 3 indices, from this line
>
> > TSFC=ncfile.variables['T_SFC'][4::24,:,:]
>
> you can make a larger array with one more index, where the new index runs
> over files.  So instead of looping over the files as you do, you can use the
> loop to build the larger array.
>
> I am sure there is a more elegant way to do this, but here is something
> that is along the lines of what you want
>
> > big_array = []
> > for ncfile in files:
> >       big_array.append(ncfile.variables['T_SFC'][4::24,:,:])
>
> if all of your data you are taking from each file is the same size for all
> the files, then you can declare the whole thing an array
>
> > big_array = N.array(big_array)
>
> and then you can take averages very easily by specifying which index, or
> indices you want to average over.  If life is not so kind, and your files
> are of different sizes, then you have to be more careful.  But to give a
> precise answer to your problem would require knowledge of what your data
> files actually look like.
>
>
> Also, a bit of caution with averaging (in case you are unfamiliar).  Say
> you have 5 data sets, and each one has a different amount of data in it.  If
> you first take the average of each set, you can not simply take the average
> of those sets to produce the global average.  You have to weight each set by
> the amount of data you averaged.  Say the length of the sets are
> [N0,N1,N2,N3,N4] with average values [a0,a1,a2,a3,a4], then to produce the
> global average, you need to take
>
> avg = (1 / (N0+N1+N2+N3+N4) ) * (N0*a0 + N1*a1 + N2*a2 + N3*a3 + N4*a4)
>
> a few lines of algebra can demonstrate this produces the average you would
> get by combining all the data in all the sets and taking one big average.
>
>
> Regards,
>
> Andre
>
>
>
> On Aug 10, 2011, at 6:57 PM, questions anon wrote:
>
> > I have many ncfiles each containing one month of hourly temperature data.
> > I have worked out how to loop through a number of ncfiles and calculate
> the mean for each file at a particular time and even plot this and output as
> a *.png.
> > What I am unsure of is how to calculate the mean at a particular time for
> all files.
> > Do I somehow output the mean for each folder and then calculate the mean
> of all the means? Or is there some way to loop through and calculate it
> directly?
> > Below is the code I have working that calculates the mean for each file.
> Any help will be greatly appreciated!!
> >
> > from netCDF4 import Dataset
> > import numpy as N
> >
> > MainFolder=r"D:/temp_samples/"
> > print MainFolder
> > for (path, dirs, files) in os.walk(MainFolder):
> >     for dir in dirs:
> >         print dir
> >     path=path+'/'
> >
> >     for ncfile in files:
> >         if ncfile[-3:]=='.nc':
> >             print "dealing with ncfiles:", ncfile
> >             ncfile=os.path.join(path,ncfile)
> >             ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
> >             TSFC=ncfile.variables['T_SFC'][4::24,:,:] #this chooses a
> particular time for each day for the whole month
> >             LAT=ncfile.variables['latitude'][:]
> >             LON=ncfile.variables['longitude'][:]
> >             TIME=ncfile.variables['time'][:]
> >             fillvalue=ncfile.variables['T_SFC']._FillValue
> >             ncfile.close()
> >
> >             #calculate summary stats
> >             TSFCmean=N.mean(TSFC, axis=0)
> >             print "The TSFCmean array is:", TSFCmean
> >
> > # there is a lot more code that follows this to plot the mean for each
> ncfile but I have cut that out as I think it is irrelevant to my question
> >
> > _______________________________________________
> > Tutor maillist  -  Tutor at python.org
> > To unsubscribe or change subscription options:
> > http://mail.python.org/mailman/listinfo/tutor
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20110817/01b840bb/attachment-0001.html>