[Tutor] Should I use generators here?

Kent Johnson kent37 at tds.net
Mon May 8 12:02:10 CEST 2006


Tony C wrote:
> 
> I wrote a small Python program to count some simple statistics on a 
> Visual Basic program that I am maintaining.
> 
> The Python program counts total lines, whitespace lines, comment lines, 
> Public & Private Subroutines, Public and Private Functions.
> The Python program takes about 20-40 seconds to count all these stats 
> since I started using Psyco, but I am wondering if I can
> eliminate Psyco and improve the program performance using generators (or 
> some other technique).
> 
> The running time is quick enough, I'm just wondering if there are other 
> simple performance tweaks to use.
> I've already eliminated all . (dot) references inside the loops.

Hi Tony,

I don't see any obvious performance problems in the code you posted. If
you are serious about speeding up your program, you should learn to use
the profile module. It will show you where the time is actually going,
so none of us has to guess. Take a look at the module docs and ask
again here if you can't figure it out.
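
As a rough sketch of what profiling might look like (written for a
current Python 3, and using cProfile, the C implementation that shares
the profile module's interface): count_blank_lines and the sample data
are hypothetical stand-ins for your real line-counting function, and
profile_call is just a helper name made up for this illustration.

```python
import cProfile
import io
import pstats


def count_blank_lines(lines):
    """Hypothetical stand-in workload: count empty or whitespace-only lines."""
    return sum(1 for line in lines if not line.strip())


def profile_call(func, *args):
    """Run func(*args) under the profiler and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Collect the report in a string instead of printing straight to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


# Made-up sample input: 4000 lines, half of them blank.
sample = ["Private Sub Foo()\n", "\n", "   \n", "End Sub\n"] * 1000
blank_count, report = profile_call(count_blank_lines, sample)
print(blank_count)
print(report)
```

The functions at the top of the report are the ones worth looking at
first; anything that barely shows up is not worth tuning.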
> 
> I haven't quite got my head around generators yet, or when to use /not 
> use them, even though I have seen tutorials and examples.

Generators are not an optimization technique so much as a way to 
structure code that includes iteration. Using generators can reduce 
memory consumption but that doesn't seem to be your problem.
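
To make the "structure, not speed" point concrete, here is a minimal
sketch, written for a current Python 3. The function names and sample
lines are invented for illustration, and it assumes comment lines are
the ones starting with an apostrophe. The generator hands lines to the
counter one at a time, so no filtered copy of the list is ever built.

```python
def non_blank_lines(lines):
    """Yield lines one at a time, skipping empty or whitespace-only ones."""
    for line in lines:
        if line.strip():
            yield line


def count_comment_lines(lines):
    """Count lines whose first non-blank character is an apostrophe."""
    return sum(1 for line in lines if line.lstrip().startswith("'"))


# Made-up sample standing in for the contents of one source file.
source = [
    "' header comment\n",
    "\n",
    "Private Sub Foo()\n",
    "    ' inner comment\n",
    "End Sub\n",
]

# The two stages are chained; each line flows through both before the
# next one is read.
comment_count = count_comment_lines(non_blank_lines(source))
print(comment_count)
```

The same chaining works if non_blank_lines is given an open file
object instead of a list, which is where the memory saving would come
from.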

A few minor style notes below...

> 
> def ProcessFiletype(Thesefiles, Summary, Stats):
> 
>     """Iterate over all the files in 'Thesefiles', and process each 
> file, one at a time"""
>    
>     global TotalAllLines

The usual Python naming convention is to start variable names with a 
lower-case letter. There is no reason you have to do this though.
>    
>     LongestFilenameLen=0
>     for Onefile in Thesefiles:
>         Onefile = Onefile.lower().capitalize()
>         FilenameLen = len(Onefile)
>         if( FilenameLen > LongestFilenameLen):
>             LongestFilenameLen = FilenameLen
>         #print Onefile
>        
>         try:
>             fh=open(Onefile, "r")
>         except IOError:       
>             print("\nFATAL ERROR occurred opening %s for input" % Onefile)
>         else:
>             try:
>                 Filecontents = fh.readlines()  # these files are very 
> small, less than 100k each, so reading in an entire file isn't a problem
>                 fh.close()
>             except IOError:
>                 print("\nFatal error occurred reading from %s\n\n" % 
> Onefile)
>             else:   
>                 Summary[Onefile] = deepcopy(Stats)    # associate each 
> filename with a new stats dict with 0 counts for all attributes
> 
>                 Filestats = Summary[Onefile]
>                 Filestats["TotalLines"] = len(Filecontents)
>                 Summary[Onefile] = Filestats

You don't have to assign Filestats back into Summary. Filestats is a 
reference to the stats already stored in Summary.
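
A small sketch of why the assignment back is redundant, written for a
current Python 3. The dictionary keys mirror the code above, but the
lower-case names and the "Form1.frm" filename are made up for the
example.

```python
from copy import deepcopy

stats_template = {"TotalLines": 0, "CommentLines": 0}
summary = {}

# Store a fresh copy of the template, then bind a second name to that
# same dict object.
summary["Form1.frm"] = deepcopy(stats_template)
file_stats = summary["Form1.frm"]

# Mutating through file_stats changes the dict stored in summary,
# because both names refer to one object.
file_stats["TotalLines"] = 42

same_object = file_stats is summary["Form1.frm"]
print(same_object)
print(summary["Form1.frm"]["TotalLines"])
print(stats_template["TotalLines"])  # the template is untouched by the copy
```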
> 
>                 for line in Filecontents:
>                     TotalAllLines = TotalAllLines + 1
>                     #Filteredline=line.strip()
>                     Filteredline=line
>                     if( not IsCommentLine(Filteredline, Summary[Onefile] 
> ) ):

You could use Filestats instead of Summary[Onefile] in each of these lines.

Kent

>                         if( not IsWhitespaceLine(Filteredline, 
> Summary[Onefile] )) :
>                             if( not IsPrivateSub(Filteredline, 
> Summary[Onefile] )):
>                                 if( not IsPrivateFunc(Filteredline, 
> Summary[Onefile] ) ):
>                                     if( not IsPublicSub(Filteredline, 
> Summary[Onefile] )):
>                                         IsPublicFunc(Filteredline, 
> Summary[Onefile] )
>    
>     return LongestFilenameLen
>        
> #/////////////////////////////////////////////////////////
> 
> def ProcessAllFiles(Summary, Stats, FileTypes, FiletypeStats):
> 
>     """Iterates over all Files in current directory that have the 
> extensions in Filetypes"""
> 
>     from glob import glob
>     LongestFilenameLen = 0
>     for Filetype in FileTypes:
>         TheseFiles = glob("*" + Filetype)
>         TheseFiles.sort()
>         FiletypeStats[Filetype]=len(TheseFiles)
>         Longest = ProcessFiletype(TheseFiles, Summary, Stats)       
>         if( Longest > LongestFilenameLen):
>             LongestFilenameLen = Longest
>            
>     return LongestFilenameLen
>    
> #/////////////////////////////////////////////////////////
> 
> def main(args):
> 
>     import psyco
>     psyco.full()
> 
>     global TotalAllLines, TotalFilecount, TotalCommentLines, 
> TotalWhitespaceLines, TotalPrivateSubs, TotalPublicSubs, 
> TotalPrivateFuncs, TotalPublicFuncs
> 
>     TotalAllLines = 0
> 
>     FileTypes=[".frm", ".bas", ".cls"] # Visual Basic source file extensions
>     FiletypeStats={}
>     FileStats={ "TotalLines":0, "WhitespaceLines":0, "CommentLines":0, 
> "PrivateSubCount":0, "PublicSubCount":0, "PrivateFuncCount":0, 
> "PublicFuncCount":0 }
>     FileSummary={}   
> 
>     LongestFilenameLen = ProcessAllFiles(FileSummary, FileStats, 
> FileTypes, FiletypeStats)
> 
>     for Type, Count in FiletypeStats.iteritems():
>         print("\nThere are %3lu files with the %s extension" % (Count, 
> Type.upper()) )
>    
>    
>     print("\n")
>    
>     TotalFilecount = 0
> 
>     for File, Stats in FileSummary.iteritems():
>         TotalFilecount = TotalFilecount + 1
>         print("%s - %4lu Lines, %3lu Whitespace lines, %4lu Comments, 
> %4lu Private Subs, %4lu Public Subs, %4lu Private Functions, %4lu Public 
> Functions\n" % ( File, Stats["TotalLines"], Stats["WhitespaceLines"], 
> Stats["CommentLines"], Stats["PrivateSubCount"], 
> Stats["PublicSubCount"], Stats["PrivateFuncCount"], 
> Stats["PublicFuncCount"] ) )
>    
>     print("\nTotal Lines= %5lu in %lu Files\n" % (TotalAllLines, 
> TotalFilecount) )   
>     print("\nTotal Comment Lines = %5lu, Total Whitespace = %lu" % 
> (TotalCommentLines, TotalWhitespaceLines) )   
>     print("Total Private Subs  = %5lu, Total Public Subs  = %5lu" % 
> (TotalPrivateSubs, TotalPublicSubs) )   
>     print("Total Private Funcs = %5lu, Total Public Funcs = %5lu\n\n\n" 
> % (TotalPrivateFuncs, TotalPublicFuncs) )   
> 
>    
>     return None
> 
> 