[Tutor] Review and criticism of python project
Tiger12506
keridee at jayco.net
Sat Jan 5 02:32:06 CET 2008
Cool! Code! (Gollum voice - we must look at code, yesss....)
> Hi there.
>
> What this section does is import a comma-separated data file - there is a
> unique ID in the first field and a code in the second field that
> corresponds to a certain section of information. What I need from this is
> for the process to roll up, against each unique ID, all section holdings
> without duplicates, report on section combinations, and report overall
> section counts. In addition I need the ability to assign a page-count
> value to these sections, and the ability to upload a translation file in
> case a section is identified by multiple values that need to be
> normalized to a single unique value.
>
> Sorry for the lengthy code response - all comments are appreciated. As
> mentioned, I am quite new to Python - it is doing what I need it to do,
> but I think that it is a mess and needs to be cleaned up a little.
>
> Thanks for any comments.
>
> GTXY20
>
> import sys
> import os
> class __analysis:
> def __init__(self):
> print '***Analysis Tool***'
> datafile=raw_input('data file name:')
> self.datafile=datafile
> self.parsefile()
Ouch. Usually in OOP, one never puts user interaction inside a class.
Any time you want to make an __analysis object, code execution halts until
the user responds.... Yuck. Instead, make the datafile a parameter you pass
in, and do the raw_input outside the class definition, before you
instantiate the object.
As an aside, binding the filename to a local datafile and then assigning
that to self.datafile is a little wasteful when you can assign it directly
to self.datafile.
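A minimal sketch of that suggestion (the class name Analysis and the
filename here are illustrative, not from the original code) - the filename
is passed in, so constructing the object never blocks waiting for input:

```python
class Analysis(object):
    def __init__(self, datafile):
        # assign directly to the attribute; no intermediate local needed
        self.datafile = datafile

# user interaction happens outside the class, before instantiation:
# datafile = raw_input('data file name:')
a = Analysis('uhdata.csv')
```

This also makes the class usable from other scripts and testable without a
person at the keyboard.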
> # script to import unitID section data and section page count reference
> and
> create a sorted dictionary
> # where in uhdata{} key=unitID and value=unitID section holdings
> # where in pgcnt{} key=Section and value=page count
>
> def parsefile(self):
> try:
> uhdatafile = open(self.datafile, 'r')
> records = uhdatafile.read()
> uhdatafile.close()
> lines = records.split()
Ummm... this will split on every run of whitespace, including spaces. The
name 'lines' suggests you expected a split on newline characters only; for
that, use records.splitlines() or uhdatafile.readlines() instead.
> self.uhdata={}
> for line in lines:
Here you iterate over a created list. It is easier and more efficient to
iterate over the file itself.
for line in uhdatafile:
do_stuff()
uhdatafile.close()
> uh, tf = line.split(',')
> if uh in self.uhdata:
> f=self.uhdata[uh]
> if tf not in f:
> f.append(tf)
> else:
> self.uhdata[uh]=[tf]
This can read
try:
    f = self.uhdata[uh]
except KeyError:
    f = self.uhdata[uh] = []
if tf not in f:    # this test is still needed to keep holdings unique
    f.append(tf)
or, more simply, f = self.uhdata.setdefault(uh, []) followed by the same
membership test.
Also note that there is no error checking anywhere in this algorithm: the
file must be formatted exactly right, or line.split(',') will raise an
exception.
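To illustrate the error-checking point, here is a hedged sketch (the sample
data is made up) that skips malformed records instead of crashing, while
keeping the no-duplicates rule:

```python
uhdata = {}
for line in ['a1,s1', 'bad line', 'a1,s2', 'a1,s1']:
    try:
        uh, tf = line.strip().split(',')
    except ValueError:        # not exactly two comma-separated fields
        continue              # skip (or log) the bad record
    sections = uhdata.setdefault(uh, [])
    if tf not in sections:    # keep holdings duplicate-free
        sections.append(tf)
```

Whether to skip, log, or abort on a bad record is a design decision; the
point is that the failure mode is chosen rather than accidental.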
> for uh, Sections in self.uhdata.items():
> Sections.sort()
Actually, this does work: items() returns new (key, value) tuples, but the
values in them are references to the same list objects, so sort() mutates
the real lists. Still, since uh is not used here, itervalues() is simpler
and avoids building the pairs:
for Sections in self.uhdata.itervalues():
    Sections.sort()
> except IOError:
> print 'file not found check file name'
> analysis()
>
> ftranslateok=raw_input('would you like to translate section codes?
> (y/n):')
> if ftranslateok == 'y':
> self.transFn()
> else:
> pass
This is unnecessary; execution continues whether you write else: pass or
not.
> pgcountok=raw_input('would you like to assign section page counts?
> (y/n):')
> if pgcountok == 'y':
> self.setPageCounts()
> else:
> missingpgcounts={}
> fmissingpgcounts=[]
> for x in self.uhdata:
> for f in self.uhdata[x]:
> if f not in fmissingpgcounts:
> fmissingpgcounts.append(f)
Ahhh... This looks like a set. The dictionary's values are lists, though,
so they must be flattened first - you cannot put lists into a set directly:
fmissingpgcounts = set()
for sections in self.uhdata.itervalues():
    fmissingpgcounts.update(sections)
> for x in fmissingpgcounts:
> missingpgcounts[x]=0
> self.pgcounts = missingpgcounts
Oh my. So we have a dictionary, build a unique collection of its values,
and then create another dictionary from that? The intermediate variable
missingpgcounts is not necessary:
self.pgcounts = dict((x, 0) for x in fmissingpgcounts)
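On made-up sample data, the whole step looks like this - note the
dictionary values are lists, so they have to be flattened into the set
before the zeroed page-count dict is built:

```python
# sample data in the shape of self.uhdata: unique ID -> list of sections
uhdata = {'a1': ['s1', 's2'], 'a2': ['s2', 's3']}

# flatten the value lists into one set of distinct sections
sections = set()
for v in uhdata.values():
    sections.update(v)

# default every section's page count to 0, as the original code does
pgcounts = dict((x, 0) for x in sections)
```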
> fdistmodel=raw_input('would you like to define max section
> distribution cut off? (y/n):')
> if fdistmodel == 'y':
> self.fdistmax=raw_input('what is the max distributions before a
> full book?:')
> self.fdistmax=int(self.fdistmax)
> self.Sectiondistmax()
> else:
> self.fdistmax=1000000000
> self.Sectiondistmax()
self.Sectiondistmax() needs to appear only once, unindented, after the
if/else block.
If self.fdistmax will never be zero, then you can do this:
self.fdistmax = int(self.fdistmax) or 1000000000
Oh, and if the user presses Enter without typing a number, your program
will exit with an error. If you want, you can use this to your advantage
(press Enter for a default of "all"):
try:
    self.fdistmax = int(self.fdistmax)
except ValueError:
    self.fdistmax = False
False is a better flag here. Though you will never have a billion
distributions, you could never, ever possibly have a genuine value of
False (0). It also helps readability - see below.
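Put together, that parse step could be a tiny helper (the function name
parse_cutoff is made up for illustration):

```python
def parse_cutoff(text):
    # a number limits the distribution size; anything unparseable
    # (including an empty string from just pressing Enter) means
    # "no cutoff at all"
    try:
        return int(text)
    except ValueError:
        return False
```

Callers then test the flag with `if not fdistmax` rather than comparing
against a magic billion.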
> sys.exit(1)
>
> # function to determine number of uniqueID for each section
> def Sectionqty(self):
> Sectionqtyoutfile = open('Sectionqty.txt', 'w+')
> Sectionqtyoutfile.write ('Section\tQTY\n')
> from collections import defaultdict
> fcounts=defaultdict(int)
> flst=[]
> flst2=[]
Okay, this whole section can be rewritten. The two blocks are basically the
same thing, with one stipulation based on a flag. I will attempt to rewrite
it so the code is not duplicated.
> if self.fdistmax == 1000000000:
> for v in self.uhdata.values():
> for item in v:
> fcounts[item]+=1
>
> for k,v in sorted(fcounts.items()):
> Section=k
> fqty=v
> Sectionqtyoutfile.write ('%s\t%s\n' % (Section, fqty))
>
> else:
> for k,v in self.uhdata.items():
> if len(v)<=self.fdistmax:
> flst.append(self.uhdata[k])
> for i in flst:
> for x in i:
> flst2.append(x)
> for Sections in flst2:
> fcounts[Sections]+=1
> for k,v in sorted(fcounts.items()):
> Section= k
> fqty= v
> Sectionqtyoutfile.write ('%s\t%s\n' % (Section, fqty))
for v in self.uhdata.itervalues():
    if not self.fdistmax or len(v) <= self.fdistmax:
        for x in v:
            fcounts[x] += 1
for (Section, fqty) in sorted(fcounts.items()):
    Sectionqtyoutfile.write('%s\t%s\n' % (Section, fqty))
(This relies on the False flag from earlier: not self.fdistmax means
"no cutoff, count everything".)
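Wrapped in a function and run on made-up data, the merged loop behaves like
this (section_counts is an illustrative name; fdistmax=False means "no
cutoff", per the flag suggested earlier):

```python
from collections import defaultdict

def section_counts(uhdata, fdistmax):
    fcounts = defaultdict(int)
    for v in uhdata.values():
        # False flag -> count everything; otherwise respect the cutoff
        if not fdistmax or len(v) <= fdistmax:
            for x in v:
                fcounts[x] += 1
    return dict(fcounts)

uhdata = {'a1': ['s1', 's2'], 'a2': ['s1'], 'a3': ['s1', 's2', 's3']}
```

With no cutoff every holding is counted; with fdistmax=2, the three-section
unit a3 is excluded from the tallies.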
> Sectionqtyoutfile.close()
> self.SectionCombqty()
>
> # function to determine number of uniqueID section combinations and
> associated section page counts
> def SectionCombqty(self):
> SectionCombqtyoutfile = open('SectionCombqty.txt', 'w+')
> SectionCombqtyoutfile.write('Combination Qty\tNumber of
> Sections\tCombination\tCombinationPageCount\tTotalPages\n')
> fullbook = 'Full Book'
> fgreater=[]
> fcheck=0
> from collections import defaultdict
> fcomb=defaultdict(int)
> for uh in self.uhdata.keys():
> fcomblst=self.uhdata[uh]
> fcomb[tuple(fcomblst)]+=1
for value in self.uhdata.itervalues():
fcomb[tuple(value)]+=1
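On sample data, that combination counting works like this - the lists must
be converted to tuples because lists are unhashable and cannot be
dictionary keys:

```python
from collections import defaultdict

uhdata = {'a1': ['s1', 's2'], 'a2': ['s1', 's2'], 'a3': ['s3']}
fcomb = defaultdict(int)
for value in uhdata.values():
    fcomb[tuple(value)] += 1   # tuple() makes the list hashable
```

(Since the lists were sorted during parsing, identical holdings produce
identical tuples and are counted together.)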
> if self.fdistmax == 1000000000:
> for count, items in sorted( ((v,k) for k,v in fcomb.items
> ()),reverse=True):
> fpgcounts = sum([self.pgcounts.get(i,i) for i in
> list(items)])
> Sectioncomb = ','.join(items)
> holdings = len(items)
> totpgcounts = count*fpgcounts
> SectionCombqtyoutfile.write ('%s\t%s\t%s\t%s\t%s\n' %
> (count,holdings,Sectioncomb,fpgcounts,totpgcounts))
>
> else:
> for count, items in sorted( ((v,k) for k,v in fcomb.items
> ()),reverse=True):
> if len(items) <= self.fdistmax:
> fpgcounts = sum([self.pgcounts.get(i,i) for i in
> list(items)])
> Sectioncomb = ','.join(items)
> holdings = len(items)
> totpgcounts = count*fpgcounts
> SectionCombqtyoutfile.write ('%s\t%s\t%s\t%s\t%s\n' %
> (count,holdings,Sectioncomb,fpgcounts,totpgcounts))
> for count, items in sorted( ((v,k) for k,v in fcomb.items
> ()),reverse=True):
> if len(items)>self.fdistmax:
> fgreater.append(count)
>
> fcheck=sum(fgreater)
> SectionCombqtyoutfile.write ('%s\t''>''%s\t%s\t%s\t%s\n' %
> (fcheck,self.fdistmax,fullbook,fullbook,fullbook))
See the rewrite of the section above for how to collapse this into one
block instead of two copy-and-paste blocks.
>
> SectionCombqtyoutfile.close()
>
> # where in pgcnt{} key=Section and value=page count
> def setPageCounts(self):
> pagecountfile=raw_input('page count file name:')
> self.pagecountfile=pagecountfile
> try:
> pagecountinfile = open(self.pagecountfile, 'r')
> records = pagecountinfile.read()
> pagecountinfile.close()
> self.pgcounts={}
> lines = records.split()
> for line in lines:
> fpg, cnt = line.split(',')
> self.pgcounts[fpg]=int(cnt)
This is very similar to the block at the beginning of this post. Make the
same changes here.
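One way to remove the duplicated parsing across these methods would be a
small shared helper - load_pairs and its cast argument are a suggestion,
not anything in the original:

```python
def load_pairs(filename, cast=str):
    # read "key,value" lines from a file into a dict,
    # converting each value with cast (e.g. int for page counts)
    pairs = {}
    f = open(filename, 'r')
    try:
        for line in f:
            key, value = line.strip().split(',')
            pairs[key] = cast(value)
    finally:
        f.close()
    return pairs

# then, for example:
# self.pgcounts = load_pairs(pagecountfile, cast=int)
# transDict     = load_pairs(transfile)
```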
> except IOError:
> print 'file not found check file name'
> analysis()
>
> # function to determine number of uniqueID distributions and associated
> Sections held
> def Sectiondistmax(self):
> from collections import defaultdict
> Sectiondistoutfile = open('Sectiondist.txt', 'w+')
> Sectiondistoutfile.write ('SectionDistributions\tQTY\n')
> fgreater=[]
> fullbook = "Full Book"
> fcheck=0
> fcount=defaultdict(int)
> for uh in self.uhdata.keys():
> f=self.uhdata[uh]
> fcount[len(f)]+=1
> if self.fdistmax == 1000000000:
> for k,v in sorted(fcount.items()):
You don't have to name these k, v. Any descriptive names are fine:
for fdist, fqty in sorted(fcount.items()):
> fdist=k
> fqty=v
> Sectiondistoutfile.write ('%s\t%s\n' % (fdist,fqty))
>
> else:
> for k,v in sorted(fcount.items()):
> if k <= self.fdistmax:
> fdist=k
> fqty=v
Same naming point as above.
> Sectiondistoutfile.write ('%s\t%s\n' % (fdist,fqty))
>
> for k,v in sorted(fcount.items()):
> if k > self.fdistmax:
> fgreater.append(fcount[k])
> fcheck=sum(fgreater)
> Sectiondistoutfile.write ('%s\t%s\n' % (fullbook,fcheck))
> Sectiondistoutfile.close()
> self.Sectionqty()
>
> #function to translate UnitID Sectioncodes to normalized assigned Section
> code (e.g. parent and mulitple child section codes)
> def transFn(self):
> transfile=raw_input('Section translate file name:')
> self.transfile=transfile
> try:
> transfilein=open(self.transfile, 'r')
> records = transfilein.read()
> transfilein.close()
> lines = records.split()
> transDict = {}
> for line in lines:
> key, value = line.split(',')
> transDict[key] = value
Again, the same thing here as way up above.
> for key, value in self.uhdata.items():
> self.uhdata[key] = [ transDict.get(i, i) for i in value ]
> for k in self.uhdata:
> self.uhdata[k]=sorted(set(self.uhdata[k]))
These two passes over uhdata could be merged into one:
self.uhdata[key] = sorted(set(transDict.get(i, i) for i in value))
> except IOError:
> print 'file not found check file name'
> analysis()
Just some suggestions. It's a long piece of code for what you are trying to
accomplish... It could probably be shortened, given how much duplicate code
is in it. All I did was work through it block by block, changing pieces so
that they make a little more sense and are more efficient. I suspect that a
rework of the complete design would be beneficial. I'm just too tired and
busy to think for that long. ;-)
HTH,
JS