[Tutor] Text matching
Jason Massey
jason.massey at gmail.com
Tue May 15 13:54:19 CEST 2007
heh... forwarding to the list, too.
---------- Forwarded message ----------
From: Jason Massey <jason.massey at gmail.com>
Date: May 15, 2007 6:51 AM
Subject: Re: [Tutor] Text matching
To: "Gardner, Dean" <Dean.Gardner at barco.com>
Look at it a different way. If the one thing that is sure to be there is
the SomeSpec portion, then how about making a listing of where they occur
and slicing everything up.
Concretely, as Mr. Yoo would say:
f = open(r"c:\python24\test.txt",'r').readlines()
logList = []
indices = [x for x,line in enumerate(f) if 'SomeSpec' in line]
for x in range(len(indices)-1):
    logList.append(f[indices[x]:indices[x+1]])
#tack on the last record
logList.append(f[indices[-1]:])
for count,x in enumerate(logList):
    print count,':',x
C:\Python24>test.py
0 : ['---- SomeSpec 0000-0001 ----\n', '\n', '> some commit 1\n', '\n',
'Reviewed By: someone\n']
1 : ['---- SomeSpec 0000-0002 ----\n', '> some commit 2\n', '\n']
2 : ['---- SomeSpec 0000-0003 ----\n', '>some commit 1\n', 'Reviewed By: Someone']
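
The same slicing idea can be wrapped in a function. This is only a sketch: the name split_records and its marker parameter are made up for illustration, and it assumes every record begins on a line containing the marker. Lines before the first marker are dropped, and an input with no marker at all gives an empty list instead of an IndexError.

```python
def split_records(lines, marker="SomeSpec"):
    """Group lines into records, each starting at a line that contains marker."""
    starts = [i for i, line in enumerate(lines) if marker in line]
    # Each record ends where the next one starts; the last runs to the end.
    ends = starts[1:] + [len(lines)]
    return [lines[s:e] for s, e in zip(starts, ends)]


# Sample data copied from the text file example in this thread.
sample = [
    "---- SomeSpec 0000-0001 ----\n",
    "\n",
    "> some commit 1\n",
    "\n",
    "Reviewed By: someone\n",
    "---- SomeSpec 0000-0002 ----\n",
    "> some commit 2\n",
    "\n",
    "---- SomeSpec 0000-0003 ----\n",
    ">some commit 1\n",
    "Reviewed By: Someone",
]
records = split_records(sample)
```

Because the split depends only on the header lines, a record with no "Reviewed By" line (the second one here) is still closed off correctly.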
On 5/15/07, Gardner, Dean <Dean.Gardner at barco.com> wrote:
>
> So I took Kent's advice and refactored but I have uncovered another
> problem which I am hoping people may be able to help with. I realized
> that the string I was using to identify the end of a record can in some
> cases not be found ( i.e. if a commit wasn't reviewed). This can lead to
> additional records being returned.
>
> Can anyone suggest how I can get round this?
>
> Text file example ( in this case looking for commit 1 would give details
> of commit two also):
>
> ---- SomeSpec 0000-0001 ----
>
> > some commit 1
>
> Reviewed By: someone
> ---- SomeSpec 0000-0002 ----
> > some commit 2
>
> ---- SomeSpec 0000-0003 ----
> >some commit 1
> Reviewed By: Someone
>
>
> Code:
>
> def searchChangeLog(self,filename):
>     uid = self.item.Uid()
>     record=[]
>     logList=[]
>     displayList=[]
>     f = open(filename)
>     logTextFile="temp.txt"
>     """ searched through the changelog 'breaking' it up into
>     individual entries"""
>
>     for line in f:
>         if ("Reviewed: 000" in line):
>             logList.append(record)
>             record = []
>         else:
>             record.append(line)
>
>     """ searches to determine if we can find entries for
>     a particular item"""
>     for record in logList:
>         for item in record:
>             if uid in item:
>                 displayList.append(record)
>     """ creates a temporary file to write our find results to """
>     removeFile = os.path.normpath( os.path.join(os.getcwd(),
>                                                 logTextFile))
>
>     # if the file exists, get rid of it before writing our new findings
>     if Shared.config.Exists(removeFile):
>         os.remove(removeFile)
>     recordLog = open(logTextFile,"a")
>
>     for record in range(len(displayList)):
>         for item in displayList[record]:
>             recordLog.write (item)
>     recordLog.close()
>     #display our results
>     commandline = "start cmd /C " + logTextFile
>     os.system(commandline)
>
> Dean Gardner
> Test Engineer
> Barco
> Bonnington Bond, 2 Anderson Place, Edinburgh EH6 5NP, UK
> Tel + 44 (0) 131 472 5731 Fax + 44 (0) 131 472 4799
> www.barco.com
> dean.gardner at barco.com
>
>
> -----Original Message-----
> From: Kent Johnson [mailto: kent37 at tds.net]
> Sent: 04 May 2007 11:26
> To: Gardner, Dean
> Cc: tutor at python.org
> Subject: Re: [Tutor] Text matching
>
> Gardner, Dean wrote:
> >
> > So here it is....it might not be pretty (it seems a bit un-python like
> > to me) but it works exactly as required. If anyone can give any tips
> > for possible optimisation or refactor I would be delighted to hear
> > from them.
> >
> > Thanks
> >
> > uid = self.item.Uid()
> > record=[]
> > logList=[]
> > displayList=[]
> > f = open(filename)
> > logTextFile=" temp.txt"
> > """ searched through the changelog 'breaking' it up into
> > individual entries"""
> > try:
> >     while 1:
> >         endofRecord=0
> >         l = f.next()
> >         if l.startswith("----"):
> >             record.append(l)
> >             l=f.next()
> >             while endofRecord==0:
> >                 if "Reviewed: 000" not in l:
> >                     record.append(l)
> >                     l=f.next()
> >                 else:
> >                     logList.append(record)
> >                     record=[]
> >                     endofRecord=1
> > except StopIteration:
> >     pass
>
> I don't think you need endofRecord and the nested loops here. In fact I
> think you could use a plain for loop here. AFAICT all you are doing is
> accumulating records with no special handling for anything except the
> end records. What about this:
> record = []
> for line in f:
>     if "Reviewed: 000" in line:
>         logList.append(record)
>         record = []
>     else:
>         record.append(line)
>
> > """ searches to determine if we can find entries for
> > a particular item"""
> > for record in logList:
> >     currRec = record
> >     for item in currRec:
> >         if uid in item:
> >             displayList.append(currRec)
>
> The currRec variable is not needed, just use record directly.
> If uid can only be in a specific line of the record you can test that
> directly, e.g.
> for record in logList:
>     if uid in record[1]:
>         displayList.append(record)
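
When the uid may appear on any line of a record, one possible sketch uses any() so that each matching record is collected exactly once. The nested loop quoted above appends the record once per matching line, so a record that mentions the uid twice is listed twice. The function name records_mentioning and the sample records are hypothetical, for illustration only.

```python
def records_mentioning(records, uid):
    """Return every record with uid on at least one line, each at most once."""
    return [rec for rec in records if any(uid in line for line in rec)]


# Hypothetical records: the first one mentions its id on two lines.
log_list = [
    ["---- SomeSpec 0000-0001 ----\n", "> fixes 0000-0001\n"],
    ["---- SomeSpec 0000-0002 ----\n", "> some commit 2\n"],
]
display_list = records_mentioning(log_list, "0000-0001")
```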
>
> > """ creates a temporary file to write our find results to """
> > removeFile = os.path.normpath( os.path.join(os.getcwd(),
> >                                             logTextFile))
> >
> > # if the file exists, get rid of it before writing our new findings
> > if Shared.config.Exists (removeFile):
> >     os.remove(removeFile)
> > recordLog = open(logTextFile,"a")
> >
> > for record in range(len(displayList)):
> >     for item in displayList[record]:
> >         recordLog.write(item)
>
> for record in displayList:
>     recordLog.writelines(record)
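
A possible simplification of the file handling, offered as a sketch: opening the file in "w" mode truncates any existing content, so the separate existence check and os.remove() are not needed. The helper name write_records is made up for this example, and the with statement assumes a Python newer than the 2.4 used elsewhere in this thread.

```python
import os
import tempfile


def write_records(records, path):
    """Write all records to path, replacing whatever the file held before."""
    # Mode "w" truncates an existing file, unlike the append mode "a".
    with open(path, "w") as out:
        for record in records:
            out.writelines(record)


# Usage: the second call overwrites the output of the first.
target = os.path.join(tempfile.mkdtemp(), "temp.txt")
write_records([["first\n", "run\n"]], target)
write_records([["---- SomeSpec 0000-0002 ----\n", "> some commit 2\n"]], target)
with open(target) as check:
    contents = check.read()
```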
>
> > recordLog.close()
> > #display our results
> > commandline = "start cmd /C " + logTextFile
> > os.system(commandline)
> >
>
> Kent
>
>