[Tutor] Text matching

Jason Massey jason.massey at gmail.com
Tue May 15 13:54:19 CEST 2007


heh... forwarding to the list, too.

---------- Forwarded message ----------
From: Jason Massey <jason.massey at gmail.com>
Date: May 15, 2007 6:51 AM
Subject: Re: [Tutor] Text matching
To: "Gardner, Dean" <Dean.Gardner at barco.com>

Look at it a different way.  If the one thing that is sure to be there is
the SomeSpec portion, then how about recording the line numbers where it
occurs and slicing the file into records at those positions?

Concretely, as Mr. Yoo would say:

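# read the whole log into a list of lines
f = open(r"c:\python24\test.txt",'r').readlines()
logList = []

# line numbers of every record header
indices = [x for x,line in enumerate(f) if 'SomeSpec' in line]
for x in range(len(indices)-1):
    logList.append(f[indices[x]:indices[x+1]])

#tack on the last record (skipped if no header was found at all)
if indices:
    logList.append(f[indices[-1]:])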

for count,x in enumerate(logList):
    print count,':',x

C:\Python24>test.py
0 : ['---- SomeSpec 0000-0001 ----\n', '\n', '> some commit 1\n', '\n', 'Reviewed By: someone\n']
1 : ['---- SomeSpec 0000-0002 ----\n', '> some commit 2\n', '\n']
2 : ['---- SomeSpec 0000-0003 ----\n', '>some commit 1\n', 'Reviewed By: Someone']
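
A variation on the same idea, in case reading the index list first feels
roundabout: the sketch below walks the lines once and starts a new record
whenever it sees the marker, so it never needs an end-of-record string.
The function name split_records and the in-memory sample list are made up
for illustration; the sample just mirrors the text file from Dean's
message. Treat it as a sketch, not something tested against the real
changelog.

```python
def split_records(lines, marker="SomeSpec"):
    """Group lines into records, starting a new record at each marker line."""
    records = []
    current = None  # stays None until the first header line is seen
    for line in lines:
        if marker in line:
            # a header line always opens a fresh record
            current = [line]
            records.append(current)
        elif current is not None:
            current.append(line)
        # lines that appear before the first header are ignored
    return records


# hypothetical sample data, copied from the text file example in the thread
sample = [
    "---- SomeSpec 0000-0001 ----\n",
    "\n",
    "> some commit 1\n",
    "\n",
    "Reviewed By: someone\n",
    "---- SomeSpec 0000-0002 ----\n",
    "> some commit 2\n",
    "\n",
    "---- SomeSpec 0000-0003 ----\n",
    ">some commit 1\n",
    "Reviewed By: Someone",
]

records = split_records(sample)
for count, record in enumerate(records):
    print("%d : %r" % (count, record))
```

Passing an open file object instead of the list should work the same way,
since the function only iterates over its argument.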

On 5/15/07, Gardner, Dean <Dean.Gardner at barco.com> wrote:
>
> So I took Kent's advice and refactored, but I have uncovered another
> problem which I am hoping people may be able to help with. I realized
> that the string I was using to identify the end of a record is missing
> in some cases (i.e. if a commit wasn't reviewed). This can lead to
> additional records being returned.
>
> Can anyone suggest how I can get round this?
>
> Text file example (in this case, looking for commit 1 would also give
> details of commit 2):
>
> ---- SomeSpec 0000-0001 ----
>
> > some commit 1
>
> Reviewed By: someone
> ---- SomeSpec 0000-0002 ----
> > some commit 2
>
> ---- SomeSpec 0000-0003 ----
> >some commit 1
> Reviewed By: Someone
>
>
> Code:
>
> def searchChangeLog(self,filename):
>         uid = self.item.Uid()
>         record=[]
>         logList=[]
>         displayList=[]
>         f = open(filename)
>         logTextFile="temp.txt"
>         """ searches through the changelog 'breaking' it up into
>             individual entries"""
>
>         for line in f:
>             if ("Reviewed: 000" in line):
>                 logList.append(record)
>                 record = []
>             else:
>                 record.append(line)
>
>         """ searches to determine if we can find entries for
>             a particular item"""
>         for record in logList:
>             for item in record:
>                 if uid in item:
>                     displayList.append(record)
>         """ creates a temporary file to write our find results to """
>         removeFile = os.path.normpath(os.path.join(os.getcwd(),
>                                                    logTextFile))
>
>         # if the file exists, get rid of it before writing our new
>         # findings
>         if Shared.config.Exists(removeFile):
>             os.remove(removeFile)
>         recordLog = open(logTextFile,"a")
>
>         for record in range(len(displayList)):
>             for item in displayList[record]:
>                 recordLog.write (item)
>         recordLog.close()
>         #display our results
>         commandline = "start cmd /C " + logTextFile
>         os.system(commandline)
>
> Dean Gardner
> Test Engineer
> Barco
> Bonnington Bond, 2 Anderson Place, Edinburgh EH6 5NP, UK
> Tel + 44 (0) 131 472 5731 Fax + 44 (0) 131 472 4799
> www.barco.com
> dean.gardner at barco.com
>
>
> -----Original Message-----
> From: Kent Johnson [mailto: kent37 at tds.net]
> Sent: 04 May 2007 11:26
> To: Gardner, Dean
> Cc: tutor at python.org
> Subject: Re: [Tutor] Text matching
>
> Gardner, Dean wrote:
> >
> > So here it is... it might not be pretty (it seems a bit un-Python-like
> > to me) but it works exactly as required. If anyone can give any tips
> > for possible optimisation or refactoring I would be delighted to hear
> > from them.
> >
> > Thanks
> >
> >         uid = self.item.Uid()
> >         record=[]
> >         logList=[]
> >         displayList=[]
> >         f = open(filename)
> >         logTextFile="temp.txt"
> >         """ searches through the changelog 'breaking' it up into
> >             individual entries"""
> >         try:
> >             while 1:
> >                 endofRecord=0
> >                 l = f.next()
> >                 if l.startswith("----"):
> >                     record.append(l)
> >                 l=f.next()
> >                 while endofRecord==0:
> >                     if "Reviewed: 000" not in l:
> >                         record.append(l)
> >                         l=f.next()
> >                     else:
> >                         logList.append(record)
> >                         record=[]
> >                         endofRecord=1
> >         except StopIteration:
> >             pass
>
> I don't think you need endofRecord and the nested loops here. In fact I
> think you could use a plain for loop here. AFAICT all you are doing is
> accumulating records with no special handling for anything except the
> end records. What about this:
> record = []
> for line in f:
>     if "Reviewed: 000" in line:
>         logList.append(record)
>         record = []
>     else:
>         record.append(line)
>
> >         """ searches to determine if we can find entries for
> >             a particular item"""
> >         for record in logList:
> >             currRec = record
> >             for item in currRec:
> >                 if uid in item:
> >                     displayList.append(currRec)
>
> The currRec variable is not needed, just use record directly.
> If uid can only be in a specific line of the record you can test that
> directly, e.g.
> for record in logList:
>    if uid in record[1]:
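
If the uid can show up on any line of a record, one thing to watch for is
that the nested loop above appends the same record once per matching line.
A possible way around that, sketched below, is to test each record once
with any(). The name find_records and the two sample records are invented
for illustration, and any() assumes Python 2.5 or later; on 2.4 a small
loop with a break would be needed instead.

```python
def find_records(records, uid):
    """Return each record that mentions uid on any line, at most once."""
    return [record for record in records
            if any(uid in line for line in record)]


# hypothetical records; the uid appears twice in the first one
records = [
    ["---- SomeSpec 0000-0001 ----\n", "> fix for 0000-0001\n"],
    ["---- SomeSpec 0000-0002 ----\n", "> some commit 2\n"],
]

matches = find_records(records, "0000-0001")
print(len(matches))
```

The first record is picked up a single time even though two of its lines
contain the uid.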
>
> >         """ creates a temporary file to write our find results to """
> >         removeFile = os.path.normpath(os.path.join(os.getcwd(),
> >                                                    logTextFile))
> >
> >         # if the file exists, get rid of it before writing our new
> >         # findings
> >         if Shared.config.Exists (removeFile):
> >             os.remove(removeFile)
> >         recordLog = open(logTextFile,"a")
> >
> >         for record in range(len(displayList)):
> >             for item in displayList[record]:
> >                 recordLog.write(item)
>
> for record in displayList:
>     recordLog.writelines(record)
>
> >         recordLog.close()
> >         #display our results
> >         commandline = "start cmd /C " + logTextFile
> >         os.system(commandline)
> >
>
> Kent
>
>