finding last match in a file

Harvey Thomas hst at empolis.co.uk
Fri Sep 20 07:06:22 EDT 2002


> In article <87r8fplvq4.fsf at smtp.blueyonder.co.uk>, Keith O'Connell wrote:
> > Dag <dag at animagicnet.no> writes:
> >
> >> I have a number of very large files where I have to find the last line
> >> which matches a certain expression.  Currently I'm simply opening the
> >> file and looping through it, noting each time a line matches, and at the
> >> end of the loop I see which was the last match.  However, this is very
> >> slow and inefficient.  Is there some way to read through the file
> >> backwards and simply find the first match starting from the end of the file?
> >> The files are really too big to read the entire thing into memory.
> > 
> > 
> > Would you not be able to use grep with the "-n" option? It will list
> > all your matches with the lines numbered so you can see the last match
> > having the highest line number. If you then use wc on the file you will
> > get the total number of lines. Take one from the other and you can see
> > how far from the end your last match is.
> >
> > That seems too easy - have I misunderstood your question?
> 
> The problem is that grepping through large files with a fairly complex
> regexp is taking far too long, esp. when I know that the last occurrence is
> almost always within the last 100 lines.
> 
> Dag

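It's not clear to me whether you need the number of the last line that matches, or just its contents. If you don't need the number, then the following might be useful as a basis. It searches only the last 8K of the file first and falls back to a full scan from the beginning if nothing matches there.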

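def lastlinematching(fileobj, aregex):
    # fileobj should be opened in binary mode ('rb') and aregex compiled
    # from a bytes pattern, so that seeking relative to the end is allowed.
    # assume initially match will be in about the last 8K of the file
    fileobj.seek(0, 2)
    start = max(fileobj.tell() - 8192, 0)   # files under 8K: start at 0
    fileobj.seek(start, 0)
    lines = fileobj.readlines()
    if start > 0:
        lines = lines[1:]   # throw away possible incomplete line
    lastline = None
    for line in lines:
        if aregex.search(line):
            lastline = line
    if lastline is None and start > 0:
        # not in last 8K, start again from the beginning
        fileobj.seek(0, 0)
        for line in fileobj:
            if aregex.search(line):
                lastline = line
    return lastline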
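
If the match is sometimes further back than the last 8K, the full rescan above can be avoided by stepping backwards through the file one block at a time, which is closer to what Dag originally asked for. The following is only a rough sketch under some assumptions: the function name and the blocksize parameter are made up for illustration, the file must be opened in binary mode, and the regex must be compiled from a bytes pattern.

```python
import io
import re


def last_line_matching_backwards(fileobj, aregex, blocksize=8192):
    """Return the last line matching aregex, or None if no line matches.

    Assumes fileobj is seekable and opened in binary mode, and that
    aregex was compiled from a bytes pattern.
    """
    fileobj.seek(0, io.SEEK_END)
    pos = fileobj.tell()
    tail = b""  # line whose beginning may lie in an earlier block
    while pos > 0:
        step = min(blocksize, pos)
        pos -= step
        fileobj.seek(pos)
        # Re-attach the line carried over from the previous (later) block.
        lines = (fileobj.read(step) + tail).splitlines(keepends=True)
        if pos > 0:
            # The first line may be incomplete, so defer it to the next pass.
            tail = lines.pop(0)
        else:
            tail = b""
        # Scan newest lines first so the first hit is the last match.
        for line in reversed(lines):
            if aregex.search(line):
                return line
    return None
```

It would be called along the lines of last_line_matching_backwards(open("big.log", "rb"), re.compile(b"ERROR")), where the file name and pattern are placeholders. Note that splitlines() on bytes also treats a lone "\r" as a line break, which differs slightly from readlines().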

Harvey
