How to find the beginning of last line of a big text file ?

Barak, Ron Ron.Barak at lsi.com
Sun Jan 4 12:25:18 CET 2009


Hi Tim,

Thanks for the solution (and effort), and for teaching me some interesting new tricks.

Happy 2009!
Ron.

-----Original Message-----
From: Tim Chase [mailto:python.list at tim.thechases.com]
Sent: Thursday, January 01, 2009 20:04
To: Sebastian Bassi
Cc: python-list at python.org
Subject: Re: How to find the beginning of last line of a big text file ?

Sebastian Bassi wrote:
> On Thu, Jan 1, 2009 at 2:19 PM, Barak, Ron <Ron.Barak at lsi.com> wrote:
>> I have a very big text file: I need to find the place where the last
>> line begins (namely, the offset of the one-before-the-last '\n' + 1).
>> Could you suggest a way to do that without getting all the file into
>> memory (as I said, it's a big file), or having to readline() all lines (ditto)?
>
> for line in open(filename):
>     lastline = line
> print "the lastline is: %s" % lastline
>
> This will read all the lines, but line by line, so you will never have
> the whole file in memory.
> There may be more efficient ways to do this, like using itertools.

I think the OP wanted to do it without having to touch each line in the file.  The following should do the trick, returning both the offset in the file, and that last line's content.

   from os import stat
   def last_line(fname, estimated_line_size=1024):
     assert estimated_line_size > 0
     file_size = stat(fname).st_size
     if not file_size: return 0, ""
     f = file(fname, 'rb')
     f.seek(-1, 2) # grab the last character
     if f.read(1) == '\n': # a "proper" text file
       file_size -= 1
     offset = file_size
     content = ""
     while offset > 0 and '\n' not in content: # stop once the start of the file is reached
       offset -= estimated_line_size
       if offset < 0:
         estimated_line_size += offset # back it off
         offset = 0
       f.seek(offset)
       block = f.read(estimated_line_size)
       content = block + content
     f.close()
     loc = content.rfind('\n') + 1 # after the newline
     return offset + loc, content[loc:]
   offset, line = last_line('some_file.txt')
   print "[%r] was found at offset %i" % (line, offset)

In theory, it should even handle "malformed" text-files that don't end in a newline.  There might be some odd edge-cases that I missed, but I think I caught most of them.
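
For anyone on Python 3, the same backward block scan might look roughly like the sketch below. It is an adaptation under a few assumptions, not a drop-in replacement: the file is opened in binary mode, so the line comes back as bytes rather than str; block_size is just a renamed estimated_line_size; and the loop stops once offset reaches 0, so a file containing a single line terminates instead of re-reading empty blocks.

```python
import os

def last_line(fname, block_size=1024):
    """Return (offset, content) of the last line of fname; content is bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    with open(fname, 'rb') as f:
        end = f.seek(0, os.SEEK_END)    # file size in bytes
        if end == 0:
            return 0, b""
        f.seek(end - 1)
        if f.read(1) == b"\n":          # a "proper" text file: skip the final newline
            end -= 1
        offset = end
        content = b""
        # Read blocks backwards until a newline shows up or offset 0 is reached.
        while offset > 0 and b"\n" not in content:
            step = min(block_size, offset)   # never step past the start of the file
            offset -= step
            f.seek(offset)
            content = f.read(step) + content
    loc = content.rfind(b"\n") + 1      # 0 when the whole file is one line
    return offset + loc, content[loc:]
```

Calling last_line('some_file.txt') returns the byte offset where the last line starts together with that line's bytes (without the trailing newline); decode the bytes with whatever encoding the file uses if text is needed.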

-tkc