how to read the last line of a huge file???
kushal.kumaran+python at gmail.com
Tue Feb 1 05:58:05 CET 2011
On Tue, Feb 1, 2011 at 9:12 AM, Alan Meyer <ameyer2 at yahoo.com> wrote:
> On 01/26/2011 04:22 PM, MRAB wrote:
>> On 26/01/2011 10:59, Xavier Heruacles wrote:
>>> I have to do some log processing on files that are usually huge. The length of each
>>> line is variable. How can I get the last line?? Don't tell me to use
>>> readlines or something like linecache...
>> Seek to somewhere near the end and then use readlines(). If you
>> get fewer than 2 lines then you can't be sure that you have the entire
>> last line, so seek a little farther from the end and try again.
> I think this has got to be the most efficient solution.
> You might get the source code for the open source UNIX utility "tail" and
> see how they do it. It seems to work with equal speed no matter how large
> the file is and I suspect it uses MRAB's solution, but because it's written
> in C, it probably examines each character directly rather than calling a
> library routine like readlines.
How about mmapping the file and using rfind?
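import mmap

def mapper(filename):
    with open(filename, 'rb') as f:
        mapping = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
        # The final newline ends the last line; the one before it starts it.
        endIdx = mapping.rfind(b'\n')
        startIdx = mapping.rfind(b'\n', 0, endIdx)
        return mapping[startIdx + 1:endIdx]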
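
def seeker(filename):
    offset = -10
    with open(filename, 'rb') as f:
        while True:
            f.seek(offset, 2)  # whence=2: offset is relative to end of file
            lines = f.readlines()
            if len(lines) >= 2:
                return lines[-1]
            offset *= 2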
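
One caveat with the seek-and-retry approach: seeking further back than the
start of the file raises an error, so short files need the offset clamped.
Here is a rough sketch of that idea, assuming binary mode and '\n' line
endings. It uses read() plus rfind() on the tail instead of readlines(), and
the name last_line_seek and the block parameter are made up for illustration;
this is not one of the functions timed below.

```python
import os

def last_line_seek(filename, block=1024):
    """Return the last line of filename as bytes, without its newline."""
    with open(filename, 'rb') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Never ask for more bytes than the file holds.
        offset = min(block, size)
        while True:
            f.seek(size - offset)
            tail = f.read(offset)
            # Ignore one trailing newline, if the file ends with one.
            body = tail[:-1] if tail.endswith(b'\n') else tail
            cut = body.rfind(b'\n')
            if cut != -1:
                # A newline before the last line was found, so the line is whole.
                return body[cut + 1:]
            if offset == size:
                # The whole file has been read: it is a single line (or empty).
                return body
            # Otherwise look twice as far back, clamped to the file size.
            offset = min(offset * 2, size)
```

The loop doubles the window just as the simpler version does, so a very long
last line still costs only a handful of reads.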
In : import timeit
In : timeit.timeit('finders.seeker("the-file")', 'import finders')
In : timeit.timeit('finders.mapper("the-file")', 'import finders')
the-file is a 120M file with ~500k lines. Both functions assume the
last line has a trailing newline. It's easy to correct if that's not
the case. I think mmap works similarly on Windows, but I've never tried it there.
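
For what it's worth, here is a sketch of how the mmap version might drop the
trailing-newline assumption. It is only an illustration: the name
last_line_mmap is made up, the empty-file check is my own addition, and I am
passing access=mmap.ACCESS_READ rather than prot= because the mmap docs list
access= as the keyword accepted on both Unix and Windows.

```python
import mmap
import os

def last_line_mmap(filename):
    """Return the last line of filename as bytes, without its newline."""
    with open(filename, 'rb') as f:
        # An empty file cannot be mapped, so handle it before calling mmap.
        if os.fstat(f.fileno()).st_size == 0:
            return b''
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            end = len(mapping)
            # Step over one trailing newline, if the file has one.
            if mapping[end - 1:end] == b'\n':
                end -= 1
            # rfind returns -1 when there is no earlier newline,
            # which makes the slice below start at byte 0.
            start = mapping.rfind(b'\n', 0, end)
            return mapping[start + 1:end]
        finally:
            mapping.close()
```

With this, a file ending in "third" and one ending in "third\n" both give
b'third'.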