how to read the last line of a huge file???
Kushal Kumaran
kushal.kumaran+python at gmail.com
Mon Jan 31 23:58:05 EST 2011
On Tue, Feb 1, 2011 at 9:12 AM, Alan Meyer <ameyer2 at yahoo.com> wrote:
> On 01/26/2011 04:22 PM, MRAB wrote:
>>
>> On 26/01/2011 10:59, Xavier Heruacles wrote:
>>>
>>> I have to do some log processing, and the logs are usually huge. The length of each
>>> line is variable. How can I get the last line?? Don't tell me to use
>>> readlines or something like linecache...
>>>
>> Seek to somewhere near the end and then use readlines(). If you
>> get fewer than 2 lines then you can't be sure that you have the entire
>> last line, so seek a little farther from the end and try again.
>
> I think this has got to be the most efficient solution.
>
> You might get the source code for the open source UNIX utility "tail" and
> see how they do it. It seems to work with equal speed no matter how large
> the file is and I suspect it uses MRAB's solution, but because it's written
> in C, it probably examines each character directly rather than calling a
> library routine like readlines.
>
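For comparison with the "tail" idea above (examining characters directly rather than calling readlines), here is a minimal sketch of a backward block scan. It is not taken from the tail source. It assumes the file is read in binary mode, that lines end in b'\n', and that a fixed chunk size (the hypothetical chunk_size parameter, default 4096) is acceptable. last_line_by_chunks is a hypothetical helper name.

```python
import os

def last_line_by_chunks(filename, chunk_size=4096):
    """Return the last line (without its newline) by reading backward.

    Hypothetical helper: reads fixed-size chunks from the end of the file
    until it finds the newline that precedes the final line.
    """
    with open(filename, 'rb') as f:
        pos = f.seek(0, os.SEEK_END)  # file size
        tail = b''
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail  # prepend the chunk just read
            # Ignore one trailing newline, then look for the newline before it.
            body = tail[:-1] if tail.endswith(b'\n') else tail
            idx = body.rfind(b'\n')
            if idx != -1:
                return body[idx + 1:]
        # No earlier newline found: the whole file is a single line.
        return tail[:-1] if tail.endswith(b'\n') else tail
```

Only the last few chunks are read, so the cost depends on the length of the last line rather than the size of the file. It also works whether or not the file ends with a newline.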
How about mmapping the file and using rfind?
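import mmap
import os

def mapper(filename):
    # Map the file read-only; the OS typically pages in only what rfind touches.
    with open(filename, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            endIdx = mapping.rfind(b'\n')               # the trailing newline
            startIdx = mapping.rfind(b'\n', 0, endIdx)  # the newline before it
            return mapping[startIdx + 1:endIdx]

def seeker(filename):
    with open(filename, 'rb') as f:
        size = f.seek(0, os.SEEK_END)
        offset = 10
        while True:
            # Seek back from the end, but never past the start of the file.
            f.seek(-min(offset, size), os.SEEK_END)
            lines = f.readlines()
            # With 2+ lines (or the whole file read), lines[-1] is complete.
            if len(lines) >= 2 or offset >= size:
                return lines[-1][:-1]
            offset *= 2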
In [1]: import timeit
In [2]: timeit.timeit('finders.seeker("the-file")', 'import finders')
Out[2]: 32.216405868530273
In [3]: timeit.timeit('finders.mapper("the-file")', 'import finders')
Out[3]: 16.805877208709717
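the-file is a 120 MB file with ~500k lines. timeit calls each function
1,000,000 times by default, so mmap plus rfind comes out roughly twice
as fast as seeking. Both functions assume the last line has a trailing
newline; it's easy to correct if that's not the case. I think mmap
works similarly on Windows, but I've never tried there.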
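One way to make that correction, sketched under the assumption that a file may or may not end with b'\n': drop a single trailing newline before searching for the previous one. last_line_mmap is a hypothetical name, not part of any library. Because mmap cannot map an empty file, that case is special-cased.

```python
import mmap
import os

def last_line_mmap(filename):
    """Hypothetical variant of mapper() that tolerates a missing final newline."""
    if os.path.getsize(filename) == 0:
        return b''  # mmap refuses to map an empty file
    with open(filename, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            end = len(mapping)
            # Exclude one trailing newline, if present.
            if mapping[end - 1:end] == b'\n':
                end -= 1
            # rfind returns -1 when there is no earlier newline, giving start 0.
            start = mapping.rfind(b'\n', 0, end) + 1
            return mapping[start:end]
```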
--
regards,
kushal