tail
Marco Sulla
Marco.Sulla.Python at gmail.com
Sun May 8 12:05:11 EDT 2022
I think I've _almost_ found a simpler, general way:

import os

_lf = "\n"
_cr = "\r"


def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
    if n <= 0:
        return ""

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = 0
    text = ""

    with open(filepath, newline=newline, encoding=encoding) as f:
        # newline="" leaves "\n", "\r" and "\r\n" untranslated, so all
        # three terminators have to be recognised by hand.
        hard_mode = newline == ""

        if newline is None:
            # Universal newlines: open() translates everything to "\n".
            newline = _lf

        while pos != 0:
            pos = max(pos - n_chunk_size, 0)
            f.seek(pos)
            # Each pass re-reads from pos to the end of the file, so the
            # count starts from scratch every time.
            text = f.read()
            lines_not_found = n
            lf_after = False

            for i, char in enumerate(reversed(text)):
                # Index of the first character after the current one.
                line_start = len(text) - i

                if hard_mode:
                    # A "\r" directly before "\n" is part of one "\r\n".
                    ends_line = char == _lf or (char == _cr and not lf_after)
                    lf_after = char == _lf
                else:
                    ends_line = text.endswith(newline, 0, line_start)

                # The terminator of the very last line starts no new line.
                if ends_line and line_start != len(text):
                    lines_not_found -= 1

                    if lines_not_found == 0:
                        chunk_line_pos = line_start
                        break

            if lines_not_found == 0:
                break

    return text[chunk_line_pos:]

In short, the file is always opened in text mode. It is read from the end in
bigger and bigger chunks, until the start of the file is reached or all the
lines are found.
Why? Because in encodings that use more than one byte per character, reading
a chunk of n bytes and then the chunk before it can split a character
across the two chunks, leaving some of its bytes in each.
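
A minimal illustration of that split, assuming UTF-8 and a chunk boundary picked by hand for the example (the helper name `decodes` is made up here):

```python
# "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9.
data = "né".encode("utf-8")          # b"n\xc3\xa9"
first, second = data[:2], data[2:]   # a chunk boundary inside "é"


def decodes(chunk):
    """Return True if the chunk is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


whole_ok = decodes(data)      # the complete byte string decodes
first_ok = decodes(first)     # ends with the first half of "é"
second_ok = decodes(second)   # starts with the second half of "é"
```

Only the complete byte string decodes; each half fails on its own.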
I think one could instead read chunk by chunk and check for the chunk
junction problem. I suppose the code would be faster that way. Still, this
trick seems quite fast, and it's a lot simpler.
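
A rough sketch of what the chunk-by-chunk variant could look like. It assumes UTF-8 only and a file object opened in binary mode; the name `reversed_utf8_chunks` and the `carry` buffer are invented for this example, not part of the function above:

```python
import io
import os


def reversed_utf8_chunks(f, chunk_size=100):
    """Yield the decoded chunks of a binary UTF-8 file, last chunk first."""
    pos = f.seek(0, os.SEEK_END)
    carry = b""  # tail of a character whose first byte is in an earlier chunk

    while pos > 0:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        data = f.read(step) + carry

        # UTF-8 continuation bytes have the form 0b10xxxxxx. Any of them at
        # the start of the chunk belong to a character that began earlier.
        start = 0
        if pos > 0:
            while start < len(data) and (data[start] & 0xC0) == 0x80:
                start += 1

        carry = data[:start]
        if start < len(data):
            yield data[start:].decode("utf-8")


sample = "naïve café, 10 €\n" * 3
chunks = list(reversed_utf8_chunks(io.BytesIO(sample.encode("utf-8")), chunk_size=4))
rebuilt = "".join(reversed(chunks))
```

Joining the chunks in reverse order gives back the original text, even though a 4-byte chunk boundary falls inside several characters. Other multi-byte encodings would need their own junction test.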
The final result is taken from the decoded chunk, not from the file, so
there are no problems of misalignment between bytes and text. Furthermore,
the builtin encoding parameter is used, so this should work with all
encodings (untested).
A newline parameter can also be specified, as in open(). If it's the empty
string, things are a little more complicated, but I suppose the code is
clear. That case is untested too. I only tested with a UTF-8 Linux file.
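
For reference, a small demonstration of how open() treats the two newline values discussed here. The sample bytes and the file name are made up for the example:

```python
import os
import tempfile

raw = b"one\r\ntwo\rthree\n"  # three lines, three different terminators

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # newline=None (the default): every terminator is translated to "\n".
    with open(path, newline=None, encoding="utf-8") as f:
        translated_lines = f.readlines()

    # newline="": the same terminators still end lines, but are kept as-is.
    with open(path, newline="", encoding="utf-8") as f:
        untranslated_lines = f.readlines()
```

With the default, a scan only has to look for "\n". With the empty string it has to recognise "\n", "\r" and "\r\n", which is why that case needs extra care.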
Do you think there's a chance of getting this function added as a method of
the file object in CPython? The method for a file object opened in bytes
mode is simpler, since there's no encoding and the newline is only \n in
that case.
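
A possible sketch of the bytes-mode variant, only to show how much simpler it gets. The name `tail_bytes` is hypothetical, and it assumes a seekable binary file object whose lines end with \n:

```python
import io
import os


def tail_bytes(f, n=10, chunk_size=100):
    """Return the last n lines of a seekable binary file object as bytes."""
    if n <= 0:
        return b""

    pos = f.seek(0, os.SEEK_END)
    data = b""

    while pos > 0:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        data = f.read(step) + data  # prepend the previous chunk

        # A final "\n" only terminates the last line, so leave it out.
        end = len(data) - 1 if data.endswith(b"\n") else len(data)
        if data.count(b"\n", 0, end) >= n:
            break

    # Walk back over n newlines to find where the wanted lines start.
    idx = len(data) - 1 if data.endswith(b"\n") else len(data)
    for _ in range(n):
        idx = data.rfind(b"\n", 0, idx)
        if idx == -1:
            return data  # fewer than n lines: return everything
    return data[idx + 1:]


sample = io.BytesIO(b"".join(b"line %d\n" % i for i in range(50)))
last_three = tail_bytes(sample, n=3, chunk_size=7)
```

Since the chunks are plain bytes, they can simply be concatenated, and a boundary in the middle of a character does no harm.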