tail
Marco Sulla
Marco.Sulla.Python at gmail.com
Wed May 11 15:58:13 EDT 2022
On Mon, 9 May 2022 at 23:15, Dennis Lee Bieber <wlfraed at ix.netcom.com>
wrote:
>
> On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla
> <Marco.Sulla.Python at gmail.com> declaimed the following:
>
> >Nevertheless, tail is a fundamental tool in *nix. It's fast and
> >reliable. Also the tail command can't handle different encodings?
>
> Based upon
> https://github.com/coreutils/coreutils/blob/master/src/tail.c the ONLY
> thing tail looks at is single byte "\n". It does not handle other line
> endings, and appears to perform BINARY I/O, not text I/O. It does nothing
> for bytes that are not "\n". Split multi-byte encodings are irrelevant
> since, if it does not find enough "\n" bytes in the buffer (chunk) it
> reads another binary chunk and seeks for additional "\n" bytes. Once it
> finds the desired amount, it is synchronized on the byte following the
> "\n" (which, for multi-byte encodings might be a NUL, but in any event,
> should be a safe location for subsequent I/O).
>
> Interpretation of encoding appears to fall to the console driver
> configuration when displaying the bytes output by tail.
Ok, I understand. This should be a Python implementation of *nix tail:
import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"

def tail(filepath, n=10, chunk_size=100):
    if (n <= 0):
        raise ValueError(_err_n)

    if (n % 1 != 0):
        raise ValueError(_err_n)

    if (chunk_size <= 0):
        raise ValueError(_err_chunk_size)

    if (chunk_size % 1 != 0):
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n
    first_chunk = True

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            # Read only bytes not read yet, so chunks never overlap
            # when fewer than n_chunk_size bytes remain before pos.
            read_size = min(n_chunk_size, pos)
            pos -= read_size
            f.seek(pos)
            chars = f.read(read_size)
            text[0:0] = chars
            search_pos = read_size

            # A newline at the very end of the file terminates the last
            # line; it does not start a new one, so it is not counted.
            if first_chunk and chars.endswith(_lf):
                search_pos -= 1
            first_chunk = False

            # Scan the chunk backwards for line feeds.
            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if chunk_line_pos != -1:
                    lines_not_found -= 1

                    if lines_not_found == 0:
                        break

                search_pos = chunk_line_pos

            if lines_not_found == 0:
                break

    # chars was prepended to text, so chunk_line_pos also indexes text.
    return bytes(text[chunk_line_pos+1:])
The function opens the file in binary mode and searches only for b"\n". It
returns the last n lines of the file as bytes.
I suppose this function is fast. It reads the bytes from the file in chunks
and stores them in a bytearray, prepending them to it. The final result is
read from the bytearray and converted to bytes (to be consistent with the
read method).
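
To make the prepending step concrete, here is a minimal sketch in isolation. The sample chunks are made up for illustration, and the join-based variant is only a possible alternative, not something I have measured against the bytearray approach.

```python
# Illustrative chunks, listed in the order a backwards reader would see them.
chunks_from_end = [b"line3\n", b"line2\n", b"line1\n"]

# Approach used in tail(): slice assignment at index 0 prepends in place.
text = bytearray()
for chunk in chunks_from_end:
    text[0:0] = chunk

# Possible alternative (an assumption, not benchmarked): keep the chunks
# in a list and join them once, in reverse order, at the end.
joined = b"".join(reversed(chunks_from_end))

result = bytes(text)
```

Both variants rebuild the bytes in file order; which one is faster for large outputs would have to be measured.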
I suppose the function is reliable. The file is opened in binary mode and
only b"\n" is searched for as the line ending, as *nix tail (and Python's
readline in binary mode) does. Bytes are returned, so the caller can use
them as is, decode them to a string with whatever encoding it wants, or do
whatever its imagination can think of :)
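
As a small illustration of leaving the decoding to the caller, consider the sketch below. The sample strings are invented for this example, and it splits an in-memory bytes object rather than calling tail, so that it stays self-contained.

```python
# Illustrative data only.
data_utf8 = "città\nperché\n".encode("utf-8")

# Splitting on the single byte b"\n" never cuts a UTF-8 character,
# because 0x0A cannot occur inside a multi-byte UTF-8 sequence.
last_line = data_utf8.rstrip(b"\n").rsplit(b"\n", 1)[-1]
decoded = last_line.decode("utf-8")

# In UTF-16-LE the newline is the two bytes 0A 00, so the byte that
# follows b"\n" is a NUL, which is the case Dennis mentions above.
data_utf16 = "a\nb".encode("utf-16-le")
byte_after_lf = data_utf16[data_utf16.index(b"\n") + 1]
```

Note that in UTF-16 the byte 0x0A can also appear as half of an unrelated character, so a byte-level search for b"\n" is only an approximation there; the caller has to know what encoding it is dealing with.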
Finally, it seems to me the function is quite simple.
If all my claims are true, the three obstacles raised by Chris should be
overcome.
I'd very much like to see a CPython implementation of that function. It
could be a method of a file object opened in binary mode, and *only* in
binary mode.
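
To give an idea of the binary-only restriction, here is a rough pure-Python sketch that works on an already open file object instead of a path. The name tail_from_file, its parameters and its default chunk size are hypothetical; nothing like it exists in CPython today, and a real implementation could look quite different.

```python
import io
import os


def tail_from_file(f, n=10, chunk_size=4096):
    """Return the last n lines of an open binary file object as bytes.

    Hypothetical helper, written only to illustrate the proposal.
    """
    # Reject text-mode files: only byte offsets are meaningful here.
    if isinstance(f, io.TextIOBase):
        raise TypeError("file must be opened in binary mode")
    if n <= 0 or chunk_size <= 0:
        raise ValueError("n and chunk_size must be positive integers")

    pos = f.seek(0, os.SEEK_END)  # seek() returns the new offset: the size
    chunks = []                   # chunks collected in reverse file order
    newlines_needed = n
    first = True

    while pos > 0:
        size = min(chunk_size, pos)
        pos -= size
        f.seek(pos)
        chunk = f.read(size)
        end = len(chunk)
        # A newline that ends the file does not start a new line.
        if first and chunk.endswith(b"\n"):
            end -= 1
        first = False

        # Scan this chunk backwards for line feeds.
        while True:
            idx = chunk.rfind(b"\n", 0, end)
            if idx == -1:
                break
            newlines_needed -= 1
            if newlines_needed == 0:
                chunks.append(chunk[idx + 1:])
                return b"".join(reversed(chunks))
            end = idx
        chunks.append(chunk)

    # Fewer than n lines in the file: return everything.
    return b"".join(reversed(chunks))
```

A call such as tail_from_file(open(path, "rb"), n=5) would then return bytes, while a file opened in text mode is rejected with a TypeError.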
What do you think about it?