tail
MRAB
python at mrabarnett.plus.com
Fri May 6 16:19:48 EDT 2022
On 2022-05-06 20:21, Marco Sulla wrote:
> I have a little problem.
>
> I tried to extend the tail function, so it can read lines from the bottom
> of a file object opened in text mode.
>
> The problem is it does not work. It gets a starting position that is 3
> characters lower than expected. So the first line read is only 2 chars
> long, and the last line is missing.
>
> import os
>
> _lf = "\n"
> _cr = "\r"
> _lf_ord = ord(_lf)
>
> def tail(f, n=10, chunk_size=100):
>     n_chunk_size = n * chunk_size
>     pos = os.stat(f.fileno()).st_size
>     chunk_line_pos = -1
>     lines_not_found = n
>     binary_mode = "b" in f.mode
>     lf = _lf_ord if binary_mode else _lf
>
>     while pos != 0:
>         pos -= n_chunk_size
>
>         if pos < 0:
>             pos = 0
>
>         f.seek(pos)
>         chars = f.read(n_chunk_size)
>
>         for i, char in enumerate(reversed(chars)):
>             if char == lf:
>                 lines_not_found -= 1
>
>                 if lines_not_found == 0:
>                     chunk_line_pos = len(chars) - i - 1
>                     print(chunk_line_pos, i)
>                     break
>
>         if lines_not_found == 0:
>             break
>
>     line_pos = pos + chunk_line_pos + 1
>
>     f.seek(line_pos)
>
>     res = b"" if binary_mode else ""
>
>     for i in range(n):
>         res += f.readline()
>
>     return res
>
> Maybe the problem is 1 char != 1 byte?
Is the file UTF-8? That's a variable-width encoding, so are any of the
characters > U+007F?
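A quick way to see the mismatch, using made-up sample text: `len()` of a decoded str counts code points, while `os.stat().st_size` and binary-mode offsets count bytes. In the quoted code, `i` and `len(chars)` are character counts but `pos` is a byte offset, so (assuming each chunk decodes cleanly) `pos + chunk_line_pos + 1` would land too early by one byte for every extra byte of a multi-byte character before the target newline in that chunk.

```python
# Code points vs bytes for the same (hypothetical) text.
text = "caffè\nlatte\n"
data = text.encode("utf-8")  # "è" becomes 2 bytes in UTF-8

print(len(text), len(data))                       # 12 13
print(text.index("latte"), data.index(b"latte"))  # 6 7
```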
Which OS? On Windows, it's common/normal for UTF-8 files to start with a
BOM/signature, which is 3 bytes/1 codepoint.
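One way around both issues, offered as a sketch rather than a fix for the quoted function: do the seeking and newline counting in binary mode, where every offset really is a byte offset, and decode only the selected lines at the end. Text-mode `seek()` is only documented to accept zero or a value previously returned by `tell()`, so seeking a text file to an arbitrary byte count isn't guaranteed to work anyway. The name `tail_text` and its parameters are hypothetical; it assumes an ASCII-compatible encoding such as UTF-8 and LF or CRLF line endings, and uses the `utf-8-sig` codec so a leading BOM is dropped if the tail reaches the start of the file.

```python
import os
import tempfile


def tail_text(path, n=10, chunk_size=4096, encoding="utf-8-sig"):
    """Return the last n lines of the file at `path` as a str.

    Assumes an ASCII-compatible encoding (e.g. UTF-8) and lines ending in
    LF or CRLF. Seeking and counting are done on bytes; decoding happens
    once, after the lines have been selected.
    """
    if n <= 0:
        return ""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # file size in bytes
        data = b""
        newlines = 0
        # Walk backwards in non-overlapping chunks. Having more than n
        # newlines guarantees the last n lines are complete even when the
        # file ends with a newline; otherwise we stop at the file start.
        while pos > 0 and newlines <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            newlines += chunk.count(b"\n")
            data = chunk + data
    # In UTF-8 the byte 0x0A never occurs inside a multi-byte character,
    # so splitting on line breaks at the byte level is safe.
    lines = data.splitlines(keepends=True)
    # "utf-8-sig" strips a leading BOM if the tail includes the file start.
    return b"".join(lines[-n:]).decode(encoding)


if __name__ == "__main__":
    print(len("\ufeff".encode("utf-8")))  # the BOM: 1 code point, 3 bytes
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write("first\ncaffè\nlatte\nlast\n".encode("utf-8-sig"))
    try:
        print(repr(tail_text(tmp.name, n=2, chunk_size=4)))
    finally:
        os.remove(tmp.name)
```

Run as a script, the demo prints 3 and then 'latte\nlast\n'. Counting more than n newlines before stopping costs at most one extra chunk read, but it means a trailing newline at EOF doesn't shift the result by one line.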