tail

MRAB python at mrabarnett.plus.com
Fri May 6 16:19:48 EDT 2022


On 2022-05-06 20:21, Marco Sulla wrote:
> I have a little problem.
> 
> I tried to extend the tail function, so it can read lines from the bottom
> of a file object opened in text mode.
> 
> The problem is it does not work. It gets a starting position that is lower
> than expected by 3 characters. So the first line read contains only 2
> chars, and the last line is missing.
> 
> import os
> 
> _lf = "\n"
> _cr = "\r"
> _lf_ord = ord(_lf)
> 
> def tail(f, n=10, chunk_size=100):
>      n_chunk_size = n * chunk_size
>      pos = os.stat(f.fileno()).st_size
>      chunk_line_pos = -1
>      lines_not_found = n
>      binary_mode = "b" in f.mode
>      lf = _lf_ord if binary_mode else _lf
> 
>      while pos != 0:
>          pos -= n_chunk_size
> 
>          if pos < 0:
>              pos = 0
> 
>          f.seek(pos)
>          chars = f.read(n_chunk_size)
> 
>          for i, char in enumerate(reversed(chars)):
>              if char == lf:
>                  lines_not_found -= 1
> 
>                  if lines_not_found == 0:
>                      chunk_line_pos = len(chars) - i - 1
>                      print(chunk_line_pos, i)
>                      break
> 
>          if lines_not_found == 0:
>              break
> 
>      line_pos = pos + chunk_line_pos + 1
> 
>      f.seek(line_pos)
> 
>      res = b"" if binary_mode else ""
> 
>      for i in range(n):
>          res += f.readline()
> 
>      return res
> 
> Maybe the problem is 1 char != 1 byte?

Is the file UTF-8? That's a variable-width encoding, so are any of the
characters above U+007F? Each of those takes more than one byte.
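
To illustrate why that matters: os.stat() reports the size in bytes, while
read(n) on a file opened in text mode counts characters. A minimal sketch,
using a made-up sample string (not taken from your file):

```python
# Hypothetical sample text; "é" is U+00E9, which is above U+007F.
text = "café\n"
encoded = text.encode("utf-8")

print(len(text))     # number of code points
print(len(encoded))  # number of bytes once encoded as UTF-8

# Show how many bytes each character occupies in UTF-8.
for ch in text:
    print(repr(ch), len(ch.encode("utf-8")))
```

As soon as the two counts differ, a position computed from the byte size no
longer lines up with an index into the characters you read back. Also note
that in text mode seek() is only documented to accept 0 or a value
previously returned by tell().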

Which OS? On Windows, it's common for UTF-8 files to start with a
BOM/signature, which is 3 bytes but only 1 code point.
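
If it helps, here is a rough sketch of how you could check for the BOM and
sidestep the byte/character mismatch by locating the newlines in binary mode
and decoding only at the end. The names has_utf8_bom and tail_text are made
up for this example, and it assumes the file is UTF-8 (with or without a
BOM); it would not be safe for UTF-16 or UTF-32, where a 0x0A byte can be
part of another character.

```python
import codecs
import os


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def tail_text(path, n=10, chunk_size=4096, encoding="utf-8-sig"):
    """Return the last `n` lines of a UTF-8 file as a str (sketch only)."""
    if n <= 0:
        return ""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()  # file size in bytes
        data = b""
        # Read chunks backwards until we hold more than n newlines
        # (so the first wanted line is complete) or reach the start.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # bytes.splitlines() also treats "\r" and "\r\n" as line endings.
    lines = data.splitlines(keepends=True)
    # The slice starts either at the file start or right after a newline
    # byte, so it never cuts a multi-byte UTF-8 sequence in half.
    # "utf-8-sig" drops a leading BOM if the slice begins with one.
    return b"".join(lines[-n:]).decode(encoding)
```

Working on bytes means every position is a real file offset, so seek() is
well defined, and the decoding step is the only place where characters come
into play.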


More information about the Python-list mailing list