tail
Cameron Simpson
cs at cskk.id.au
Sat Apr 23 18:09:44 EDT 2022
On 24Apr2022 07:15, Chris Angelico <rosuav at gmail.com> wrote:
>On Sun, 24 Apr 2022 at 07:13, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
>> Emh, why chunks? My function simply reads byte per byte and compares
>> it to b"\n". When it find it, it stops and do a readline():
[...]
>> This is only for one line and in utf8, but it can be generalised.
For some encodings that generalisation might be hard. But mostly, yes.
>Ah. Well, then, THAT is why it's inefficient: you're seeking back one
>single byte at a time, then reading forwards. That is NOT going to
>play nicely with file systems or buffers.
An approach I think you both may have missed: mmap the file and use
mmap.rfind(b'\n') to locate line delimiters.
https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind
Avoids sucking the whole file into memory in the usual sense; instead
the file is paged in as needed. Far more efficient than a seek/read
single-byte approach.
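A minimal sketch of the idea, assuming a regular, non-empty file with
b"\n" line endings. The name tail_lines and its parameter n are mine,
purely for illustration; only mmap.mmap and mmap.rfind come from the
standard library:

```python
import mmap

def tail_lines(path, n=1):
    """Return the last n lines of the file as a list of bytes objects."""
    with open(path, "rb") as f:
        try:
            # Length 0 maps the whole file; pages are loaded on demand.
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # mmap refuses to map an empty file.
            return []
        with mm:
            end = len(mm)
            # Skip one trailing newline so it is not counted as an
            # extra empty line.
            pos = end - 1 if mm[end - 1:end] == b"\n" else end
            for _ in range(n):
                # Search backwards for the previous line delimiter.
                pos = mm.rfind(b"\n", 0, pos)
                if pos == -1:
                    # Fewer than n lines: take the file from the start.
                    break
            return mm[pos + 1:end].splitlines(keepends=True)
```

The result is bytes, so decoding is left to the caller; as noted above,
that part is encoding dependent.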
If the file's growing you can do this to start with, then do a normal
file open from your end point to follow accruing text. (Or reuse the
descriptor you used for the mmap, but using os.read().)
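A rough sketch of the follow-on step, assuming you kept the offset
where the mmap ended. read_appended and bufsize are hypothetical names
for illustration; os.lseek and os.read are the real calls:

```python
import os

def read_appended(fd, offset, bufsize=65536):
    """Read data appended past offset; return (data, new_offset)."""
    # Position the descriptor at the point we had already consumed.
    os.lseek(fd, offset, os.SEEK_SET)
    chunks = []
    while True:
        chunk = os.read(fd, bufsize)
        if not chunk:
            # Empty read means we have reached the current end of file.
            break
        chunks.append(chunk)
    data = b"".join(chunks)
    return data, offset + len(data)
```

Call it again later with the returned offset to pick up whatever has
been written since; how often to poll is up to you.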
Cheers,
Cameron Simpson <cs at cskk.id.au>