How to read from a file to an arbitrary delimiter efficiently?
BartC
bc at freeuk.com
Sat Feb 27 11:35:15 EST 2016
On 25/02/2016 06:50, Steven D'Aprano wrote:
> I have a need to read to an arbitrary delimiter, which might be any of a
> (small) set of characters. For the sake of the exercise, let's say it is
> either ! or ? (for example).
>
> # Read a chunk of bytes/characters from an open file.
> def chunkiter(f, delim):
>     buffer = []
>     b = f.read(1)
>     while b:
>         buffer.append(b)
>         if b in delim:
>             yield ''.join(buffer)
>             buffer = []
>         b = f.read(1)
>     if buffer:
>         yield ''.join(buffer)
At first sight, it's not surprising that it's slow, given the
generators and whatnot thrown in.
However those aren't the main reasons for the poor speed. The limiting
factor here is reading one byte at a time. Just a loop like this:
    while f.read(1):
        pass
without doing anything else, seems to take most of the time. (3.6
seconds, compared with 5.6 seconds for your readchunks() on a 6MB version
of your test file, on Python 2.7. readlines() took about 0.2 seconds.)
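If anyone wants to repeat that kind of comparison, something like the
following rough sketch might do. It assumes Python 3 (for
time.perf_counter) and uses an in-memory io.BytesIO as a stand-in for the
test file, so absolute figures will not match the ones above. The names
count_bytewise, count_blockwise and timed are made up for this
illustration.

```python
import io
import time

def count_bytewise(f):
    """Consume f one unit at a time, like the bare loop; return the count."""
    n = 0
    while f.read(1):
        n += 1
    return n

def count_blockwise(f, blocksize=65536):
    """Consume f in larger blocks; return the total number of units read."""
    n = 0
    while True:
        block = f.read(blocksize)
        if not block:
            return n
        n += len(block)

def timed(func, f):
    """Return (result, elapsed seconds) for func(f)."""
    start = time.perf_counter()
    result = func(f)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    # Assumption: 6MB of filler stands in for the real test file.
    data = b"x" * (6 * 1024 * 1024)
    for func in (count_bytewise, count_blockwise):
        total, secs = timed(func, io.BytesIO(data))
        print(func.__name__, total, "%.3f s" % secs)
```

Both functions read the whole input and return the same count, so any
difference in the timings comes from the number of read() calls rather
than from the amount of data.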
Any faster solutions would need to read more than one byte at a time.
(This bottleneck occurs in C too if you try to read a file using
only fgetc(), compared with any buffered solution.)
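As one possible way of reading more than one byte at a time, here is a
minimal sketch, not a tuned solution. It assumes Python 3, a file opened
in text mode, and a non-empty string of single-character delimiters. The
name chunkiter_blocks and the blocksize parameter are invented for this
example. It reads a block at a time and lets a regular expression find
the delimiters inside each block.

```python
import re

def chunkiter_blocks(f, delims, blocksize=65536):
    """Yield pieces of text file f, each ending in one of delims.

    A trailing piece with no delimiter is yielded as well, which is
    what the quoted chunkiter() does.
    """
    # Character class matching any single delimiter (delims must be non-empty).
    pattern = re.compile('[' + re.escape(delims) + ']')
    pending = []    # parts of a chunk that spans more than one block
    while True:
        block = f.read(blocksize)
        if not block:
            break
        start = 0
        for m in pattern.finditer(block):
            # Everything up to and including the delimiter ends the chunk.
            pending.append(block[start:m.end()])
            yield ''.join(pending)
            pending = []
            start = m.end()
        if start < len(block):
            # Keep the tail; it continues in the next block.
            pending.append(block[start:])
    if pending:
        yield ''.join(pending)
```

Used as

    with open(filename) as f:
        for chunk in chunkiter_blocks(f, '!?'):
            process(chunk)

where filename and process are placeholders. For a file opened in
binary mode the pattern and the joins would need to be bytes instead.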
--
bartc