How to read from a file to an arbitrary delimiter efficiently?
Chris Angelico
rosuav at gmail.com
Thu Feb 25 02:30:25 EST 2016
On Thu, Feb 25, 2016 at 5:50 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
>
> # Read a chunk of bytes/characters from an open file.
> def chunkiter(f, delim):
>     buffer = []
>     b = f.read(1)
>     while b:
>         buffer.append(b)
>         if b in delim:
>             yield ''.join(buffer)
>             buffer = []
>         b = f.read(1)
>     if buffer:
>         yield ''.join(buffer)

How bad is it if you over-read? If it's absolutely critical that you
not read anything from the buffer that you shouldn't, then yeah, it's
going to be slow. But if you're never going to read the file using
anything other than this iterator, the best thing to do is to read
more at a time. Simple and naive method:
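
import re

def chunkiter(f, delim):
    """Don't use [ or ] as the delimiter, kthx"""
    buffer = ""
    b = f.read(256)
    while b:
        buffer += b
        # Everything before the last delimiter is complete; keep the tail.
        *parts, buffer = re.split("["+delim+"]", buffer)
        yield from parts
        b = f.read(256)  # fetch the next block, else the loop never ends
    if buffer: yield buffer
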
How well does that perform?
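
One difference worth noting: the quoted version yields each chunk with its delimiter still attached, while the re.split approach throws the delimiters away. If that matters, here is a rough sketch of a block-reading variant that keeps them. The name chunkiter_keep and the blocksize parameter are made up for illustration, and it assumes a text-mode file and a non-empty delim. It also runs the delimiters through re.escape, so [ and ] are fine here.

```python
import io
import re

def chunkiter_keep(f, delim, blocksize=256):
    """Yield chunks from text file f, each ending with its delimiter.

    Hypothetical variant for illustration; delim must be non-empty.
    """
    # re.escape makes characters such as ], ^ and - safe inside the class.
    pattern = re.compile("[" + re.escape(delim) + "]")
    buffer = ""
    block = f.read(blocksize)
    while block:
        buffer += block
        start = 0
        # Emit every complete chunk, delimiter included.
        for m in pattern.finditer(buffer):
            yield buffer[start:m.end()]
            start = m.end()
        # Keep the unfinished tail for the next block.
        buffer = buffer[start:]
        block = f.read(blocksize)
    if buffer:
        yield buffer

# A tiny block size forces chunks to span several reads.
print(list(chunkiter_keep(io.StringIO("a,b;c]d"), ",;]", blocksize=2)))
# ['a,', 'b;', 'c]', 'd']
```

Like the version above, this over-reads the file by up to blocksize characters, so it only makes sense if nothing else reads from f afterwards.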
ChrisA