How to read from a file to an arbitrary delimiter efficiently?
Chris Angelico
rosuav at gmail.com
Sat Feb 27 07:17:36 EST 2016
On Sat, Feb 27, 2016 at 8:49 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Thu, 25 Feb 2016 06:30 pm, Chris Angelico wrote:
>
>> On Thu, Feb 25, 2016 at 5:50 PM, Steven D'Aprano
>> <steve+comp.lang.python at pearwood.info> wrote:
>>>
>>> # Read a chunk of bytes/characters from an open file.
>>> def chunkiter(f, delim):
>>>     buffer = []
>>>     b = f.read(1)
>>>     while b:
>>>         buffer.append(b)
>>>         if b in delim:
>>>             yield ''.join(buffer)
>>>             buffer = []
>>>         b = f.read(1)
>>>     if buffer:
>>>         yield ''.join(buffer)
>>
>> How bad is it if you over-read?
>
> Pretty bad :-)
>
> Ideally, I'd rather not over-read at all. I'd like the user to be able to
> swap from "read N bytes" to "read to the next delimiter" (and possibly
> even "read the next line") without losing anything.
If those are the *only* two operations, you should be able to maintain
your own buffer. Something like this:
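
    import re

    class ChunkIter:
        def __init__(self, f, delim):
            self.f = f
            # Escape delim so characters like "]" or "^" are taken literally.
            self.delim = re.compile("[" + re.escape(delim) + "]")
            self.buffer = ""

        def read_to_delim(self):
            """Return characters up to the next delim, or remaining chars,
            or "" if at EOF"""
            while "delimiter not found":
                *parts, self.buffer = self.delim.split(self.buffer, maxsplit=1)
                if parts: return parts[0]
                b = self.f.read(256)
                if not b:
                    # EOF: hand back whatever is left and empty the buffer,
                    # so the next call returns "".
                    ret, self.buffer = self.buffer, ""
                    return ret
                self.buffer += b

        def read(self, nbytes):
            need = nbytes - len(self.buffer)
            if need > 0: self.buffer += self.f.read(need)
            # Slice by nbytes; slicing by need goes wrong when need >= 0
            # and the buffer is non-empty.
            ret, self.buffer = self.buffer[:nbytes], self.buffer[nbytes:]
            return ret
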
It still might over-read from the underlying file, but those extra
chars will be available to the read(N) function.
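
To show how the two calls interleave without losing anything, here is a minimal self-contained sketch. It restates the same buffering idea under a hypothetical name (DelimReader), uses io.StringIO as a stand-in for a real file, and takes an assumed blocksize parameter that is not in the code above. Like the class above, it drops the delimiter rather than returning it with the chunk.

```python
import io
import re


class DelimReader:
    """Hypothetical name; restates the buffering idea so this runs alone."""

    def __init__(self, f, delim, blocksize=256):
        self.f = f
        # Character class matching any single delimiter character.
        self.delim = re.compile("[" + re.escape(delim) + "]")
        self.blocksize = blocksize  # assumed read size, not tuned
        self.buffer = ""

    def read_to_delim(self):
        while True:
            parts = self.delim.split(self.buffer, maxsplit=1)
            if len(parts) == 2:
                # Delimiter found: keep everything after it for later calls.
                chunk, self.buffer = parts
                return chunk
            block = self.f.read(self.blocksize)
            if not block:
                # EOF: return the leftovers once, then "" from then on.
                chunk, self.buffer = self.buffer, ""
                return chunk
            self.buffer += block

    def read(self, n):
        # Serve from the buffer first; only hit the file for the shortfall.
        need = n - len(self.buffer)
        if need > 0:
            self.buffer += self.f.read(need)
        chunk, self.buffer = self.buffer[:n], self.buffer[n:]
        return chunk


src = io.StringIO("alpha,beta;gamma delta")
r = DelimReader(src, ",;")
first = r.read_to_delim()   # over-reads the whole input into the buffer
two = r.read(2)             # served from the buffer, no file access
field = r.read_to_delim()   # rest of the second field
tail = r.read(100)          # everything left, short read at EOF
at_eof = r.read_to_delim()  # nothing left
print(repr(first), repr(two), repr(field), repr(tail), repr(at_eof))
```

The first read_to_delim() pulls the entire input into the buffer, yet the following read(2) still sees the characters right after the first delimiter, which is the property asked for.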
ChrisA