How to read from a file to an arbitrary delimiter efficiently?
timothy.c.delaney at gmail.com
Sun Feb 28 16:00:22 EST 2016
On 29 February 2016 at 07:28, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> On 25 February 2016 at 06:50, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
> > I have a need to read to an arbitrary delimiter, which might be any of a
> > (small) set of characters. For the sake of the exercise, let's say it is
> > either ! or ? (for example).
> > I want to read from files reasonably efficiently. I don't mind if there
> > is a little overhead, but my first attempt is 100 times slower than the
> > "read to the end of the line" method.
> You can get something much faster using mmap and searching for a
> single delimiter:
> My timing makes that ~7x slower than iterating over the lines of the
> file but still around 100x faster than reading individual characters.
> I'm not sure how to generalise it to looking for multiple delimiters
> without dropping back to reading individual characters though.
You can use an mmapped file as the input for regular expressions, though
that may or may not be particularly efficient.
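
A rough sketch of the regex-over-mmap idea, untimed, so treat it as an
illustration rather than a recommendation. The function name
iter_delimited and its parameters are made up for this example; it
assumes the delimiters are single bytes and builds a character class
from them.

```python
import mmap
import os
import re

def iter_delimited(path, delimiters=b"!?"):
    """Yield successive byte chunks of the file, each ending with a delimiter.

    The final chunk is yielded without a delimiter if the file does not
    end with one. (Hypothetical helper, not from the thread.)
    """
    # Character class matching any one of the delimiter bytes.
    pattern = re.compile(b"[" + re.escape(delimiters) + b"]")
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = 0
            size = len(mm)
            while pos < size:
                # re accepts any bytes-like object, including an mmap.
                match = pattern.search(mm, pos)
                end = match.end() if match else size
                yield mm[pos:end]  # slicing an mmap copies out bytes
                pos = end
```

Since the search runs inside the regex engine, multiple delimiters cost
no more Python-level work than a single one.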
Otherwise, if reading from a file, I think reading a chunk and then using
seek() to move back to just after the delimiter is probably going to be
the most efficient approach that still leaves the file position just
after the delimiter.
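
Something along these lines, as a minimal sketch: it assumes a seekable
file opened in binary mode and single-byte delimiters, and the name
read_to_delimiter and the chunk_size default are just placeholders.

```python
import os
import re

def read_to_delimiter(f, delimiters=b"!?", chunk_size=4096):
    """Read up to and including the next delimiter from a seekable binary file.

    On return the file position is just after the delimiter. Returns the
    remaining data at EOF, or b"" if nothing is left.
    (Hypothetical helper, not from the thread.)
    """
    pattern = re.compile(b"[" + re.escape(delimiters) + b"]")
    pieces = []
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break  # EOF without finding a delimiter
        match = pattern.search(chunk)
        if match:
            end = match.end()
            pieces.append(chunk[:end])
            # Step back over the bytes read past the delimiter.
            f.seek(end - len(chunk), os.SEEK_CUR)
            break
        pieces.append(chunk)
    return b"".join(pieces)
```

Because the position is restored after every call, other code can keep
using the same file object between calls.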
If reading from a stream, I think Chris' approach is the way to go: read a
chunk, maintain an internal buffer, and don't give access to the
underlying stream.
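
For the stream case, a minimal sketch of the buffering idea might look
like the following. The class name DelimitedReader and the method
read_record are invented for this example; it assumes a binary stream
whose read(n) returns b"" at end of input.

```python
import re

class DelimitedReader:
    """Wrap a binary stream and hand out delimiter-terminated records.

    The wrapper owns the stream: data read past a delimiter is held in an
    internal buffer, so callers should not read the stream directly.
    (Hypothetical class, not from the thread.)
    """

    def __init__(self, stream, delimiters=b"!?", chunk_size=4096):
        self._stream = stream
        self._pattern = re.compile(b"[" + re.escape(delimiters) + b"]")
        self._chunk_size = chunk_size
        self._buffer = b""

    def read_record(self):
        """Return the next record including its delimiter, or b"" at EOF."""
        scanned = 0  # bytes of the buffer already known to hold no delimiter
        while True:
            match = self._pattern.search(self._buffer, scanned)
            if match:
                end = match.end()
                record = self._buffer[:end]
                self._buffer = self._buffer[end:]
                return record
            scanned = len(self._buffer)
            chunk = self._stream.read(self._chunk_size)
            if not chunk:
                # End of stream: return whatever is left, possibly b"".
                record, self._buffer = self._buffer, b""
                return record
            self._buffer += chunk
```

Hiding the underlying stream matters here because its position no longer
corresponds to what the caller has consumed.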