How to read from a file to an arbitrary delimiter efficiently?
Marko Rauhamaa
marko at pacujo.net
Sat Feb 27 12:47:34 EST 2016
Dennis Lee Bieber <wlfraed at ix.netcom.com>:
> On Sat, 27 Feb 2016 21:40:17 +1100, Steven D'Aprano <steve at pearwood.info>
> declaimed the following:
>>Thanks for finding the issue, but the solutions given don't suit my
>>use case. I don't want an iterator that operates on pre-read blocks, I
>>want something that will read a record from a file, and leave the file
>>pointer one entry past the end of the record.
>>
>>Oh, and records are likely fairly short, but there may be a lot of them.
>
> Considering that most of the world has settled on the view that
> files are just linear streams (curse you, UNIX) anything working with
> "records" has to build the concept on top of the stream. Either by
> making records "fixed width" (allowing for fast random access:
> recNum*recLen => seek position), though likely giving up the stream
> access... Or by wrapping the stream with something that does
> parsing/buffering.
It may be instructive to see how the Linux/UNIX utility head(1)
operates. It reads its input greedily, but once it has seen enough
lines, it uses lseek(2) to move the file offset back to just past the
last line it printed.
Not all file-like objects can seek (a pipe cannot), so head(1) may fail
to operate as advertised:
========================================================================
$ seq 10000 >/tmp/data.txt
$ {
> head -n 5 >/dev/null
> head -n 5
> } </tmp/data.txt
6
7
8
9
10
$ cat /tmp/data.txt | {
> head -n 5 >/dev/null
> head -n 5
> }
1861
1862
1863
1864
$
========================================================================
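The same read-greedily-then-seek-back trick can be sketched in Python
for the record-reading use case above. This is only a minimal sketch,
not a tested library routine: the name read_record, its parameters and
the 4096-byte default block size are made up for illustration, and it
assumes a file opened in binary mode that supports tell() and seek().

```python
import io


def read_record(f, delimiter=b"\n", blocksize=4096):
    """Read one record from a seekable binary file, head(1)-style.

    Reads greedily in blocks, then seeks back so that the file position
    ends up just past the delimiter. Returns the record without the
    delimiter, or None at end of file. (Hypothetical helper.)
    """
    start = f.tell()
    buf = b""
    while True:
        chunk = f.read(blocksize)
        if not chunk:
            # End of file: whatever is left is the final record.
            return buf if buf else None
        # Only rescan the tail where a new match could begin, so a
        # delimiter split across two blocks is still found.
        search_from = max(0, len(buf) - len(delimiter) + 1)
        buf += chunk
        pos = buf.find(delimiter, search_from)
        if pos != -1:
            # Undo the over-read: reposition just past the delimiter.
            f.seek(start + pos + len(delimiter))
            return buf[:pos]


if __name__ == "__main__":
    f = io.BytesIO(b"alpha;beta;gamma")
    print(read_record(f, b";"))  # first record
    print(f.read())              # the rest of the stream is untouched
```

As with head(1), this only works when the underlying object can seek.
On a pipe or socket tell() and seek() fail, so checking f.seekable()
first and falling back to a buffering wrapper is probably wise.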
Marko