Lazy "for line in f" ?
alexandre.ferrieux at gmail.com
Mon Jul 23 08:56:18 CEST 2007
On Jul 23, 1:03 am, Steve Holden <st... at holdenweb.com> wrote:
> What makes you think Python doesn't use the platform fgets()?
The fact that it does that extra layer of buffering. Stdio is already
buffered; duplicating this is useless.
> ... in the case of file.next() (the file method called to
> iterate over the contents) it will actually use getc_unlocked() on
> platforms that offer it, though you can override that configuration
> feature by setting USE_FGETS_IN_GETLINE
Does nothing. And anyway, stdio's getc() does not stubbornly block on
So switching from getc() to fgets() seems orthogonal to the problem.
> It's probably more to do with the buffering. If whatever is driving the
> file is using buffering itself, then it really doesn't matter what the
> Python library does, it will still have to wait until the sending buffer
> fills before it can get any data at all.
Nonsense. In all three cases (pipe, socket, terminal), I control the
writer and make sure that it writes in an unbuffered manner. To convince
you, here is an strace of the Python process while I type random lines
on its stdin:
read(0, "sdfsdf\n", 8192) = 7
read(0, "sdfds\n", 7168) = 6
which proves that the Python process actually gets the lines one by
one, but buffers them internally... for much too long. Sigh.
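
For comparison, here is a minimal sketch of reading through readline()
instead of the file iterator, so that each line is handed over as soon
as it has arrived. The names lazy_lines and demo are made up for this
illustration, and a pipe created with os.pipe() stands in for the real
writer; it is only a sketch of the workaround, not a statement about how
the library behaves internally.

```python
import os

def lazy_lines(f):
    """Return an iterator that fetches one line per readline() call."""
    # iter(callable, sentinel) keeps calling f.readline() until it
    # returns the empty string, which readline() does only at EOF.
    return iter(f.readline, "")

def demo():
    # Hypothetical setup: a pipe whose write end we control ourselves.
    r, w = os.pipe()
    reader = os.fdopen(r, "r")

    os.write(w, b"first\n")      # one line, writer still open
    lines = lazy_lines(reader)
    first = next(lines)          # returns without waiting for more data

    os.write(w, b"second\n")
    os.close(w)                  # closing the writer produces EOF
    rest = list(lines)
    reader.close()
    return first, rest
```

The same loop applies to sys.stdin: "for line in iter(sys.stdin.readline, ''):"
in place of "for line in sys.stdin:".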
> Try running stdin unbuffered (use python -u) and see if that makes any
> difference. It should, in the shell-driven case, for example.
No effect. As a matter of fact, -u is documented as affecting only
output (stdout and stderr).
So I'll reiterate the question: *why* does the Python library add that
extra layer of (hard-headed) buffering on top of stdio's?