[Python-3000] Comment on iostack library

Wed Aug 30 16:22:11 CEST 2006

On 8/29/06, Talin <talin at acm.org> wrote:
> Guido van Rossum wrote:
> > I'm not sure I follow.
> >
> > We *definitely* don't want to use stdio -- it's not part of the OS
> > anyway, and has some annoying quirks like not giving you any insight
> > into how it is using the buffer, not letting you change the buffer size
> > on the fly, and crashing when you switch between read and write calls.
> >
> > So given that, how would you implement readline()? Reading one byte at
> > a time until you've got the \n is definitely way too slow given the
> > constant overhead of system calls.
> >
> > Regarding optimal buffer size, I've never seen a program for which 8K
> > wasn't optimal. Larger buffers simply don't pay off.
>
> Well, as far as readline goes: In order to split the text into lines,
> you have to decode the text first anyway, which is a layer 3 operation.

OK, I see some of your point. This may explain why in Java the
buffering layer seems to be sitting on top of the encoding/decoding.
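Talin's point can be made concrete. In UTF-16 the byte 0x0A can occur inside a character that is not a newline, so scanning undecoded bytes for b"\n" finds spurious line breaks. The sketch below is written in present-day Python 3. It puts the line buffer above an incremental decoder, which is roughly the "buffering on top of decoding" arrangement described above. `iter_decoded_lines` is a hypothetical helper, io.BytesIO stands in for the raw layer, and only "\n" is treated as a line ending.

```python
import codecs
import io


def iter_decoded_lines(raw, encoding, chunk_size=8192):
    """Yield lines from `raw` (anything with read(n) -> bytes).

    Bytes are decoded incrementally first; the line buffer holds text,
    so newlines are found in characters, not in raw bytes. Only "\\n"
    counts as a line ending in this sketch.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""  # decoded text not yet returned as a line
    while True:
        chunk = raw.read(chunk_size)
        # final=True at EOF makes the decoder flush (or reject) leftover bytes
        pending += decoder.decode(chunk, final=not chunk)
        while True:
            idx = pending.find("\n")
            if idx < 0:
                break
            yield pending[: idx + 1]
            pending = pending[idx + 1 :]
        if not chunk:
            break
    if pending:
        yield pending  # last line without a trailing newline


# Usage: U+0A0A encodes as b"\n\n" in UTF-16-LE, but it is not a newline
sample = "\u0a0a\nsecond\ntail"
data = sample.encode("utf-16-le")
naive = data.count(b"\n")  # byte-level scan over-counts line breaks
lines = list(iter_decoded_lines(io.BytesIO(data), "utf-16-le", chunk_size=3))
```

The odd chunk size is deliberate: it splits two-byte code units across reads, and the incremental decoder holds the stray byte until the next chunk arrives.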

Still, for binary file I/O, we'll need a buffering layer on top of the
raw I/O operations. Lots of file formats are read/written in small
chunks but it would be very expensive to turn each small chunk into a
system call.
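A minimal sketch of such a binary buffering layer follows, in present-day Python 3. It also illustrates the readline() answer to the question quoted earlier: search the buffer for b"\n" and refill it in 8K chunks instead of issuing one system call per byte. `CountingRaw` and `SimpleBufferedReader` are hypothetical names. The raw layer is simulated with io.BytesIO, and its read() calls are counted in place of real os.read() calls.

```python
import io


class CountingRaw:
    """Hypothetical raw layer: wraps BytesIO and counts read() calls,
    standing in for the system calls a real file descriptor would need."""

    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.calls = 0

    def read(self, n):
        self.calls += 1
        return self._f.read(n)


class SimpleBufferedReader:
    """Minimal read-side buffering layer (a sketch, not io.BufferedReader)."""

    def __init__(self, raw, buffer_size=8192):
        self.raw = raw
        self.buffer_size = buffer_size
        self._buf = b""  # bytes fetched from raw but not yet consumed

    def _fill(self):
        """Fetch one more chunk from the raw layer; return False at EOF."""
        chunk = self.raw.read(self.buffer_size)
        self._buf += chunk
        return bool(chunk)

    def read(self, n):
        # Serve small reads from memory; hit the raw layer only when short
        while len(self._buf) < n and self._fill():
            pass
        result, self._buf = self._buf[:n], self._buf[n:]
        return result

    def readline(self):
        # Look for b"\n" in the buffer instead of reading one byte at a time
        while True:
            idx = self._buf.find(b"\n")
            if idx >= 0:
                line, self._buf = self._buf[: idx + 1], self._buf[idx + 1 :]
                return line
            if not self._fill():
                # EOF: hand back whatever is left (possibly b"")
                line, self._buf = self._buf, b""
                return line


# Usage: 1000 short records, 10890 bytes in total
data = b"".join(b"record %d\n" % i for i in range(1000))
raw = CountingRaw(data)
reader = SimpleBufferedReader(raw)
lines = []
while True:
    line = reader.readline()
    if not line:
        break
    lines.append(line)
```

Here the buffered reader makes three raw reads: two chunks and one EOF probe. Reading the same data a byte at a time would take more than ten thousand.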

> As far as stdio not giving you hints as to how it is using the buffer, I
> am not sure what you mean...what kind of information would a custom
> buffer implementation give you that stdio would not?

One specific problem with stdio is that you can't tell whether anything
is in the buffer. That makes non-blocking I/O on a socket through stdio
difficult (e.g. when using the makefile() method of Python sockets):
select() on the descriptor may report nothing to read while data is
already sitting in the buffer. Another is that, per the C standard, a
read directly following a write without an intervening fflush() or seek
is undefined behavior, and it can give segfaults on some platforms, so
Python has to keep track of the "state" of the I/O buffer.
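A custom buffer can address both complaints. The sketch below, in present-day Python 3, reports how many unread bytes it holds and tracks whether it was last reading or writing. It flushes pending writes before a read and seeks back over read-ahead before a write. `SimpleBufferedRandom` is a hypothetical name, io.BytesIO stands in for a real seekable file, and this is not how Python's own io module is written.

```python
import io


class SimpleBufferedRandom:
    """Sketch of a read/write buffer that tracks which direction it is in.

    C stdio needs an fflush() or seek between output and a following input,
    and a seek between input and a following output; this layer performs
    both switches itself so callers never have to.
    """

    def __init__(self, raw, buffer_size=8192):
        self.raw = raw            # any seekable object with read/write/seek
        self.buffer_size = buffer_size
        self._rbuf = b""          # read-ahead bytes not yet consumed
        self._wbuf = bytearray()  # pending writes not yet handed to raw

    def buffered(self):
        """Unread bytes held in memory, i.e. what stdio won't report."""
        return len(self._rbuf)

    def flush(self):
        if self._wbuf:
            self.raw.write(bytes(self._wbuf))
            self._wbuf.clear()

    def _drop_read_ahead(self):
        # The raw position is len(_rbuf) bytes past the logical position;
        # seek back so the next write lands where the caller expects.
        if self._rbuf:
            self.raw.seek(-len(self._rbuf), io.SEEK_CUR)
            self._rbuf = b""

    def write(self, data):
        self._drop_read_ahead()   # state change: reading -> writing
        self._wbuf += data
        if len(self._wbuf) >= self.buffer_size:
            self.flush()
        return len(data)

    def read(self, n):
        self.flush()              # state change: writing -> reading
        if len(self._rbuf) < n:
            need = max(self.buffer_size, n - len(self._rbuf))
            self._rbuf += self.raw.read(need)
        result, self._rbuf = self._rbuf[:n], self._rbuf[n:]
        return result


# Usage: io.BytesIO stands in for a real seekable file
raw = io.BytesIO(b"hello world")
f = SimpleBufferedRandom(raw)
f.write(b"HELLO")     # held in the write buffer
sep = f.read(1)       # flushes b"HELLO" first, then reads ahead b" world"
ahead = f.buffered()  # b"world" is still sitting in memory
f.write(b"WORLD")     # seeks raw back 5 bytes before buffering the write
f.flush()
```

Without the seek in `_drop_read_ahead()`, the second write would land at the raw position (offset 11), and the file would end up as b"HELLO worldWORLD" instead of b"HELLO WORLD".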

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)