[Python-3000] revamping the io stack, part 2

Brett Cannon brett at python.org
Sat Apr 29 22:50:39 CEST 2006


On 4/29/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> i first thought on focusing on the socket module, because it's the part
> that bothers me most, but since people have expressed their thoughts on
> completely revamping the IO stack, perhaps we should be open to adopting
> new ideas, mainly from the java/.NET world (keeping the momentum from the
> previous post).
>
> there is an inevitable issue of performance here, since it basically
> splits what used to be "file" or "socket" into many layers... each adding
> additional overhead, so many parts should be lowered to C.
>
> if we look at java/.NET for guidance, they have come up with two concepts:

I am a little wary of taking too much from Java/.NET since I have
always found their I/O systems way too heavy for the common case.  I
can't remember what it takes to get a reader in Java in order to read
by lines.  In Python, I love that I don't have to think about that;
just pass a file object to 'for' and I am done.
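
A minimal sketch of that common case, for reference. The temporary file
and its contents are made up purely so the example is self-contained:

```python
import os
import tempfile

# Write a small throwaway file (illustrative contents only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

# Reading by lines: the file object is its own line iterator,
# so no reader or buffer has to be chosen by hand.
lines = []
with open(path) as f:
    for line in f:
        lines.append(line.rstrip("\n"))

os.remove(path)
print(lines)
```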

While I am all for allowing more powerful I/O through stacking a
stream within various readers (which feels rather functional to me,
but that must just be because of my latest reading material), I want
the 90% case to require hardly any memorizing of which readers I need
in what order.

> * stream - an arbitrary, usually sequential, byte data source
> * readers and writers - the way data is encoded into/decoded from the stream.
> we'll use the term "codec" for these readers and writers in general.
>
> so "stream" is the "where" and "codec" is the "how", and the concept of
> codecs is not limited to ASCII vs UTF-8. it can grow into fully-fledged
> protocols.
[SNIP - a whole lot of detailed ideas]
> -----
>
> buffering is always *explicit* and implemented at the interpreter level,
> rather than by libc, so it is consistent between all platforms and streams.
> all streams, by nature, are *non-buffered* (write the data as soon as
> possible). buffering wraps an underlying stream, making it explicit
>
> class BufferedStream(Stream):
>     def __init__(self, stream, bufsize)
>     def flush(self)
>
> (BufferedStream appears in .NET)
>
> class LineBufferedStream(BufferedStream):
>     def __init__(self, stream, flush_on = b"\n")
>
> f = LineBufferedStream(FileStream("c:\\blah"))
>
> where flush_on specifies the byte (or sequence of bytes?) to flush upon
> writing. by default it would be on newline.
>
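
To make the quoted proposal concrete, here is a rough sketch of the
write-side buffering it describes. Only the names BufferedStream,
LineBufferedStream, bufsize, flush_on and flush come from the quoted
text; the Stream base class, the MemoryStream stand-in for FileStream,
the default bufsize and all method bodies are assumptions made for
illustration:

```python
class Stream:
    """Hypothetical unbuffered base: write() hands data over immediately."""
    def write(self, data):
        raise NotImplementedError


class MemoryStream(Stream):
    """Hypothetical stand-in for FileStream; records each underlying write."""
    def __init__(self):
        self.writes = []

    def write(self, data):
        self.writes.append(bytes(data))


class BufferedStream(Stream):
    """Explicit buffering layered on top of another stream."""
    def __init__(self, stream, bufsize=4096):
        self.stream = stream
        self.bufsize = bufsize
        self._buf = bytearray()

    def write(self, data):
        self._buf += data
        if len(self._buf) >= self.bufsize:
            self.flush()

    def flush(self):
        if self._buf:
            self.stream.write(bytes(self._buf))
            self._buf.clear()


class LineBufferedStream(BufferedStream):
    """Flushes whenever the flush_on marker has been written."""
    def __init__(self, stream, flush_on=b"\n", bufsize=4096):
        super().__init__(stream, bufsize)
        self.flush_on = flush_on

    def write(self, data):
        self._buf += data
        # Pass on everything up to and including the last marker.
        cut = self._buf.rfind(self.flush_on)
        if cut != -1:
            end = cut + len(self.flush_on)
            self.stream.write(bytes(self._buf[:end]))
            del self._buf[:end]
        # Still honour the size limit for data without a marker.
        if len(self._buf) >= self.bufsize:
            self.flush()


raw = MemoryStream()
f = LineBufferedStream(raw)
f.write(b"hel")
f.write(b"lo\nwor")   # the marker arrives: b"hello\n" goes out
f.write(b"ld")
f.flush()             # the remainder goes out on explicit flush
print(raw.writes)
```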

See, this is what I am worried about.  I **really** like not having to
figure out what I need to do to read by lines from a file.  If the
FileStream object had an __iter__ that did the proper wrapping with
LineBufferedStream, then great, I'm happy.  But if we do not add some
reasonable convenience functions or iterators, this is going to feel
heavy-handed rather quickly.
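
A rough sketch of that kind of convenience, under stated assumptions:
the quoted proposal only gives constructor signatures, so the read()
method, the read-side line splitting, the default bufsize and the
__iter__ hook below are all hypothetical, not part of the proposal.

```python
import os
import tempfile


class LineBufferedStream:
    """Hypothetical read-side line buffer over an unbuffered stream."""
    def __init__(self, stream, flush_on=b"\n", bufsize=4096):
        self.stream = stream
        self.flush_on = flush_on
        self.bufsize = bufsize

    def __iter__(self):
        buf = b""
        while True:
            chunk = self.stream.read(self.bufsize)
            if not chunk:
                break
            buf += chunk
            # Yield every complete line currently in the buffer.
            while True:
                line, sep, rest = buf.partition(self.flush_on)
                if not sep:
                    break
                yield line + sep
                buf = rest
        if buf:
            yield buf  # trailing data without a final marker


class FileStream:
    """Hypothetical unbuffered byte stream over a file descriptor."""
    def __init__(self, path):
        flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
        self._fd = os.open(path, flags)

    def read(self, count):
        return os.read(self._fd, count)

    def close(self):
        os.close(self._fd)

    def __iter__(self):
        # The convenience: wrap in the line buffer on the caller's behalf.
        return iter(LineBufferedStream(self))


# Usage: the caller never names a reader or a buffer.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"one\ntwo\nthree")
    path = tmp.name

stream = FileStream(path)
result = [line for line in stream]
stream.close()
os.remove(path)
print(result)
```

With a hook like this the layered design stays available for the
unusual cases, while the common case stays a plain 'for' loop.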

-Brett

