[Python-3000] revamping the io stack, part 2

Sun Apr 30 17:33:02 CEST 2006

and one small thing i forgot to mention --

file.read/write work with the new bytes() type, while
textfile.read/write work with strings (depends on the encoding)
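
to make the split concrete, here is a rough sketch using python's built-in
open() as a stand-in for the proposed file()/textfile() pair (that mapping
is my assumption, not part of the proposal): binary mode hands back bytes,
text mode hands back strings decoded with whatever encoding you ask for.

```python
import os
import tempfile

# write a small UTF-8 file to experiment with (a throwaway temp file)
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("h\u00e9llo\n".encode("utf-8"))

# binary mode: read() returns the raw bytes, untouched
with open(path, "rb") as f:
    raw = f.read()

# text mode: read() returns a str, decoded with the requested encoding
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(path)
```

the same file comes back as 7 bytes but only 6 characters, which is why
the string result depends on the encoding.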


-tomer


On 4/30/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> > I want the 90% case to require hardly any memorizing of what
> > readers I need in what order.
> ...
> > See, this is what I am worried about.  I **really** like not having to
> > figure out what I need to do to read by lines from a file.  If the
> > FileStream object had an __iter__ that did the proper wrapping with
> > LineBufferedStream, then great, I'm happy.  But if we do not add some
> > reasonable convenience functions or iterators
>
> yes, i totally agree with that: we do need convenience functions.
> take a look at this:
>
> def file(filename, mode = "r", bufsize = None):
>     # open the file
>     f = FileStream(filename, mode)
>     # add buffering if requested
>     if bufsize is not None:
>         f = BufferedStream(f, bufsize)
>     return f
>
> def textfile(filename, *args):
>     # add "text mode"
>     return TextCodec(file(filename, *args))
>
> so today's file() remains intact, but is accompanied by a textfile()
> counterpart, that opens in text mode. or we could add a "t" mode to
> the file's mode list, but that's ugly.
>
> and the TextCodec adds __iter__ over lines, so you *can* do
> for line in textfile("c:\\blah"):
>     pass
>
> but not
> for line in file("c:\\blah"):
>     pass
>
> because only text files have the notion of lines. i think you'd agree
> that it's meaningless to iterate by *lines* over arbitrary streams, like
> binary files or whatever. so it should be explicit, that you want to
> treat the data as text. and don't forget line translation could corrupt
> your binary data, etc.
>
> > this is going to feel
> > rather heavy-handed rather quickly.
> so i hope it doesn't seem heavy or too complex now.
>
>
> greetings,
> -tomer
>
> On 4/29/06, Brett Cannon <brett at python.org> wrote:
> > On 4/29/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> > > i first thought of focusing on the socket module, because it's the part
> > > that bothers me most, but since people have expressed their thoughts on
> > > completely revamping the IO stack, perhaps we should be open to adopting
> > > new ideas, mainly from the java/.NET world (keeping the momentum from the
> > > previous post).
> > >
> > > there is an inevitable issue of performance here, since it basically
> > > splits what used to be "file" or "socket" into many layers, each adding
> > > additional overhead, so many parts should be lowered to C.
> > >
> > > if we look at java/.NET for guidance, they have come up with two concepts:
> >
> > I am a little wary of taking too much from Java/.NET since I have
> > always found the I/O system way too heavy for the common case.  I
> > can't remember what it takes to get a reader in Java in order to read
> > by lines.  In Python, I love that I don't have to think about that;
> > just pass a file object to 'for' and I am done.
> >
> > While I am all for allowing for more powerful I/O through stacking a
> > stream within various readers (which feels rather functional to me,
> > but that must just be because of my latest reading material), I want
> > the 90% case to require hardly any memorizing of what readers I need
> > in what order.
> >
> > > * stream - an arbitrary, usually sequential, byte data source
> > > * readers and writers - the way data is encoded into/decoded from the
> > >   stream.
> > > we'll use the term "codec" for these readers and writers in general.
> > >
> > > so "stream" is the "where" and "codec" is the "how", and the concept of
> > > codecs is not limited to ASCII vs UTF-8. it can grow into fully-fledged
> > > protocols.
> > [SNIP - a whole lot of detailed ideas]
> > > -----
> > >
> > > buffering is always *explicit* and implemented at the interpreter level,
> > > rather than by libc, so it is consistent between all platforms and streams.
> > > all streams, by nature, are *non-buffered* (write the data as soon as
> > > possible). buffering wraps an underlying stream, making it explicit.
> > >
> > > class BufferedStream(Stream):
> > >     def __init__(self, stream, bufsize): ...
> > >     def flush(self): ...
> > >
> > > (BufferedStream appears in .NET)
> > >
> > > class LineBufferedStream(BufferedStream):
> > >     def __init__(self, stream, flush_on = b"\n"): ...
> > >
> > > f = LineBufferedStream(FileStream("c:\\blah"))
> > >
> > > where flush_on specifies the byte (or sequence of bytes?) to flush upon
> > > writing. by default it would be on newline.
> > >
> >
> > See, this is what I am worried about.  I **really** like not having to
> > figure out what I need to do to read by lines from a file.  If the
> > FileStream object had an __iter__ that did the proper wrapping with
> > LineBufferedStream, then great, I'm happy.  But if we do not add some
> > reasonable convenience functions or iterators, this is going to feel
> > rather heavy-handed rather quickly.
> >
> > -Brett
> >
>
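
for anyone who wants to poke at the layering discussed above, here is a
minimal sketch of how the pieces might stack. the class names come from the
thread, but every method body, the read-only buffering, the utf-8 default
and the 4096-byte chunk size are assumptions made up for illustration, not
a settled design. note that only TextCodec defines __iter__, so iterating
by lines requires asking for text explicitly.

```python
import os
import tempfile


class FileStream:
    """Unbuffered byte stream over an OS file descriptor (hypothetical)."""

    def __init__(self, filename, mode="r"):
        flags = {"r": os.O_RDONLY,
                 "w": os.O_WRONLY | os.O_CREAT | os.O_TRUNC}[mode]
        flags |= getattr(os, "O_BINARY", 0)  # no newline translation on Windows
        self.fd = os.open(filename, flags)

    def read(self, count):
        return os.read(self.fd, count)

    def write(self, data):
        return os.write(self.fd, data)

    def close(self):
        os.close(self.fd)


class BufferedStream:
    """Explicit read buffering around any stream (hypothetical, read-only)."""

    def __init__(self, stream, bufsize):
        self.stream = stream
        self.bufsize = bufsize
        self.buf = b""

    def read(self, count):
        # refill in bufsize chunks until the request can be served or EOF
        while len(self.buf) < count:
            chunk = self.stream.read(self.bufsize)
            if not chunk:
                break
            self.buf += chunk
        data, self.buf = self.buf[:count], self.buf[count:]
        return data

    def close(self):
        self.stream.close()


class TextCodec:
    """Decodes bytes to str; the only layer that knows about lines."""

    def __init__(self, stream, encoding="utf-8"):
        self.stream = stream
        self.encoding = encoding

    def __iter__(self):
        pending = b""
        while True:
            chunk = self.stream.read(4096)
            if not chunk:
                break
            pending += chunk
            # everything before the last newline is a complete line
            *lines, pending = pending.split(b"\n")
            for line in lines:
                yield line.decode(self.encoding) + "\n"
        if pending:  # final line without a trailing newline
            yield pending.decode(self.encoding)

    def close(self):
        self.stream.close()


def file(filename, mode="r", bufsize=None):
    f = FileStream(filename, mode)
    if bufsize is not None:
        f = BufferedStream(f, bufsize)
    return f


def textfile(filename, *args):
    return TextCodec(file(filename, *args))


# demo: write raw bytes through file(), read lines back through textfile()
fd, path = tempfile.mkstemp()
os.close(fd)
out = file(path, "w")
out.write(b"spam\neggs\n")
out.close()

text = textfile(path, "r", 16)
lines = list(text)
text.close()
os.remove(path)
```

the common case stays a one-liner (for line in textfile(path)), while the
layers remain available for anyone who needs to stack them by hand.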