[Python-3000] On PEP 3116: new I/O base classes

Thu Jun 21 02:33:45 CEST 2007

Daniel Stutzbach wrote:
> On 6/20/07, Bill Janssen <janssen at parc.com> wrote:
> > > Ah, not everyone dealing with text is dealing with line-delimited
> > > text, you know...
> >
> > It's really the only difference between text and non-text.
> 
> Text is a sequence of characters.  Non-text is a sequence of bytes.
> Characters may be multi-byte.  It is no longer an ASCII world.

Yes, of course, Daniel, but I was speaking of the contents of files,
and files are inherently sequences of bytes.  If we are talking about
some layer which interprets the contents of a file, just saying "give
me N characters" isn't enough.  We need to say, "N characters assuming
a text encoding of M, with a normalization policy of Q, and a newline
policy of R".  If we don't, we can't just "read" N characters safely.
So I think it's broken to put this in the TextIOBase class; instead,
there should be some wrapper class that does buffering and can be
configured with (M, Q, R).
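
To make that concrete, here is a rough sketch of the kind of wrapper I
have in mind. The class name ConfiguredTextReader and its parameters
are made up for illustration and are not anything in PEP 3116. It
assumes a blocking byte stream, and its handling of chunk boundaries
is simplified: it only holds back trailing combining marks and a
trailing "\r", which does not cover every composition case (Hangul
jamo, for one).

```python
import codecs
import io
import unicodedata


class ConfiguredTextReader:
    """Hypothetical wrapper: buffered character reads over a byte stream,
    with an explicit encoding (M), normalization (Q) and newline policy (R)."""

    def __init__(self, raw, encoding, normalization=None, newline=None,
                 chunk_size=4096):
        self._raw = raw
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._normalization = normalization  # e.g. "NFC", "NFD", or None
        self._newline = newline              # None leaves line endings alone
        self._chunk_size = chunk_size
        self._carry = ""   # decoded text held back at a chunk boundary
        self._ready = ""   # fully processed characters waiting to be read
        self._eof = False

    @staticmethod
    def _split_safe(text):
        # Hold back the last base character plus its combining marks, and a
        # "\r" just before it, since the next chunk could still change them.
        if not text:
            return "", ""
        i = len(text) - 1
        while i > 0 and unicodedata.combining(text[i]):
            i -= 1
        if i > 0 and text[i - 1] == "\r":
            i -= 1
        return text[:i], text[i:]

    def _fill(self):
        data = self._raw.read(self._chunk_size)
        self._eof = not data
        text = self._carry + self._decoder.decode(data, final=self._eof)
        if self._eof:
            safe, self._carry = text, ""
        else:
            safe, self._carry = self._split_safe(text)
        if self._newline is not None:
            # Fold every line ending to "\n", then to the configured one.
            safe = safe.replace("\r\n", "\n").replace("\r", "\n")
            if self._newline != "\n":
                safe = safe.replace("\n", self._newline)
        if self._normalization is not None:
            safe = unicodedata.normalize(self._normalization, safe)
        self._ready += safe

    def read(self, n):
        """Return up to n characters, as counted under this configuration."""
        while len(self._ready) < n and not self._eof:
            self._fill()
        out, self._ready = self._ready[:n], self._ready[n:]
        return out


# Same bytes, two configurations: "one character" is not the same thing.
data = "e\u0301\r\nx".encode("utf-8")   # 'e' + combining acute, CRLF, 'x'
nfc = ConfiguredTextReader(io.BytesIO(data), "utf-8", "NFC", "\n")
nfd = ConfiguredTextReader(io.BytesIO(data), "utf-8", "NFD", None)
first_nfc = nfc.read(1)   # the precomposed e-acute
first_nfd = nfd.read(1)   # a bare 'e'; the accent is a separate character
```

The point of the two readers at the end is that read(1) on identical
bytes returns different text depending on (M, Q, R), which is why I
would rather see that configuration spelled out in a wrapper.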

Bill