[Python-3000] On PEP 3116: new I/O base classes

Bill Janssen janssen at parc.com
Wed Jun 20 19:03:49 CEST 2007


> > TextIOBase: this seems an odd mix of high-level and low-level.  I'd
> > remove "seek", "tell", "read", and "write".  Remember that in Python,
> > mixins actually work, so that you can provide a file object that
> > combines several different I/O classes.
> 
> Huh? All those operations you want to remove are entirely necessary  
> for a number of applications. I'm not sure what you meant about mixins?

I meant that TextIOBase should provide only the operations for text.
The other operations would be supported, where applicable, by mixing
in a class that provides them.  Remember that this is a PEP about base
classes.
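A minimal sketch of the mixin arrangement described above, in Python. The class names `TextOps`, `SeekableOps`, and `SeekableTextFile` are hypothetical and do not come from PEP 3116. Likewise, `writeline` follows the name proposed in this thread: the released `io` module has no `writeline` method. The text base exposes only line-oriented operations, and positioning comes from a separate mixin that is applied only to streams that support it. UTF-8 is assumed as the encoding.

```python
import io


class TextOps:
    """Hypothetical text-only base: line-oriented operations, nothing else."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw              # underlying binary stream
        self._encoding = encoding

    def readline(self):
        # Read raw bytes up to and including b"\n", then decode them.
        # Assumes an ASCII-compatible encoding such as UTF-8, where the
        # newline byte never occurs inside a multibyte sequence.
        return self._raw.readline().decode(self._encoding)

    def writeline(self, line):
        # Hypothetical method: append the terminator, encode, then write.
        self._raw.write((line + "\n").encode(self._encoding))


class SeekableOps:
    """Hypothetical mixin: positioning, delegated to the binary stream."""

    def seek(self, pos, whence=io.SEEK_SET):
        return self._raw.seek(pos, whence)

    def tell(self):
        return self._raw.tell()


class SeekableTextFile(TextOps, SeekableOps):
    """Combines the text operations with positioning via multiple inheritance."""


f = SeekableTextFile(io.BytesIO())
f.writeline("first")
f.writeline("second")
f.seek(0)
print(repr(f.readline()))  # 'first\n'
```

Because `TextOps` on its own has no `seek` or `tell`, a non-seekable text stream such as a pipe could be modeled by instantiating it directly, without the mixin.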

> It doesn't work? Why not? Of course read() should take the number of  
> characters as a parameter, not number of bytes.

Unfortunately, files contain encoded characters, and in Unicode, the
target for Python-3000, the same text can often be represented by
several equivalent code-point sequences.  The standard Unicode support
for Python-3000 seems to be settling on exposing those code-point
sequences to the application as-is, which rules out any automatic
normalization.  So a particular "readchars(1)" operation may validly
return different strings even when operating on the same underlying
file, and may need a different number of read operations to consume
the same underlying bytes.  In other words, I don't believe the string
and file operations are specified tightly enough to guarantee that
this won't happen.  This is the same situation we have today, which
means that the only reliable way to read Unicode strings from a file
will be the same as today: read raw bytes from the file, decode them,
normalize them in some specific way, and then see what string you wind
up with.  You could probably fix this in the PEP by specifying which
Unicode normalization to apply to the strings it returns.
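To make the "readchars(1)" problem concrete, here is a minimal sketch using the `io` and `unicodedata` modules of Python 3 as eventually released, where a text stream's `read(1)` returns one code point. The two byte strings below encode canonically equivalent spellings of "é". Reading one character yields different results, while decoding and then normalizing yields equal strings. NFC is chosen here as an assumption, since the message names no particular normalization form, and `read_one` and `read_normalized` are hypothetical helper names.

```python
import io
import unicodedata

# Two UTF-8 byte sequences for the same text, "é":
precomposed = "\u00e9".encode("utf-8")   # b"\xc3\xa9": one code point, U+00E9
decomposed = "e\u0301".encode("utf-8")   # b"e\xcc\x81": "e" + U+0301 combining acute


def read_one(data):
    # read(1) on a text stream returns one code point,
    # not one user-perceived character.
    return io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").read(1)


def read_normalized(data, form="NFC"):
    # The workaround: decode the raw bytes, then normalize to one chosen form.
    return unicodedata.normalize(form, data.decode("utf-8"))


print(read_one(precomposed) == read_one(decomposed))                # False ("é" vs "e")
print(read_normalized(precomposed) == read_normalized(decomposed))  # True
```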

> > feel the need.  Stick to just "readline" and "writeline" for text I/O.
> 
> Ah, not everyone dealing with text is dealing with line-delimited  
> text, you know...

Line delimitation is really the only difference between text and
non-text data.
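As an illustration of how line handling differs between the binary and text layers, here is a minimal sketch using Python 3's `io` module as eventually released. In its default universal-newlines mode (`newline=None`), `io.TextIOWrapper` recognizes `\n`, `\r`, and `\r\n` as line endings and returns lines ending in `\n`. The binary layer splits only on `b"\n"` and leaves the bytes unchanged. UTF-8 is assumed as the encoding.

```python
import io

raw = b"one\r\ntwo\rthree\n"

# Binary layer: readlines() splits only on b"\n" and keeps the bytes as-is.
binary_lines = io.BytesIO(raw).readlines()

# Text layer: newline=None (universal newlines) translates every terminator to "\n".
text_lines = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None).readlines()

print(binary_lines)  # [b'one\r\n', b'two\rthree\n']
print(text_lines)    # ['one\n', 'two\n', 'three\n']
```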

Bill
