[Python-3000] Draft PEP for New IO system

Jim Jewett jimjjewett at gmail.com
Tue Feb 27 19:41:46 CET 2007


On 2/27/07, Adam Olsen <rhamph at gmail.com> wrote:
> On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> > Text I/O
> > ... operate on a per-character basis instead of a per-byte basis.

> "per-character" needs some clarification.  I'm guessing this will only
> return entire code points, but the unicode type will expose them as
> code units, so it could be seen as both per-code-point and
> per-code-unit.

Does this just mean that you assume
(1) UTF-32
(2) surrogate pairs will show up as two characters
(3) diacritics may (or may not) show up separately from their base characters?
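
To make the three readings concrete, here is a small sketch in
present-day Python 3 (3.3 or later, which postdates this thread, so it
illustrates the distinctions rather than anything the draft PEP
specifies). The sample characters are chosen only for illustration.

```python
import unicodedata

# One code point outside the Basic Multilingual Plane
# (U+1D11E MUSICAL SYMBOL G CLEF).
clef = "\U0001D11E"

code_points = len(clef)                           # str length counts code points
utf16_units = len(clef.encode("utf-16-le")) // 2  # 16-bit code units (surrogate pair)
utf32_units = len(clef.encode("utf-32-le")) // 4  # 32-bit code units

# A base letter plus a combining diacritic: one "character" to a reader,
# but one or two code points depending on normalization.
decomposed = unicodedata.normalize("NFD", "\u00e9")  # 'e' + U+0301 COMBINING ACUTE
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

print(code_points, utf16_units, utf32_units)
print(len(decomposed), len(composed))
```

So "per-character" could plausibly mean per code unit, per code point,
or per user-perceived character, and the three counts differ.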

This does suggest that error handling should be specified (or at
least explicitly left unspecified).  If the underlying input
byte-stream contains an invalid sequence, will the TextIO raise a
UnicodeDecodeError?  Or will its behavior (raise an error, substitute
a replacement character, or drop the bad bytes) be settable?
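
For illustration only: the io module in present-day Python 3 ended up
making this settable through an "errors" argument on io.TextIOWrapper.
The sketch below uses that later API, not anything defined in the
draft under discussion; the helper name read_with and the sample
bytes are hypothetical.

```python
import io

# 0xff can never appear in well-formed UTF-8, so this input is invalid.
raw = b"ab\xffcd"

def read_with(errors):
    """Decode `raw` through a text wrapper using the given error handler."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors=errors)
    return stream.read()

# "strict" raises on the invalid byte.
try:
    read_with("strict")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# "replace" substitutes U+FFFD REPLACEMENT CHARACTER for the bad byte.
replaced = read_with("replace")

# "ignore" silently drops the bad byte.
ignored = read_with("ignore")
```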

Does the Text class promise to catch things like an invalid
combination of surrogates?
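
As a point of comparison, and not a statement about what the draft
Text class promises: the codecs in present-day Python 3 do reject
ill-formed surrogate sequences in their default strict mode. The
helper name decodes_cleanly below is hypothetical.

```python
def decodes_cleanly(data, encoding):
    """Return True if `data` decodes under `encoding` with strict errors."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

# A well-formed UTF-16 surrogate pair (U+1D11E as D834 DD1E, little-endian).
valid_pair = b"\x34\xd8\x1e\xdd"

# A high surrogate (D800) followed by an ordinary 'A' instead of a low surrogate.
unpaired_high = b"\x00\xd8A\x00"

# A surrogate code point (D800) encoded directly as UTF-8, which is ill-formed.
utf8_surrogate = b"\xed\xa0\x80"

results = {
    "valid_pair": decodes_cleanly(valid_pair, "utf-16-le"),
    "unpaired_high": decodes_cleanly(unpaired_high, "utf-16-le"),
    "utf8_surrogate": decodes_cleanly(utf8_surrogate, "utf-8"),
}
print(results)
```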

-jJ

