[Python-3000] Draft PEP for New IO system

Tue Feb 27 20:02:20 CET 2007

The encoding/decoding behavior should be no different from that of the
encode() and decode() methods on unicode strings and byte arrays.

Certainly no normalization of diacritics will be done; surrogate
handling depends on the encoding and whether the unicode string
implementation uses 16 or 32 bits per character.
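
A minimal sketch of the normalization and surrogate points, assuming a
current Python 3 where str is the unicode type and bytes is the byte
array. The sample strings are chosen here for illustration only and are
not part of the draft:

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT, encoded as UTF-8.
raw = "e\u0301".encode("utf-8")

# Decoding returns exactly the code points that were encoded;
# nothing is composed into the single code point U+00E9.
decoded = raw.decode("utf-8")
print(len(decoded))   # 2

# Normalization is a separate, explicit step.
composed = unicodedata.normalize("NFC", decoded)
print(len(composed))  # 1

# Whether surrogates appear depends on the encoding: UTF-16 needs a
# surrogate pair (two 16-bit units) for a code point outside the BMP.
astral_utf16 = "\U00010000".encode("utf-16-le")
print(len(astral_utf16))  # 4 bytes, i.e. two 16-bit code units
```

How many "characters" the non-BMP code point occupies once decoded is the
part that depends on the string implementation, so the sketch does not
assert it.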

I agree that we need to be able to specify the error handling as well.
UnicodeErrors may be raised.
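
As a hedged illustration of settable error handling: the sketch below
uses the errors argument of bytes.decode() and, as a stand-in for the
text layer under discussion, io.TextIOWrapper from today's standard
library. The final API in the draft may differ; the input bytes are made
up for the example.

```python
import io

# 0xff can never appear in well-formed UTF-8, so this input is invalid.
raw = b"abc\xffdef"

# Strict handling (the default for decode()) raises a UnicodeError subclass.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict:", exc.reason)

# Other handlers substitute or drop the offending byte instead.
replaced = raw.decode("utf-8", errors="replace")  # U+FFFD in place of 0xff
ignored = raw.decode("utf-8", errors="ignore")    # 0xff silently dropped

# A text layer over a byte stream, given the same error handler,
# behaves the same way as decode() on the underlying bytes.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors="replace")
wrapped = text.read()
print(wrapped == replaced)  # True
```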

--Guido

On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 2/27/07, Adam Olsen <rhamph at gmail.com> wrote:
> > On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> > > Text I/O
> > > ... operate on a per-character basis instead of a per-byte basis.
>
> > "per-character" needs some clarification.  I'm guessing this will only
> > return entire code points, but the unicode type will expose them as
> > code units, so it could be seen as both per-code-point and
> > per-code-unit.
>
> Does this just mean that you assume
> (1) UTF32
> (2) surrogate pairs will show up as two characters
> (3) diacritics may (or may not) show up separately from their base characters?
>
> This does suggest that error-correction should be specified (or at
> least explicitly not specified).  If the underlying input byte-stream
> contains an invalid sequence, will the TextIO raise a
> UnicodeDecodeError?  Or will its error/replace/delete behavior be
> settable?
>
> Does the Text class promise to catch things like an invalid
> combination of surrogates?
>
> -jJ

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)