[Python-3000] Draft PEP for New IO system
Guido van Rossum
guido at python.org
Tue Feb 27 20:02:20 CET 2007
The encoding/decoding behavior should be no different from that of the
encode() and decode() methods on unicode strings and byte arrays.
Certainly no normalization of diacritics will be done; surrogate
handling depends on the encoding and whether the unicode string
implementation uses 16 or 32 bits per character.
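A minimal sketch of the "no different from encode()/decode()" rule. It assumes the `io.TextIOWrapper` class as Python 3 later shipped it, not the API in this draft. It also assumes Python 3.3+, where PEP 393 removed the 16-bit/32-bit build split, so a non-BMP character is always one code point:

```python
import io
import unicodedata

# "e" + U+0301 COMBINING ACUTE ACCENT, then U+1F600 (outside the BMP).
text = "cafe\u0301 \U0001F600"
data = text.encode("utf-8")

# Reading through a text stream yields exactly what bytes.decode() yields.
via_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").read()
via_decode = data.decode("utf-8")
assert via_stream == via_decode == text

# No normalization: the accent stays a separate code point.
# (NFC would have composed "e" + U+0301 into the single code point U+00E9.)
accented = via_stream[:5]
assert len(accented) == 5
assert unicodedata.normalize("NFC", accented) == "caf\u00e9"

# Surrogate handling depends on the encoding: UTF-16 needs two 16-bit
# code units (a surrogate pair) for U+1F600, UTF-32 needs one 32-bit unit,
# and on Python 3.3+ the decoded str holds it as a single code point.
emoji = "\U0001F600"
utf16_units = len(emoji.encode("utf-16-le")) // 2
utf32_units = len(emoji.encode("utf-32-le")) // 4
print(utf16_units, utf32_units, len(emoji))
```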
I agree that we need to be able to specify the error handling as well.
UnicodeErrors may be raised.
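For illustration, the `io.TextIOWrapper` that eventually shipped accepts an `errors` argument that takes the same handler names as `bytes.decode()`. This is a sketch against that later API. The helper `read_text` is hypothetical:

```python
import io

bad = b"abc\xff"  # 0xFF can never appear in valid UTF-8

def read_text(raw: bytes, errors: str) -> str:
    """Hypothetical helper: decode raw bytes through a text stream."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors=errors).read()

# "strict" (the default) raises UnicodeDecodeError, as decode() would.
try:
    read_text(bad, "strict")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

replaced = read_text(bad, "replace")  # bad byte -> U+FFFD REPLACEMENT CHARACTER
ignored = read_text(bad, "ignore")    # bad byte silently dropped
assert replaced == bad.decode("utf-8", errors="replace")
assert ignored == bad.decode("utf-8", errors="ignore")
```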
On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 2/27/07, Adam Olsen <rhamph at gmail.com> wrote:
> > On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> > > Text I/O
> > > ... operate on a per-character basis instead of a per-byte basis.
> > "per-character" needs some clarification. I'm guessing this will only
> > return entire code points, but the unicode type will expose them as
> > code units, so it could be seen as both per-code-point and
> > per-code-unit.
> Does this just mean that you assume
> (1) UTF32
> (2) surrogate pairs will show up as two characters
> (3) diacritics may (or may not) show up separately from their base characters?
> This does suggest that error-correction should be specified (or at
> least explicitly not specified). If the underlying input byte-stream
> contains an invalid sequence, will the TextIO raise a
> UnicodeDecodeError? Or will its error/replace/delete behavior be
> Does the Text class promise to catch things like an invalid
> combination of surrogates?
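Jim's surrogate questions can be sketched against Python 3.3+ as later released, assuming the shipped `io.TextIOWrapper` and the standard UTF-16 codec. A well-formed surrogate pair decodes to one code point. In strict mode, the codec rejects an unpaired high surrogate. The helper `decode_utf16le` is hypothetical:

```python
import io

def decode_utf16le(raw: bytes) -> str:
    """Hypothetical helper: read UTF-16-LE bytes through a text stream."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16-le").read()

# A well-formed surrogate pair (D83D DE00) decodes to ONE code point.
pair = "\U0001F600".encode("utf-16-le")
assert len(pair) == 4                  # two 16-bit code units on the wire
assert len(decode_utf16le(pair)) == 1  # one character in the str

# An unpaired high surrogate (D800) followed by "A" is rejected.
lone = b"\x00\xd8" + "A".encode("utf-16-le")
try:
    decode_utf16le(lone)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```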
--Guido van Rossum (home page: http://www.python.org/~guido/)