[Python-Dev] PEP 393 Summer of Code Project

"Martin v. Löwis" martin at v.loewis.de
Thu Aug 25 09:50:08 CEST 2011


> What about things like the surrogateescape codec that
> deliberately use code units in non-standard ways? Will
> tricks like that still be possible if the code-unit
> level is hidden from the programmer?

Most certainly. In the PEP-393 representation, the surrogate
characters can readily be represented (and would imply at least
the two-byte form), but they will never take their UTF-16
function (i.e. the UTF-8 codec won't try to combine surrogate
pairs), so they can be used for surrogateescape and other
functions. Of course, in strict error mode, codecs will
refuse to encode them (notice that surrogateescape is an error
handler, not a codec).
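
As a rough illustration of the behaviour described above, here
is a minimal sketch, assuming a Python 3.3 or later interpreter
(where PEP 393 is in effect). The variable names are made up
for the example; only the standard error handlers
"surrogateescape" and "surrogatepass" are used.

```python
# An undecodable byte is smuggled into the string as a lone surrogate.
raw = b"abc\xff"
text = raw.decode("utf-8", errors="surrogateescape")
escaped_code_point = ord(text[-1])  # 0xDC00 + 0xFF

# In strict error mode the codec refuses to encode the lone surrogate.
try:
    text.encode("utf-8")
    strict_refused = False
except UnicodeEncodeError:
    strict_refused = True

# With the same error handler, the original bytes come back unchanged.
roundtrip = text.encode("utf-8", errors="surrogateescape")

# A high and a low surrogate stay two separate code points; the UTF-8
# codec does not combine them into one supplementary character.
pair = "\ud83d\ude00"
pair_length = len(pair)
pair_bytes = pair.encode("utf-8", errors="surrogatepass")
```

If the pair had been combined into U+1F600, pair_bytes would be
a single four-byte sequence; instead each surrogate is encoded
on its own as three bytes.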

Regards,
Martin


