[Python-3000] Unicode and OS strings

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Tue Sep 18 01:06:54 CEST 2007


On Sun, 16-09-2007, at 16:13 +0900, Stephen J. Turnbull wrote:

> When a codec encounters something it can't handle, whether it's a
> valid character in a legacy encoding, a private use character in a
> UTF, or an invalid sequence of code units, it throws an exception
> specifying the character or code unit and the current coded character
> set,

Does this mean that this:
$ python -c 'import sys; print("%x" % ord(sys.argv[1]))' $(printf "\ue650")
would no longer print e650 in a UTF-8 locale, assuming a shell which
understands the escape sequence in printf, and the script would have
to make special arrangements to make the character available? U+E650
is a private use character.

If so, I'm violently against this.

> This definitely requires that the Unicode codecs be modified to do the
> right thing if they encounter private use characters in the input
> stream or output stream.

The right thing is to encode or decode private use characters according
to the regular codec rules, as every other implementation of these
codecs in every other language does.
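
To make "regular codec rules" concrete, here is a minimal sketch of the
round trip I have in mind. It assumes a Python 3 interpreter with the
stock UTF-8 codec; the helper name roundtrip is made up for this
illustration and is not part of any proposal.

```python
import unicodedata


def roundtrip(ch, encoding="utf-8"):
    """Encode ch, decode it back, and return (encoded_bytes, decoded_str).

    Hypothetical helper for illustration only: it uses nothing beyond
    str.encode and bytes.decode with their default "strict" error handler.
    """
    encoded = ch.encode(encoding)
    decoded = encoded.decode(encoding)
    return encoded, decoded


# U+E650 is in the BMP Private Use Area (U+E000..U+F8FF),
# so its general category is "Co" (Other, private use).
pua = "\ue650"
assert unicodedata.category(pua) == "Co"

encoded, decoded = roundtrip(pua)

# UTF-8 treats it like any other three-byte BMP code point.
assert encoded == b"\xee\x99\x90"
assert decoded == pua

print("%x" % ord(decoded))
```

The point is that no special arrangement is needed on either side: the
private use code point goes through the same encode and decode paths as
any assigned character.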

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


