a question about Chinese characters in a Python Program

Paul Boddie paul at boddie.org.uk
Mon Oct 20 06:47:48 EDT 2008


On 20 Oct, 07:32, est <electronix... at gmail.com> wrote:
>
> Personally I call it a serious bug in python

Normally I'd entertain the possibility of bugs in Python, but your
reasoning is a bit thin (in http://bugs.python.org/issue3648): "Why
cann't Python just define ascii to range(256)"

I do accept that it can be awkward to output text to the console, for
example, but you have to consider that the console might not be
configured to display any character you can throw at it. My console is
configured for ISO-8859-15 (something like your magical "ascii to
range(256)" only where someone has to decide what those 256 characters
actually are), but that isn't going to help me display CJK characters.
A solution might be to generate UTF-8 and then get the user to display
the output in an appropriately configured application, but even then
someone has to say that it's UTF-8 and not some other encoding that's
being used. As discussed in another recent thread, Python 2.x does
make some reasonable guesses about such matters to the extent that
it's possible automatically (without magical knowledge).
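
To make the console point concrete, here is a minimal sketch (not
anything Python does for you) showing that an ISO-8859-15 console has
no way to represent CJK characters, whereas UTF-8 does. The helper
name can_encode is made up for illustration, and the u"" literals are
used so that the code runs under both Python 2.x and Python 3.3+.

```python
# Two CJK characters, written as escapes so the source file stays ASCII.
text = u"\u4e2d\u6587"

def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding.

    can_encode is a hypothetical helper for this example only.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# ISO-8859-15 defines only 256 characters, none of them CJK.
print(can_encode(text, "iso-8859-15"))
# UTF-8 can represent any Unicode character.
print(can_encode(text, "utf-8"))
# The euro sign is one of the characters someone decided to put in
# ISO-8859-15; it is absent from ISO-8859-1.
print(can_encode(u"\u20ac", "iso-8859-15"), can_encode(u"\u20ac", "iso-8859-1"))
```

Whoever reads the UTF-8 output still has to be told that it is UTF-8;
nothing in the bytes themselves says so.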

There is also the problem of using the "str" built-in function or
any operation where a Unicode object may be converted to a plain
string. It is now recommended that you only convert to plain strings
when you need to produce a sequence of bytes (for output, for
example), and that you indicate how the Unicode values are encoded as
bytes (by specifying an encoding). Python 3.x doesn't really change
this: it just makes the Unicode/text vs. bytes distinction more
obvious.
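
As a rough illustration of that recommendation, the following sketch
keeps text as Unicode and converts to bytes only at the point of
output, naming the encoding explicitly. UTF-8 is merely the encoding
assumed for this example; the right choice depends on who consumes
the bytes. The code is written to run under both Python 2.x and
Python 3.3+.

```python
# Text is kept as a Unicode object for as long as possible.
text = u"\u4e2d\u6587"

# Convert to bytes only when producing output, and say which encoding.
data = text.encode("utf-8")

# Each of these two characters takes three bytes in UTF-8.
print(len(text), len(data))

# The consumer must apply the same encoding to recover the text.
restored = data.decode("utf-8")
print(restored == text)
```

Relying on an implicit conversion instead leaves the choice of
encoding to a default that may not be able to represent the text.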

Paul


