Unicode question
Gerhard Häring
gh at ghaering.de
Thu Jul 17 20:07:13 EDT 2003
Thomas Heller wrote:
> Gerhard Häring <gh at ghaering.de> writes:
>
>
>> >>> u"äöü"
>>u'\x84\x94\x81'
>>
>>(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>>
>>Why does this work?
>>
>>Does Python guess which encoding I mean? I thought Python should
>>refuse to guess :-)
>
>
> I stumbled over this yesterday, and it seems it is (at least) partially
> answered by PEP 263:
>
> In Python 2.1, Unicode literals can only be written using the
> Latin-1 based encoding "unicode-escape". This makes the programming
> environment rather unfriendly to Python users who live and work in
> non-Latin-1 locales such as many of the Asian countries. Programmers
> can write their 8-bit strings using the favorite encoding, but are
> bound to the "unicode-escape" encoding for Unicode literals.
>
> I have the impression that this is undocumented on purpose, because you
> should not write unescaped non-ansi characters into the source file
> (with 'unknown' encoding).

I agree that using Latin-1 as the default is bad. If there's an encoding
cookie in the 2.3+ source file, then that encoding could be used instead.

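To make the effect concrete, here is a minimal sketch of what seems to be
happening. It is written in Python 3 syntax so it can be run today, not the
2.2/2.3 syntax from the session above. It assumes the literal was typed in a
console or editor using code page 850, which is a guess based on the byte
values \x84\x94\x81. The helper name literal_from_source is hypothetical and
exists only for this illustration.

```python
# The intended text, written with escapes so this file stays pure ASCII.
TEXT = "\u00e4\u00f6\u00fc"  # a-umlaut, o-umlaut, u-umlaut

# Bytes that a code page 850 editor (an assumption) would store for TEXT.
raw = TEXT.encode("cp850")

# Roughly what the old behaviour amounts to: every byte becomes the code
# point with the same number, which is the same as a Latin-1 decode.
guessed = raw.decode("latin-1")

# What the author meant; recoverable only if the real encoding is known.
intended = raw.decode("cp850")


def literal_from_source(cookie):
    """Compile a tiny module whose bytes start with a PEP 263 coding cookie
    and return the string literal it defines."""
    source = (
        b"# -*- coding: " + cookie.encode("ascii") + b" -*-\n"
        b"s = '" + raw + b"'\n"
    )
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace["s"]


print(ascii(guessed))                              # the mis-decoded literal
print(literal_from_source("cp850") == intended)    # cookie names the real encoding
print(literal_from_source("latin-1") == guessed)   # cookie reproduces the guess
```

The same three bytes yield different text depending on the declared
encoding, which is why a literal without a declaration cannot be decoded
reliably.
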
I stumbled on this while pointing another Python user on this list to
the relevant section of the Python tutorial
(http://www.python.org/doc/current/tut/node5.html#SECTION005130000000000000000),
where Guido uses u"äöü" in an example.

As this is BAD, the tutorial should probably be changed. I'll file a bug
report.

-- Gerhard