Unicode question

Gerhard Häring gh at ghaering.de
Fri Jul 18 02:07:13 CEST 2003

Thomas Heller wrote:
> Gerhard Häring <gh at ghaering.de> writes:
>> >>> u"äöü"
>>(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>>Why does this work?
>>Does Python guess which encoding I mean? I thought Python should
>>refuse to guess :-)
> I stumbled over this yesterday, and it seems it is (at least) partially
> answered by PEP 263:
>     In Python 2.1, Unicode literals can only be written using the
>     Latin-1 based encoding "unicode-escape". This makes the programming
>     environment rather unfriendly to Python users who live and work in
>     non-Latin-1 locales such as many of the Asian countries. Programmers
>     can write their 8-bit strings using the favorite encoding, but are
>     bound to the "unicode-escape" encoding for Unicode literals.
> I have the impression that this is undocumented on purpose, because you
> should not write unescaped non-ansi characters into the source file
> (with 'unknown' encoding).

I agree that using Latin-1 as the default is bad. If a 2.3+ source file
contains an encoding cookie, then that encoding could be used instead.
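
To illustrate why guessing is risky, here is a minimal sketch (the variable
names are mine, not from the tutorial or the PEP). It assumes an interpreter
that honours the PEP 263 cookie on the first line, and it spells the literal
with escapes so that it does not depend on how the file itself was saved:

```python
# -*- coding: utf-8 -*-
# Sketch: the same three characters produce different bytes depending on
# the encoding, so an undeclared source encoding is ambiguous.

# Escaped spelling of the umlauts a, o, u; independent of the file encoding.
text = u"\xe4\xf6\xfc"

as_latin1 = text.encode("latin-1")  # one byte per character
as_utf8 = text.encode("utf-8")      # two bytes per character

# Decoding UTF-8 bytes as Latin-1 raises no error; it just yields wrong text.
misread = as_utf8.decode("latin-1")

print(len(as_latin1))   # 3
print(len(as_utf8))     # 6
print(misread == text)  # False
```

Whether the escaped form or a declared encoding is preferable depends on who
has to read the source; both avoid relying on an implicit Latin-1 default.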

I came across this while pointing another Python user on this list to
the relevant section of the Python tutorial, where Guido uses u"äöü"
in an example.

As this is BAD, the tutorial should probably be changed. I'll file a bug.

-- Gerhard
