[Python-Dev] PEP 263 in the works (Non-ASCII characters in test_pep277.py in 2.3)
Guido van Rossum
guido@python.org
Mon, 07 Oct 2002 13:31:03 -0400
> Now that I can edit UTF-8 directly, I find a "feature" made
> possible by the PEP 263 support of Python 2.3 rather
> puzzling:
>
> Let's say I edit a file testencoding.py in XEmacs with UTF-8
> support:
(Note that I'm viewing this as Latin-1. The comment, s and u in the
source are all three the same: a-umlaut, o-umlaut, u-umlaut.)
> # -*- coding: utf-8; -*-
> # comment äöü
> s = "äöü"
> u = u"äöü"
> print s
> print u.encode('latin-1')
> print 'works !'
>
> With Python 2.3 this prints:
>
> Ã¤Ã¶Ã¼
> äöü
> works !
>
> I would have expected that s turns out as "äöü" using print,
> since that's how I wrote it in the source file.
No, because stdout isn't assumed to be UTF-8. The string s is your
string encoded in UTF-8, and those are the bytes written by print.
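
A minimal sketch of the byte-level difference, for illustration only. It is written for Python 3 (an assumption; the thread is about Python 2.3, where the plain literal s already held raw bytes), and shown_on_latin1_terminal is a hypothetical helper that stands in for a terminal interpreting output as Latin-1:

```python
# Python 3 sketch; escapes are used so the file's own encoding does not matter.
text = "\u00e4\u00f6\u00fc"  # a-umlaut, o-umlaut, u-umlaut

# Roughly what `s` held in Python 2.3: the UTF-8 bytes as stored in the file.
s_bytes = text.encode("utf-8")

# What `u.encode('latin-1')` produced: one byte per character.
u_bytes = text.encode("latin-1")


def shown_on_latin1_terminal(raw):
    """Hypothetical helper: decode raw bytes the way a Latin-1 terminal would."""
    return raw.decode("latin-1")


# Print lengths and ASCII-safe representations of what the terminal would show.
print(len(s_bytes), len(u_bytes))
print(ascii(shown_on_latin1_terminal(s_bytes)))
print(ascii(shown_on_latin1_terminal(u_bytes)))
```

The UTF-8 form uses two bytes per umlaut, so a Latin-1 terminal renders each one as two unrelated characters, while the Latin-1 form round-trips to the original three.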
> This suggests to me that mixing string and Unicode literals
> using non-ASCII characters in a single file should probably
> be avoided.
Or it suggests that we need a way to deal with encodings on stdout
more gently.
--Guido van Rossum (home page: http://www.python.org/~guido/)