[Python-Dev] PEP 263 in the works (Non-ASCII characters in test_pep277.py in 2.3)

M.-A. Lemburg mal@lemburg.com
Mon, 07 Oct 2002 20:19:26 +0200


Guido van Rossum wrote:
>>Now that I can edit UTF-8 directly, I find a "feature" made
>>possible by the PEP 263 support of Python 2.3 rather
>>puzzling:
>>
>>Let's say I edit a file testencoding.py in XEmacs with UTF-8
>>support:
>
>
> (Note that I'm viewing this as Latin-1.  The comment, s and u in the
> source are all three the same: a-umlaut, o-umlaut, u-umlaut.)

Oh, yes, forgot to mention that.

>># -*- coding: utf-8; -*-
>># comment äöü
>>s = "äöü"
>>u = u"äöü"
>>print s
>>print u.encode('latin-1')
>>print 'works !'
>>
>>With Python 2.3 this prints:
>>
>>Ã¤Ã¶Ã¼
>>äöü
>>works !
>>
>>I would have expected that s turns out as "äöü" using print,
>>since that's how I wrote it in the source file.
>
>
> No, because stdout isn't assumed to be UTF-8.  The string s is your
> string encoded in UTF-8, and those are the bytes written by print.

Uhm, I would have *expected* that behaviour -- using the left side
of my head. The right side knows why this technically doesn't work
out of the box :-)
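
The byte-level effect described above can be sketched as follows. This is
a minimal illustration written for Python 3, not the Python 2.3 script
from the thread; the variable names (text, s_bytes, u_bytes, shown_s,
shown_u) are assumptions introduced for the example, and the Latin-1
"terminal" is simulated by decoding the bytes as Latin-1.

```python
# Python 3 sketch; the script quoted in the thread is Python 2.3.
text = "\u00e4\u00f6\u00fc"  # a-umlaut, o-umlaut, u-umlaut

# With a utf-8 coding line, the 8-bit literal s keeps the UTF-8 bytes
# exactly as they appear in the source file: two bytes per character.
s_bytes = text.encode("utf-8")

# The Unicode literal u is decoded by the parser, so encoding it to
# Latin-1 yields one byte per character.
u_bytes = text.encode("latin-1")

# Simulate a Latin-1 terminal by decoding each byte string as Latin-1.
shown_s = s_bytes.decode("latin-1")  # six characters of mojibake
shown_u = u_bytes.decode("latin-1")  # the three umlauts again

# ascii() keeps the output printable on any console encoding.
print(ascii(s_bytes), ascii(shown_s))
print(ascii(u_bytes), ascii(shown_u))
```

The first byte string is twice as long as the second, which is why print s
shows six characters where three were typed.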

>>This suggests to me that mixing string and Unicode literals
>>using non-ASCII characters in a single file should probably
>>be avoided.
>
> Or it suggests that we need a way to deal with encodings on stdout
> more gently.
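
One possible reading of "more gently" is an output stream that encodes
text to the terminal's encoding on write and substitutes a placeholder
for characters it cannot represent. The sketch below is an assumption
about what that could look like, not something proposed in the thread.
It is written for Python 3, make_gentle_writer is a hypothetical helper,
and an in-memory buffer stands in for the terminal.

```python
import codecs
import io


def make_gentle_writer(binary_stream, encoding):
    """Hypothetical helper: wrap a byte stream so that text is encoded
    on write, replacing unencodable characters instead of raising."""
    return codecs.getwriter(encoding)(binary_stream, errors="replace")


buf = io.BytesIO()  # stands in for a terminal's byte stream
out = make_gentle_writer(buf, "latin-1")

# Three umlauts plus a euro sign, which Latin-1 cannot represent.
out.write("\u00e4\u00f6\u00fc \u20ac\n")

latin1_bytes = buf.getvalue()
print(ascii(latin1_bytes))
```

The umlauts come out as single Latin-1 bytes and the euro sign is
replaced by "?" rather than causing an encoding error.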

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/