[Python-Dev] Python code.interact() and UTF-8 locale

Victor STINNER victor.stinner-linux at haypocalc.com
Tue Sep 13 15:53:29 CEST 2005


On Tuesday, 13 September 2005 at 17:56 +0900, Hye-Shik Chang wrote:
> On 9/11/05, Victor STINNER <victor.stinner-linux at haypocalc.com> wrote:
> > Hi,
> > 
> > I found a bug in the Python interactive command line (the plain
> > python program; it looks to be the code.interact() function in
> > code.py). With a UTF-8 locale, the command << u"é" >> returns
> > << u'\xc3\xa9' >> and not << u'\xE9' >>.
> > Remember: the French e with acute accent is Unicode 233 (0xE9),
> > encoded as \xC3\xA9 in UTF-8.
> 
> Which version of Python are you using? Since 2.4, interactive mode
> uses the locale's encoding as the source code encoding and falls back
> to Latin-1 when decoding fails.
> 
> Python 2.4.1 (#2, Jul 31 2005, 04:45:53)
> [GCC 3.4.2 [FreeBSD] 20040728] on freebsd5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u"é"
> u'\xe9'
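The fallback Hye-Shik describes (decode typed input with the locale's encoding, and use Latin-1 if that fails) can be sketched in Python 3 as below. This is only an illustration of the idea, not CPython's actual tokenizer code, and decode_interactive_line is a hypothetical name. Latin-1 works as a last resort because it maps every byte to exactly one code point, so decoding with it never fails.

```python
import locale

def decode_interactive_line(raw, encoding=None):
    """Hypothetical helper: decode one line of typed input.

    Tries the locale's preferred encoding first; if the bytes are not
    valid in that encoding (or the codec is unknown), falls back to
    Latin-1, which can decode any byte sequence.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return raw.decode("latin-1")

# UTF-8 input in a UTF-8 locale: decoded properly to one character.
print(ascii(decode_interactive_line(b'u"\xc3\xa9"', "utf-8")))
# A lone 0xE9 byte is invalid UTF-8, so the Latin-1 fallback kicks in.
print(ascii(decode_interactive_line(b'u"\xe9"', "utf-8")))
```

Both calls print 'u"\xe9"': the first because the bytes really are UTF-8, the second only because of the fallback.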

I installed my own Python 2.4 in /opt/python/. I don't know whether the
right code.py is loaded, but here is the output:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
$ ./python2.4 
Python 2.4.1 (#1, Sep 11 2005, 01:37:26) 
[GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u"é"
u'\xe9'
>>> import code
>>> code.interact()
Python 2.4.1 (#1, Sep 11 2005, 01:37:26) 
[GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> u"é"
u'\xc3\xa9'
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
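The value code.interact() prints, u'\xc3\xa9', is exactly what you get when the two UTF-8 bytes of "é" are decoded as Latin-1, one byte per code point. A small Python 3 sketch (not Python 2.4) makes the difference visible:

```python
# "é" typed in a UTF-8 terminal reaches the interpreter as these two bytes.
raw = "é".encode("utf-8")
print(raw)

# Decoding them as Latin-1 maps each byte to one code point, giving the
# two characters U+00C3 U+00A9, the same values code.interact() showed.
as_latin1 = raw.decode("latin-1")
print(ascii(as_latin1))

# Decoding with the terminal's real encoding gives the single U+00E9.
as_utf8 = raw.decode("utf-8")
print(ascii(as_utf8))
```

This prints b'\xc3\xa9', then '\xc3\xa9', then '\xe9'.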

Well, that works better :-) For code.interact(), you can read my
attached patch. I don't know if it is the best way to fix the bug.

However, the following code is still buggy in Python 2.4:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
$ cat python_unicode_eval_bug.py 
# -*- coding: UTF-8 -*-
print "One Unicode character: %u" % len(u"é")
print "One Unicode character (using eval) : %u" % eval('len(u"é")')
$ python2.4 python_unicode_eval_bug.py 
One Unicode character: 1
One Unicode character (using eval) : 2
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

RexFi explained to me that Python can't guess the charset of the string
passed to eval('len(u"é")'). Yes, that's difficult: the locale? The
source file's encoding? So this test doesn't really matter.
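Since eval() only receives bytes here, one way out is to decode the source yourself, with the encoding you know it was written in, before evaluating it. The following is a minimal Python 3 sketch under that assumption (Python 3 strings differ from Python 2.4 str/unicode, so this shows the idea rather than a 2.4 fix); it reproduces both results from the script above.

```python
# The eval() source as raw bytes, as it sits in a UTF-8 encoded file.
src_bytes = 'len("é")'.encode("utf-8")

# Decoding with the wrong codec (Latin-1) turns "é" into two characters,
# reproducing the surprising result of 2.
wrong = eval(src_bytes.decode("latin-1"))

# Decoding with the codec the bytes were actually written in gives 1.
right = eval(src_bytes.decode("utf-8"))

print(wrong, right)
```

This prints 2 1.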

See you, Haypo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: code-interact.patch
Type: text/x-patch
Size: 407 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20050913/c620d813/code-interact.bin

