[ python-Bugs-1288615 ] Python code.interact() and UTF-8 locale

SourceForge.net noreply at sourceforge.net
Mon Sep 12 14:46:32 CEST 2005


Bugs item #1288615, was opened at 2005-09-12 13:40
Message generated for change (Comment added) made by haypo
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1288615&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: STINNER Victor (haypo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Python code.interact() and UTF-8 locale

Initial Comment:
Hi,

I found a bug in the Python interactive command line
(the python program run on its own; it looks like the
code.interact() function in code.py). With a UTF-8 locale,
the command << u"é" >> returns << u'\xc3\xa9' >> and not
<< u'\xe9' >>. Remember: the French e with acute accent is
Unicode code point 233 (0xE9), encoded as \xC3 \xA9 in UTF-8.

Another example of the bug:
  #-*- coding: UTF-8 -*-
  code = "u\"%s\"" % "\xc3\xa9"
  compiled = compile(code,'<string>',"single")
  exec compiled
Result:
  u'\xc3\xa9'
Expected result:
  u'\xe9'
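
The difference between the two results can be reproduced at the byte level. The following is only a sketch in Python 3 syntax (an assumption, since the report itself targets Python 2.3), and the variable names are illustrative:

```python
# Python 3 sketch; the original report concerns Python 2.3.
# U+00E9 (e with acute) as a UTF-8 terminal would send it: two bytes.
utf8_bytes = "\u00e9".encode("utf-8")

# Reported result: each byte ends up as its own code point,
# which is the same as decoding the bytes as Latin-1.
observed = utf8_bytes.decode("latin-1")

# Expected result: both bytes decoded together as UTF-8.
expected = utf8_bytes.decode("utf-8")

print(ascii(observed))  # '\xc3\xa9'
print(ascii(expected))  # '\xe9'
```

The reported value is two characters long, while the expected value is a single character.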

After long hours of debugging (reading the Python
documentation, debugging Python with gdb, reading the Python C
source code, ...) I found the origin of the bug: the
parsestr() function in Python/compile.c. This function
translates a string literal into a unicode string (or a
classic string). The problem arises when there is no
encoding declaration: the string isn't converted.

Workaround for the first example:
  #-*- coding: ascii -*-
  code = """#-*- coding: UTF-8 -*-
  u\"%s\"""" % "\xc3\xa9"
  compiled = compile(code,'<string>',"single")
  exec compiled

Proposal: u"..." and unicode("...") should use
sys.stdin.encoding by default, i.e. behave like
unicode("...", sys.stdin.encoding). Or, more simply, the
compiler should use sys.stdin.encoding instead of ascii as
its default encoding.
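
As a rough illustration of this proposal, here is a sketch in Python 3 syntax (an assumption; the report targets Python 2.3). The helper name decode_console_bytes is hypothetical and not part of Python; it only shows what "use sys.stdin.encoding by default" could mean for bytes typed at a console:

```python
import sys

def decode_console_bytes(raw, encoding=None):
    """Hypothetical helper: decode bytes typed at a console.

    Uses the given encoding if there is one, otherwise the encoding
    reported by sys.stdin, otherwise ASCII.
    """
    if encoding is None:
        # sys.stdin may be replaced or lack an encoding; fall back to ASCII.
        encoding = getattr(sys.stdin, "encoding", None) or "ascii"
    return raw.decode(encoding)

# With an explicit UTF-8 encoding, the two bytes become one character.
print(ascii(decode_console_bytes(b"\xc3\xa9", "utf-8")))  # '\xe9'
```

When no encoding is passed, the result depends on the terminal the interpreter is attached to, so the same bytes may decode differently on different machines.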

Sorry if someone has already reported this bug. And is it
a bug or a feature? ;-)

Bye, Haypo

----------------------------------------------------------------------

>Comment By: STINNER Victor (haypo)
Date: 2005-09-12 14:46

Message:
Logged In: YES 
user_id=365388

OK, after a long discussion with RexFi on IRC, I understood
that Python can't *guess* the string encoding ... I agree with
that: the system locale and the source encoding are not good
choices.

But ... the Python console has a bug. It uses raw_input(). So I
wrote a patch that just adds the right unicode cast. However,
the Python console doesn't appear to be code.interact().

I attach the patch to this comment.
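
The attached patch is not reproduced in this message. The general idea of decoding console input before it is compiled might look like the sketch below. It is written in Python 3 syntax (an assumption; the report targets Python 2.3), and the class name DecodingConsole and its read_bytes and encoding parameters are hypothetical, not taken from the patch:

```python
import code

class DecodingConsole(code.InteractiveConsole):
    """Hypothetical console that decodes byte input before compiling it."""

    def __init__(self, read_bytes, encoding, locals=None):
        super().__init__(locals)
        self._read_bytes = read_bytes  # callable: prompt -> bytes (assumed)
        self._encoding = encoding      # encoding of the incoming bytes

    def raw_input(self, prompt=""):
        line = self._read_bytes(prompt)
        # Decode explicitly instead of letting bytes pass through untouched.
        if isinstance(line, bytes):
            line = line.decode(self._encoding)
        return line

# Simulate a UTF-8 terminal sending one line of source.
console = DecodingConsole(lambda prompt: b'x = "\xc3\xa9"', "utf-8")
console.push(console.raw_input(">>> "))
print(ascii(console.locals["x"]))  # '\xe9'
```

The encoding is passed in explicitly here rather than guessed, in line with the point above that Python cannot guess a string's encoding.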

Haypo

----------------------------------------------------------------------
