[Python-Dev] "takeuchi": a unicode string on IDLE shell
Guido van Rossum
guido@python.org
Mon, 10 Apr 2000 10:01:58 -0400
Can anyone answer this? I can reproduce the output side of this, and
I believe he's right about the input side. Where should Python
migrate with respect to Unicode input? I think that what Takeuchi is
getting is actually better than in Pythonwin or command line (where he
gets Shift-JIS)...
--Guido van Rossum (home page: http://www.python.org/~guido/)
------- Forwarded Message
Date: Mon, 10 Apr 2000 22:49:45 +0900
From: "takeuchi" <takeuchi.shohei@lab.ntt.co.jp>
To: <guido@python.org>
Subject: a unicode string on IDLE shell
Dear Guido,
I played with your latest CPython (Python 1.6a1) on the Japanese version of Win98
and found some strange IDLE shell behavior.
I'm not sure whether this is a bug or a feature, so I am reporting it anyway.
When I type a Japanese string in the IDLE shell with an IME,
Tk 8.3 seems to convert it to a UTF-8 representation.
Unfortunately, Python does not know this,
so the input is treated as an ordinary string.
>>> s = raw_input(">>>")
Type Japanese characters with IME
for example あ
(This is the first character of the Japanese hiragana syllabary)
>>> s
'\343\201\202' # UTF-8 encoded
>>> print s
あ # The proper glyph appears on the screen
The print statement in the IDLE shell works fine with a UTF-8 encoded
string; however, slicing and len() do not work per character, because they
operate on the individual bytes.
# I know this is the correct result for an ordinary string
So I have to convert this string with unicode().
>>> u = unicode(s)
>>> u
u'\u3042'
>>> print u
あ # The proper glyph appears on the screen
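The byte-versus-character difference described here can be sketched as follows. This is only an illustration and assumes present-day Python 3, where `bytes.decode` stands in for the `unicode()` call used in the Python 1.6a1 session; the variable names are hypothetical.

```python
# Assumption: Python 3, not the Python 1.6a1 of the original session.
# The bytes literal reuses the octal escapes shown in the session.
utf8_bytes = b'\343\201\202'        # UTF-8 encoding of U+3042

# Operations on the raw bytes count bytes, not characters.
print(len(utf8_bytes))              # 3
print(utf8_bytes[:1])               # only a fragment of the character

# Decoding yields a one-character string; this is the step that
# unicode(s) performs in the session above.
text = utf8_bytes.decode('utf-8')
print(len(text))                    # 1
print(ascii(text))                  # escaped form, safe on any console
```

Slicing the undecoded bytes splits the character in the middle, which is why the conversion is needed before any per-character operation.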
Do you find this conversion uncomfortable?
I think this behavior is inconsistent with command-line Python
and PythonWin.
If I want the same result in the command-line Python shell or the PythonWin shell,
I have to code as follows:
>>> s = raw_input(">>>")
Type Japanese characters with IME
for example あ
>>> s
'\202\240' # Shift-JIS encoded
>>> print s
あ # The proper glyph appears on the screen
>>> u = unicode(s,"mbcs") # if I use unicode(s), a UnicodeError is raised!
>>> print u.encode("mbcs") # if I use print u, the wrong glyph appears
あ # The proper glyph appears on the screen
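The Shift-JIS case can be sketched in the same way. Again this assumes present-day Python 3, and it uses the portable `shift_jis` codec in place of `mbcs`, which exists only on Windows and follows the system code page; the variable names are hypothetical.

```python
# Assumption: Python 3, with 'shift_jis' substituted for the
# Windows-only 'mbcs' codec used in the session above.
sjis_bytes = b'\202\240'            # Shift-JIS encoding of U+3042

# Decoding with the wrong codec fails, much as unicode(s) raised
# UnicodeError in the report.
try:
    sjis_bytes.decode('utf-8')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)

# Decoding with the matching codec gives the same character that the
# IDLE session obtained from UTF-8 input.
text = sjis_bytes.decode('shift_jis')
print(ascii(text))

# Output has to be encoded back to whatever the console expects.
print(text.encode('shift_jis') == sjis_bytes)
print(text.encode('utf-8') == b'\343\201\202')
```

The same character arrives as different byte sequences depending on the shell, so the caller has to know the input encoding before decoding.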
This difference is confusing!!
I do not have the best solution for this annoyance, but I hope that at least
the IDLE shell and the PythonWin shell will behave the same way.
Thank you for reading.
Best Regards,
takeuchi
------- End of Forwarded Message