[Python-Dev] "takeuchi": a unicode string on IDLE shell

Guido van Rossum guido@python.org
Mon, 10 Apr 2000 10:01:58 -0400


Can anyone answer this?  I can reproduce the output side of this, and
I believe he's right about the input side.  Where should Python
migrate with respect to Unicode input?  I think that what Takeuchi is
getting is actually better than in Pythonwin or command line (where he
gets Shift-JIS)...

--Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date:    Mon, 10 Apr 2000 22:49:45 +0900
From:    "takeuchi" <takeuchi.shohei@lab.ntt.co.jp>
To:      <guido@python.org>
Subject: a unicode string on IDLE shell 

Dear Guido,

I tried your latest CPython (Python 1.6a1) on the Japanese version of Windows 98
and found strange behavior in the IDLE shell.

I'm not sure whether this is a bug or a feature, so I am reporting my story anyway.

When I type a Japanese string in the IDLE shell with the IME,
Tk 8.3 seems to convert it to a UTF-8 representation.
Unfortunately, Python does not know this,
so it is treated as an ordinary (byte) string.

>>> s = raw_input(">>>")
Type Japanese characters with the IME,
for example あ
(This is the first character of the Japanese Hiragana syllabary.)
>>> s
'\343\201\202'          # UTF-8 encoded
>>> print s
あ                      # the proper glyph appears on the screen

The print statement in the IDLE shell works fine with a UTF-8 encoded
string; however, slicing and len() operate on bytes, not characters
(len(s) returns 3). I know this is technically the correct result.
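A minimal sketch of why this happens, written in Python 3 rather than 1.6 (an assumption: the 1.6-era plain string is modeled here as a Python 3 bytes object). The IDLE result is a three-byte UTF-8 sequence, so len() counts bytes, and a slice can cut the character in half:

```python
# UTF-8 bytes that IDLE returned for HIRAGANA LETTER A (U+3042).
# Python 3 bytes literal, same octal escapes as the transcript.
s = b'\343\201\202'

print(len(s))    # 3: len() counts bytes, not characters
print(s[:1])     # b'\xe3': only the first byte of the sequence

# A one-byte slice is no longer valid UTF-8 on its own.
try:
    s[:1].decode('utf-8')
except UnicodeDecodeError as exc:
    print('broken slice:', exc.reason)
```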

So I have to convert this string with unicode().

>>> u = unicode(s)
>>> u
u'\u3042'
>>> print u
あ                      # the proper glyph appears on the screen
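The unicode(s) call works because the bytes are valid UTF-8. A Python 3 sketch of the same step, where bytes.decode() stands in for the 1.6-era unicode() built-in; ascii() is used so the result prints in the same \u3042 form as the transcript regardless of terminal encoding:

```python
s = b'\343\201\202'    # UTF-8 bytes from the IDLE shell
u = s.decode('utf-8')  # Python 3 counterpart of unicode(s)

print(ascii(u))        # '\u3042'
print(len(u))          # 1: one character, as slicing and len() expect
print(hex(ord(u)))     # 0x3042
```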

Do you think this conversion is inconvenient?

I think this behavior is inconsistent with the command-line Python shell
and PythonWin.

If I want the same result in the command-line Python shell or the PythonWin shell,
I have to code as follows:
>>> s = raw_input(">>>")
Type Japanese characters with the IME,
for example あ
>>> s
'\202\240'              # Shift-JIS encoded
>>> print s
あ                      # the proper glyph appears on the screen
>>> u = unicode(s, "mbcs")  # if I use unicode(s), UnicodeError is raised!
>>> print u.encode("mbcs")  # if I use print u, the wrong glyph appears
あ                      # the proper glyph appears on the screen
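For comparison, a Python 3 sketch of the command-line/PythonWin case. The "mbcs" codec is only available on Windows, so this assumes a Japanese system whose ANSI code page is 932 and uses the portable "cp932" codec in its place. Both shells end up with the same character; they differ only in which byte encoding arrives first:

```python
# Bytes from the command-line / PythonWin transcript (Shift-JIS)
sjis = b'\202\240'

# Assumption: on Japanese Windows, 'mbcs' maps to code page 932,
# so the portable 'cp932' codec stands in for it here.
u = sjis.decode('cp932')
print(ascii(u))          # '\u3042': same character IDLE produced

# The same character in UTF-8, as the IDLE shell delivered it
utf8 = u.encode('utf-8')
print(utf8)              # b'\xe3\x81\x82'

# Decoding Shift-JIS bytes as UTF-8 fails, analogous to the
# UnicodeError that plain unicode(s) raised in the transcript
try:
    sjis.decode('utf-8')
except UnicodeDecodeError as exc:
    print('not UTF-8:', exc.reason)
```

Once both inputs are decoded, comparing the resulting text values rather than the raw byte strings is what makes the two shells agree.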

This difference is confusing!
I do not have the best solution for this annoyance, but I hope that at least
the IDLE shell and the PythonWin shell will behave the same way.

Thank you for reading.

Best Regards,

       takeuchi

------- End of Forwarded Message