"takeuchi": a unicode string on IDLE shell

Can anyone answer this? I can reproduce the output side of this, and I believe he's right about the input side. Where should Python migrate with respect to Unicode input? I think that what Takeuchi is getting is actually better than in Pythonwin or command line (where he gets Shift-JIS)...

--Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date: Mon, 10 Apr 2000 22:49:45 +0900
From: "takeuchi" <takeuchi.shohei@lab.ntt.co.jp>
To: <guido@python.org>
Subject: a unicode string on IDLE shell

Dear Guido,

I played with your latest CPython (Python 1.6a1) on the Japanese version of Win98 and found a strange IDLE shell behavior. I'm not sure whether this is a bug or a feature, so I am reporting my story anyway.

When I type a Japanese string into the IDLE shell with the IME, Tk 8.3 seems to convert it to a UTF-8 representation. Unfortunately, Python does not know this, so it is treated as an ordinary string.
    >>> s = raw_input(">>>")
    >>>あ             # typed with the IME; あ is the first character of the Japanese Hiragana syllabary
    >>> s
    '\343\201\202'    # UTF-8 encoded
    >>> print s
    あ                # the proper glyph appears on the screen
The print statement in the IDLE shell works fine with a UTF-8 encoded string; however, slicing and len() do not work (I know this is the correct result). So I have to convert this string with unicode():
    >>> u = unicode(s)
    >>> u
    u'\u3042'
    >>> print u
    あ                # the proper glyph appears on the screen
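A minimal sketch in present-day Python 3 (not the Python 1.6 API used in the message, where raw input was a plain str and unicode(s) decoded it) of why len() and slicing misbehave on the raw input: the UTF-8 form of あ (U+3042) is three bytes, and only after decoding does it become a single character. Here bytes.decode() stands in for unicode(s).

```python
# Python 3 sketch: bytes vs. text for the Hiragana character あ (U+3042).
raw = "\u3042".encode("utf-8")   # what IDLE/Tk handed back: UTF-8 bytes
print(raw)                       # b'\xe3\x81\x82' (octal \343\201\202)
print(len(raw))                  # 3: len() counts bytes, not characters
print(raw[:1])                   # b'\xe3': a slice cuts the character in half

text = raw.decode("utf-8")       # plays the role of unicode(s) in the message
print(len(text), text == "\u3042")  # 1 True
```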
Do you think this conversion is uncomfortable? I think this behavior is inconsistent with command-line Python and PythonWin. If I want the same result in the command-line Python shell or the PythonWin shell, I have to code as follows:
    >>> s = raw_input(">>>")
    >>>あ                        # typed with the IME, as before
    >>> s
    '\202\240'                   # Shift-JIS encoded
    >>> print s
    あ                           # the proper glyph appears on the screen
    >>> u = unicode(s, "mbcs")   # if I use unicode(s), a UnicodeError is raised!
    >>> print u.encode("mbcs")   # if I use print u, the wrong glyph appears
    あ                           # the proper glyph appears on the screen
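The same character under Shift-JIS, sketched in present-day Python 3. The message uses the Windows-only "mbcs" codec (the system ANSI code page); this sketch substitutes the portable "shift_jis" codec as an assumption so it runs on any platform. It illustrates why unicode(s) raised an error: the Shift-JIS bytes are not valid UTF-8, and, as the IDLE session shows, UTF-8 was what unicode(s) assumed by default in that alpha.

```python
# Python 3 sketch: the Shift-JIS bytes from the command-line session.
# "shift_jis" stands in for the Windows-only "mbcs" codec used in the message.
sjis = "\u3042".encode("shift_jis")
print(sjis)                      # b'\x82\xa0' (octal \202\240)

# Decoding with the wrong codec fails, like unicode(s) did in the message.
try:
    sjis.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)

# Decoding with the right codec, then re-encoding for a Shift-JIS console,
# mirrors unicode(s, "mbcs") followed by u.encode("mbcs").
text = sjis.decode("shift_jis")
print(text == "\u3042", text.encode("shift_jis") == sjis)  # True True
```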
This difference is confusing!! I do not have the best solution for this annoyance, but I hope that at least the IDLE shell and the PythonWin shell will have the same behavior.

Thank you for reading.

Best Regards,
takeuchi

------- End of Forwarded Message

----- Original Message -----
From: Guido van Rossum <guido@python.org>
To: <python-dev@python.org>
Cc: <i18n-sig@python.org>
Sent: 10 April 2000 15:01
Subject: [I18n-sig] "takeuchi": a unicode string on IDLE shell
Can anyone answer this? I can reproduce the output side of this, and I believe he's right about the input side. Where should Python migrate with respect to Unicode input? I think that what Takeuchi is getting is actually better than in Pythonwin or command line (where he gets Shift-JIS)...
--Guido van Rossum (home page: http://www.python.org/~guido/)

I think what he wants, as you hinted, is to be able to specify a 'system wide' default encoding of Shift-JIS rather than UTF-8.
UTF-8 has a certain purity in that it equally annoys every nation, and is nobody's default encoding. What a non-ASCII user needs is a site-wide way of setting the default encoding used for standard input and output. I think this could be done with something (a config file? a registry key?) which site.py looks at, and which wraps stream encoders around stdin, stdout and stderr.

To illustrate why it matters: I often used to parse data files and do queries on a Japanese name and address database. Since the OS uses Shift-JIS as its native encoding and I was manipulating Shift-JIS strings, I could print my lists and tuples in interactive mode and check that they worked, or initialise functions with correct data. I've lost that ability now due to the Unicode stuff, and would need to do

    for thing in mylist:
        print thing.encode('shift_jis')

to see the contents of a database row, rather than just typing mylist at the prompt.
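A rough sketch, in present-day Python 3, of the "wrap a stream encoder around an output stream" idea using the standard codecs module. SITE_ENCODING and wrap_output are hypothetical names for illustration (neither is provided by site.py), and the sketch writes to an in-memory byte buffer rather than replacing sys.stdout, so it can run anywhere.

```python
import codecs
import io

# Hypothetical site-wide setting (the message suggests a config file or registry key).
SITE_ENCODING = "shift_jis"

def wrap_output(binary_stream, encoding=SITE_ENCODING):
    """Return a writer that encodes text to `encoding` before writing bytes."""
    return codecs.getwriter(encoding)(binary_stream)

# In-memory stand-in for the console's byte stream.
console = io.BytesIO()
out = wrap_output(console)
out.write("\u3042\n")            # text goes in ...
print(console.getvalue())        # ... Shift-JIS bytes come out: b'\x82\xa0\n'
```

In current Python 3 the same effect on the real console is usually obtained with io.TextIOWrapper(sys.stdout.buffer, encoding=...) or, since Python 3.7, sys.stdout.reconfigure(encoding=...).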
BTW, Pythonwin stopped working in this regard when Scintilla came along; it now prints a byte at a time, although kanji input is fine, as is kanji pasted into a source file, as long as you specify a Japanese font. However, this is fixable - I just need to find a spare box to run Japanese Windows on and find out where the printing goes wrong.

Andy Robinson
ReportLab