IDLE raw_input() and unicode
jiwon at softwise.co.kr
Thu Jun 5 08:59:15 CEST 2003
unendliche at hanmail.net (Seo Sanghyeon) wrote in message news:<45e6545c.0306041900.7efc3406 at posting.google.com>...
> My setting is IDLE 0.8 on Python 2.2.2, WinXP. In case it's relevant,
> here's Korea. (Hangeul is Korean writing.)
> IDLE can print Hangeul just fine. (Note: my sitecustomize.py does
> sys.setdefaultencoding("utf-8"). In case it's relevant.)
> But IDLE fails to get Hangeul input with raw_input(). It balks
> and prints out:
> Traceback (most recent call last):
> File "<pyshell#0>", line 1, in ?
> TypeError: object.readline() returned non-string
> After some study, I found that it seems the builtin raw_input() does
> sys.stdin.readline() *AND* type-checking. (Whoa... since C code
> is involved, there's no traceback and I'm not sure. Should I read
> C code on Python CVS? *normal user shudder*)
> sys.stdin is replaced with an instance of PyShell.PyShell by IDLE.
> The readline() method of the PyShell class does something I don't
> understand and then does:
> # PyShell.py around line 475
> line = self.text.get("iomark", "end-1c")
> return line
> PyShell inherits from OutputWindow and it in turn inherits from
> EditorWindow and EditorWindow initializes self.text as a Tkinter.Text
> widget. And I don't know what the hell "self.tk.call(self._w, 'get',
> index1, index2)", that is, an implementation of Tkinter.Text.get(),
> does at all, but I assume it returns sort of "non-string".
> So I opened a DOS command line window and started python. And typed:
> import PyShell
> shell = PyShell.PyShell()
> shell.reading = 1
> # Typing some Hangeul. In this case, the name of Python itself.
> shell.text.get('iomark', 'end-1c')
> It prints out: u'\ud30c\uc774\uc36c\n'. So the "non-string"
> is actually unicode.
> So... I suggest the following:
> 1) Make raw_input() able to return unicode. Why not? (But I suspect
> there may be some deep reason.)
> 2) Or, at least, make PyShell.readline() return something other than a "non-string".
> I think just changing "return line" to "return str(line)" would do.
> (Make a change and try again.) Yes, it does.
> I googled for "IDLE raw_input unicode" and, to my surprise, found just a few
> posts. One German user posted that raw_input() doesn't handle umlauts.
> So it seems this is quite a general i18n problem.
> Should I submit... *normal user shudder* a patch? It's just one-line
> change... Or can someone do all unicode-IDLE-users a favor and submit
> a patch?
> I'm more than happy to hear something like "Yes, we know that, and it's
> all fixed on IDLE version >0.8, and with Python 2.3 you will have no
> -- Seo Sanghyeon
This is not a bug in raw_input() but in IDLE's PyShell.py.
Returning str(line) instead of just line in the readline() method of
the PyShell class (PyShell.py, line 794) will fix it. I'm not sure
this is the right way to fix it, but it's how someone with the same
problem coped with it. You can read the related thread on www.python.or.kr,
specifically here: http://bbs.python.or.kr/viewtopic.php?t=14625&highlight=raw_input
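
For what it's worth, here is a rough sketch of why the change helps. It is
only a model, not IDLE's or CPython's actual code: FakeShell and
model_raw_input are made-up names, and I am assuming (from the traceback
quoted above) that raw_input() accepts nothing but a byte string from
readline(). Note that str(line) only works for Hangeul because of the
sys.setdefaultencoding("utf-8") in sitecustomize.py; with the stock ASCII
default it would raise UnicodeError, so the sketch encodes explicitly instead.

```python
# Sketch only: a model of the behaviour discussed in this thread.
# FakeShell and model_raw_input are hypothetical names, not IDLE code.

TEXT_TYPE = type(u"")   # unicode on Python 2, str on Python 3
BYTES_TYPE = type(b"")  # str on Python 2, bytes on Python 3


class FakeShell(object):
    """Stand-in for PyShell: readline() hands back what the Tk widget gave."""

    def __init__(self, typed, fixed=False, encoding="utf-8"):
        self.typed = typed        # text as the Tk text widget would return it
        self.fixed = fixed        # True = apply the proposed one-line fix
        self.encoding = encoding  # assumed encoding; pick what your console uses

    def readline(self):
        line = self.typed
        if self.fixed and isinstance(line, TEXT_TYPE):
            # Explicit encode rather than str(line), so the result does not
            # depend on the interpreter's default encoding.
            line = line.encode(self.encoding)
        return line


def model_raw_input(stream):
    """Model of the type check: only a byte string is accepted."""
    line = stream.readline()
    if not isinstance(line, BYTES_TYPE):
        raise TypeError("object.readline() returned non-string")
    if line.endswith(b"\n"):
        line = line[:-1]  # raw_input() strips the trailing newline
    return line
```

With typed = u'\ud30c\uc774\uc36c\n', model_raw_input(FakeShell(typed))
raises the same TypeError as in the original report, while
model_raw_input(FakeShell(typed, fixed=True)) returns the UTF-8 bytes of the
three syllables.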