[Tutor] (no subject)

eryk sun eryksun at gmail.com
Mon Jul 25 07:19:46 EDT 2016


On Fri, Jul 22, 2016 at 7:38 AM, DiliupG <diliupg at gmail.com> wrote:
> I am using Python 2.7.12 on Windows 10
>
> filename = u"මේක තියෙන්නේ සිංහලෙන්.txt"
> Unsupported characters in input

That error message is from IDLE. I'm not an expert with IDLE, so I
don't know what the following hack potentially breaks, but it does
allow entering the above Unicode filename in the interactive
interpreter.

Edit "Python27\Lib\idlelib\IOBinding.py". Look for the following
section on line 34:

    if sys.platform == 'win32':
        # On Windows, we could use "mbcs". However, to give the user
        # a portable encoding name, we need to find the code page
        try:
            encoding = locale.getdefaultlocale()[1]
            codecs.lookup(encoding)
        except LookupError:
            pass

Inside the try block, comment out the locale lookup and hard-code
the encoding as "utf-8", as follows:

            # encoding = locale.getdefaultlocale()[1]
            encoding = "utf-8"

When you restart IDLE, you should see that sys.stdin.encoding is now "utf-8".
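
One way to check the result from the IDLE shell is sketched below. The
helper name describe_encoding is made up for this example; it only wraps
codecs.lookup, the same call the IOBinding.py snippet uses to validate
the encoding name.

```python
import codecs
import sys


def describe_encoding(name):
    """Return the canonical codec name for `name`, or None if unknown.

    Hypothetical helper for illustration; not part of IDLE.
    """
    if not name:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# sys.stdin may lack an encoding attribute in some environments,
# so fall back to None instead of raising AttributeError.
stdin_encoding = getattr(sys.stdin, "encoding", None)
print("sys.stdin.encoding: %s" % stdin_encoding)
print("canonical codec name: %s" % describe_encoding(stdin_encoding))
```

If the edit took effect, both lines should report "utf-8" after IDLE is
restarted.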

IOBinding.encoding is used by ModifiedInterpreter.runsource in
PyShell.py. When the encoding is UTF-8, it passes the Unicode source
string directly to InteractiveInterpreter.runsource, where it gets
compiled using the built-in compile() function.
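
To illustrate that last step outside of IDLE, here is a minimal sketch
that hands a Unicode source string straight to
code.InteractiveInterpreter.runsource. This is not IDLE's own code; the
namespace dict and the "<sketch>" filename are assumptions made for the
example.

```python
# -*- coding: utf-8 -*-
# The coding line above is only needed when saving this as a Python 2 file.
import code

# A Unicode source string, like the one the shell receives from Tk.
source = u'filename = u"මේක තියෙන්නේ සිංහලෙන්.txt"\n'

# Hypothetical namespace for the example; IDLE uses its own.
namespace = {}
interp = code.InteractiveInterpreter(namespace)

# runsource returns False once the source compiled as a complete
# statement and was handed off for execution.
more_input_needed = interp.runsource(source, "<sketch>")

print("more input needed: %s" % more_input_needed)
print("characters in filename: %d" % len(namespace["filename"]))
```

No encoding step is involved here: the text is compiled as Unicode, so
there is no code page that could reject the Sinhala characters.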

Note that IDLE uses the Tk GUI toolkit, which -- at least with Python
Tkinter on Windows -- is limited to the first 65,536 Unicode code
points, i.e. the Basic Multilingual Plane (U+0000 through U+FFFF).
The BMP includes the Sinhala script, so your filename string is fine.
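
If you want to check other strings against that limit, a rough sketch
follows. The function name fits_in_bmp is made up for this example. It
also treats surrogate code units as outside the BMP, since that is how
a non-BMP character shows up in a narrow Python 2 build on Windows.

```python
# -*- coding: utf-8 -*-
# The coding line above is only needed when saving this as a Python 2 file.


def fits_in_bmp(text):
    """Return True if every character of `text` is a BMP code point.

    Hypothetical helper for illustration; not part of IDLE or Tkinter.
    """
    for ch in text:
        cp = ord(ch)
        # Above U+FFFF: a non-BMP character on a wide build or Python 3.
        # U+D800..U+DFFF: half of a surrogate pair on a narrow build.
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            return False
    return True


filename = u"මේක තියෙන්නේ සිංහලෙන්.txt"
print("filename fits in the BMP: %s" % fits_in_bmp(filename))
print("emoji fits in the BMP: %s" % fits_in_bmp(u"\U0001F600"))
```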

