[Tutor] myown.getfilesystemencoding()
Albert-Jan Roskam
fomcl at yahoo.com
Wed Sep 4 14:39:10 CEST 2013
----- Original Message -----
> From: eryksun <eryksun at gmail.com>
> To: Oscar Benjamin <oscar.j.benjamin at gmail.com>; Albert-Jan Roskam <fomcl at yahoo.com>
> Cc: Python Mailing List <tutor at python.org>
> Sent: Sunday, September 1, 2013 7:30 AM
> Subject: Re: [Tutor] myown.getfilesystemencoding()
>
> On Sat, Aug 31, 2013 at 9:16 AM, Oscar Benjamin
> <oscar.j.benjamin at gmail.com> wrote:
>> Spyder has both an internal interpreter and an external interpreter.
>> One is the same interpreter process that runs the Spyder GUI. The
>> other is run in a subprocess which keeps the GUI safe but reduces your
>> ability to inspect the workspace data via the GUI. So presumably
>> Albert means the "external" interpreter here.
>
> I installed Spyder on Windows to look into this. It's using Qt
> QProcess to run the external interpreter in a child process.
> sys.stdin.isatty() confirms it's not a tty, and Process Explorer
> confirms that all 3 standard I/O handles (from msvcrt.get_osfhandle())
> are pipes.
>
> The file encoding is None for piped standard I/O, so printing unicode
> falls back to the default encoding. Normally this is ASCII in 2.x, but
> Spyder uses sitecustomize to set the default encoding based on the
> default locale. It also sets the hidden console's codepage:
>
>     if os.name == 'nt': # Windows platforms
>
>         # Setting console encoding (otherwise Python does not
>         # recognize encoding)
>         try:
>             import locale, ctypes
>             _t, _cp = locale.getdefaultlocale('LANG')
>             try:
>                 _cp = int(_cp[2:])
>                 ctypes.windll.kernel32.SetConsoleCP(_cp)
>                 ctypes.windll.kernel32.SetConsoleOutputCP(_cp)
>             except (ValueError, TypeError):
>                 # Code page number in locale is not valid
>                 pass
>         except ImportError:
>             pass
>
> http://code.google.com/p/spyderlib/source/browse/spyderlib/widgets/externalshell/sitecustomize.py?name=v2.2.0#74
>
> Probably this was added for a good reason, but I don't grok the point.
> Python isn't interested in the hidden console window at this stage,
> and the standard handles are all pipes. I didn't notice any difference
> with these lines commented out, running with Python 2.7.5. YMMV
>
> There's a design flaw here since sys.stdin.encoding is used by the
> parser in single-input mode. With it set to None, Unicode literals
> entered in the REPL will be incorrectly parsed if they use non-ASCII
> byte values. For example, given the input is Windows 1252, then u'€'
> will be parsed as u'\x80' (i.e. PAD, a C1 Control code).
>
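To make the u'€' example concrete for myself, here is a small sketch (Python 3 syntax, so not the Python 2.7 parser path itself) of what happens when the single byte 0x80 is interpreted with and without knowledge of Windows-1252. Decoding as Latin-1 is only my stand-in for "each byte becomes the code point with the same value", and the helper name interpret is made up for illustration.

```python
# Sketch: how the byte that cp1252 uses for the euro sign is interpreted
# with and without the correct encoding. Latin-1 stands in for the
# "byte value == code point" fallback; that is an assumption made for
# illustration, not the actual parser code.

def interpret(raw, encoding):
    """Decode raw bytes with the given codec and return the text."""
    return raw.decode(encoding)

euro_bytes = "\u20ac".encode("cp1252")     # the euro sign is one byte in cp1252
right = interpret(euro_bytes, "cp1252")    # encoding known: EURO SIGN
wrong = interpret(euro_bytes, "latin-1")   # encoding unknown: a C1 control character

print(ascii(euro_bytes), ascii(right), ascii(wrong))
```

The same byte comes out as two different characters, which is why the encoding of the input stream matters to the REPL.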
> Here's an alternative to messing with the default encoding -- at least
> for the new version of Spyder that doesn't have to support 2.5. Python
> 2.6+ checks for the PYTHONIOENCODING environment variable. This
> overrides the encoding/errors values in Py_InitializeEx():
>
> http://hg.python.org/cpython/file/70274d53c1dd/Python/pythonrun.c#l265
>
> You can test setting PYTHONIOENCODING without restarting Spyder. Just
> bring up Spyder's "Internal Console" and set
> os.environ['PYTHONIOENCODING']. The change applies to new interpreters
> started from the "Interpreters" menu. Spyder could set this itself in
> the environment that gets passed to the QProcess object.
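A quick way to check the PYTHONIOENCODING effect outside Spyder is to start a child interpreter with a piped stdout and ask it which encoding it thinks sys.stdout has. This is only a sketch using the standard subprocess module (Python 3 syntax); the helper name child_stdout_encoding is mine, and Spyder itself uses QProcess rather than subprocess.

```python
import codecs
import os
import subprocess
import sys

def child_stdout_encoding(io_encoding):
    """Run a child interpreter with a piped stdout and return the codec
    name it reports for sys.stdout, normalised via codecs.lookup()."""
    env = dict(os.environ)                 # copy, so the parent stays untouched
    env["PYTHONIOENCODING"] = io_encoding  # the variable the child checks at startup
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return codecs.lookup(out.decode("ascii").strip()).name

print(child_stdout_encoding("cp1252"))
print(child_stdout_encoding("utf-8"))
```

Only the child's environment is changed here, which is roughly what Spyder would have to do for the environment it hands to the external interpreter.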
Wow, thanks for looking all this up. Thanks also to the other people who replied. It's not really desirable for an IDE to add confusion to an area that's already confusing to begin with. But given that chcp reports code page 850 on my Windows system (command line), wouldn't it be more descriptive if sys.getfilesystemencoding() returned 'cp850'?
In other words: in the code below, isn't line [1] an obfuscated version of line [2]? Both versions print only question marks on my system.
# Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
import ctypes
ords = [3629, 3633, 3585, 3625, 3619, 3652, 3607, 3618]
u = u"".join([unichr(i) for i in ords])
print u.encode("mbcs")  # [1]
# cp850 is what chcp reports on my Windows system
print u.encode("cp850", "replace")  # [2]
# Attempt: switch the console to the Thai code page, then print again
thai_latin_cp = "cp874"
cp_ = int(thai_latin_cp[2:])
ctypes.windll.kernel32.SetConsoleCP(cp_)
ctypes.windll.kernel32.SetConsoleOutputCP(cp_)
print u.encode("cp874", "replace")
ctypes.windll.kernel32.SetConsoleCP() and SetConsoleOutputCP() seem useful. Can these functions be used to correctly display the Thai characters on my Western European Windows version? (The last block of code is an attempt.) Or is that not possible at all?
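For reference, the encoding half of the question can be reproduced without touching the console at all. The sketch below uses Python 3 syntax (chr instead of unichr) and only the standard codecs. It shows which bytes each code page can produce for the Thai string, and says nothing about what the console font or SetConsoleOutputCP() will actually display.

```python
ords = [3629, 3633, 3585, 3625, 3619, 3652, 3607, 3618]
thai = "".join(chr(i) for i in ords)

# cp850 has no Thai characters, so "replace" substitutes a question mark
# for every character in the string
as_cp850 = thai.encode("cp850", "replace")

# cp874 covers Thai, so the string encodes without replacement and
# decodes back to the original text
as_cp874 = thai.encode("cp874")

print(as_cp850)
print(as_cp874.decode("cp874") == thai)
```

So the question marks in line [2] come from the encoding step itself; whether the cp874 bytes then show up as Thai glyphs is a separate matter of console code page and font.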
Best wishes,
Albert-Jan