Unicode, stdout, and stderr
Steven D'Aprano
steve at pearwood.info
Tue Jul 22 02:58:30 EDT 2014
On Tue, 22 Jul 2014 08:18:08 +0200, Frank Millman wrote:
> Hi all
>
> This is not important, but I would appreciate it if someone could
> explain the following, run from cmd.exe on Windows Server 2003 -
>
> C:\>python
> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit
> (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>>>> x = '\u2119'
>>>> x # this uses stderr
> '\u2119'
What makes you think it uses stderr? To the best of my knowledge, it uses
stdout.
>>>> print(x) # this uses stdout
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in
> position 0: character maps to <undefined>
I think your problem is that print tries to encode the string to your
terminal's encoding, which appears to be CP-437 (the "MS-DOS" code page),
and that code page has no character for U+2119. Can you convince cmd.exe
to use UTF-8? That should fix the problem. (Although apparently Windows'
handling of UTF-8 is buggy, so it will create many wonderful new problems,
yay!)
http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how
http://stackoverflow.com/questions/14109024/how-to-make-unicode-charset-in-cmd-exe-by-default
http://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
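The encoding step can be reproduced without a Windows console. This is only a
minimal sketch, assuming the terminal encoding really is cp437 as the
traceback suggests. It encodes the same string once with the default strict
error handler and once with 'backslashreplace' (the name "escaped" is just
for illustration):

```python
x = '\u2119'

# Strict encoding, which is what print() to a cp437 stdout would attempt.
try:
    x.encode('cp437')
except UnicodeEncodeError as err:
    print('strict encoding failed:', err.reason)

# With the 'backslashreplace' error handler the character is escaped
# instead of raising an exception.
escaped = x.encode('cp437', errors='backslashreplace')
print(escaped)
```

If I remember the docs correctly, the interactive prompt falls back to
'backslashreplace' when the repr cannot be encoded for stdout, which would
explain why a bare x shows '\u2119' while print(x) raises.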
> It seems that there is a difference between writing to stdout and
> writing to stderr.
I would be surprised if that were the case, but I don't have a Windows
box to test it. Try this:
import sys
print(x, file=sys.stderr) # I expect this will fail
print(repr(x), file=sys.stdout) # I expect this will succeed
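It may also help to look at how the two streams are configured. A rough
sketch (stream_settings is a made-up helper name, and the values will depend
on your console) that reports the encoding and error handler of each stream:

```python
import sys

def stream_settings():
    """Return {stream name: (encoding, error handler)} for stdout and stderr."""
    settings = {}
    for name in ('stdout', 'stderr'):
        stream = getattr(sys, name)
        # getattr with a default, in case a stream has been replaced by an
        # object that lacks these attributes.
        settings[name] = (getattr(stream, 'encoding', None),
                          getattr(stream, 'errors', None))
    return settings

settings = stream_settings()
for name, (encoding, errors) in settings.items():
    print(name, encoding, errors)
```

If the error handlers differ (as far as I know, Python 3 normally sets up
stderr with 'backslashreplace'), that would account for any difference
between the two streams.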
--
Steven