Unicode, stdout, and stderr
wxjmfauth at gmail.com
Tue Jul 22 05:33:58 EDT 2014
On Tuesday, 22 July 2014 11:09:37 UTC+2, Peter Otten wrote:
> Frank Millman wrote:
>
> > "Peter Otten" <__peter__ at web.de> wrote in message
> > news:lql3am$2q7$1 at ger.gmane.org...
> >> Frank Millman wrote:
> >>
> >>> Hi all
> >>>
> >>> This is not important, but I would appreciate it if someone could
> >>> explain the following, run from cmd.exe on Windows Server 2003 -
> >>>
> >>> C:\>python
> >>> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
> >>> Type "help", "copyright", "credits" or "license" for more information.
> >>> >>> x = '\u2119'
> >>> >>> x # this uses stderr
> >>> '\u2119'
> >>
> >> No, both print to stdout, but just
> >>
> >> >>> x
> >>
> >> is passed to the display hook of the interactive interpreter. This applies
> >> repr() and then tries to print the result. If this fails it makes
> >> another effort, roughly (the actual code is written in C)
> >>
> >> sys.stdout.buffer.write(repr(x).encode(
> >>     sys.stdout.encoding, "backslashreplace"))
> >>
> >
> > Thanks, Peter. Very interesting.
> >
> > Out of interest, does the same thing happen when writing to sys.stderr?
>
> If you are asking about the fallback mechanism, that is specific to
> sys.displayhook in the interactive interpreter.
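The fallback described here can be sketched in pure Python. This is only an approximation of what the C code does, modeled on the pseudo-code in the sys.displayhook documentation. The name show() and the in-memory "console" are made up for the illustration; the real hook always works on sys.stdout and also updates builtins._.

```python
import io

def show(value, stream):
    """Rough sketch of the display hook's fallback (hypothetical helper)."""
    if value is None:
        return
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Second attempt: escape what the encoding cannot represent
        # and write the bytes straight to the underlying buffer.
        stream.flush()  # keep earlier text ahead of the raw bytes
        data = text.encode(stream.encoding, "backslashreplace")
        stream.buffer.write(data)
    stream.write("\n")

# Stand-in for a cp437 console; errors="strict" as on the real stdout.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp437", errors="strict", newline="\n")
show("\u2119", console)
console.flush()
result = raw.getvalue()
print(result)
```

So the bare expression survives because of the second attempt, while print() has no such safety net.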
>
> But stdout and stderr do handle errors differently:
>
> >>> import sys
> >>> sys.stdout.errors
> 'strict'
> >>> sys.stderr.errors
> 'backslashreplace'
>
> So a codepoint written to stdout that cannot be encoded with stdout.encoding
> raises an error while a codepoint written to stderr that cannot be encoded
> with stderr.encoding is escaped.
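This difference can be reproduced without a Windows console. The sketch below assumes cp437 as the console encoding and uses two in-memory streams as stand-ins for stdout and stderr; make_stream() is a helper invented for the example.

```python
import io

def make_stream(errors):
    """In-memory text stream encoded as cp437 (stand-in for a console stream)."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="cp437", errors=errors, newline="\n")
    return raw, text

# Like sys.stdout: a character that cannot be encoded raises.
out_raw, fake_stdout = make_stream("strict")
try:
    fake_stdout.write("\u2119\n")
    stdout_failed = False
except UnicodeEncodeError:
    stdout_failed = True

# Like sys.stderr: a character that cannot be encoded is escaped.
err_raw, fake_stderr = make_stream("backslashreplace")
fake_stderr.write("\u2119\n")
fake_stderr.flush()
stderr_bytes = err_raw.getvalue()

print(stdout_failed, stderr_bytes)
```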
>
> Another way to make stdout more forgiving:
>
> >>> import sys
> >>> print("\u2119")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.4/encodings/cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
> >>> sys.stdout = open(1, mode="w", errors="xmlcharrefreplace",
> ...                   encoding=sys.stdout.encoding, closefd=False)
> >>> print("\u2119")
> &#8473;
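For comparison, here is how the standard error handlers treat the same character when encoding to cp437. It is a small sketch using str.encode() directly, so nothing is written to the real console; the names CHAR, handlers and encoded exist only in this example.

```python
CHAR = "\u2119"  # DOUBLE-STRUCK CAPITAL P, which cp437 cannot represent

handlers = ["replace", "backslashreplace", "xmlcharrefreplace", "ignore"]
encoded = {name: CHAR.encode("cp437", errors=name) for name in handlers}

for name, data in encoded.items():
    print(name.ljust(18), data)

# "strict" is the odd one out: it raises instead of substituting.
try:
    CHAR.encode("cp437", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Which handler is appropriate depends on where the output ends up: the XML character reference suits HTML or XML output, the backslash escape suits text meant to be read back as a Python literal.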
=====
Or, in a similar way, with ascii():
>>> print(ascii('abcéœ€\u2119'))
'abc\xe9\u0153\u20ac\u2119'
>>> sys.stdout.write(ascii('abcéœ€\u2119') + '\n')
'abc\xe9\u0153\u20ac\u2119'
>>> sys.stderr.write(ascii('abcéœ€\u2119') + '\n')
'abc\xe9\u0153\u20ac\u2119'
>>>
>>> sys.stdout.write((ascii('abcéœ€\u2119').strip("'") + '\n'))
abc\xe9\u0153\u20ac\u2119
>>>
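A short sketch of how ascii() relates to the "backslashreplace" handler, again assuming cp437 as the console encoding. ascii() escapes every non-ASCII character, whereas the error handler escapes only what the target encoding lacks. The variable names are just for the example.

```python
# 'abcéœ€ℙ', written with escapes so this source stays pure ASCII
s = "abc\u00e9\u0153\u20ac\u2119"

# ascii() escapes every non-ASCII character and adds the repr() quotes.
quoted = ascii(s)

# "backslashreplace" escapes only what the target encoding cannot represent.
# cp437 has 'é' but none of the other three characters.
partial = s.encode("cp437", "backslashreplace").decode("cp437")

# With the ascii codec the result equals ascii(s) without its quotes
# (true for strings free of quotes, backslashes and control characters).
unquoted = s.encode("ascii", "backslashreplace").decode("ascii")

print(quoted)
print(partial)
print(unquoted)
```

Note that strip("'") removes every leading and trailing quote character, so a string that itself begins or ends with a quote would lose it; slicing with [1:-1] takes off exactly the two quotes added by ascii().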
jmf