Q: The `print' statement over Unicode
François Pinard
pinard at iro.umontreal.ca
Sat May 7 08:01:09 EDT 2005
[Thomas Heller]
> François Pinard <pinard at iro.umontreal.ca> writes:
> > [...] given file `question.py' with these contents:
> > # -*- coding: UTF-8 -*-
> > texte = unicode("Fran\xe7ois", 'latin1')
> > print type(texte), repr(texte), texte
> > print type(texte), repr(texte), str(texte)
> > doing `python question.py' yields:
> > <type 'unicode'> u'Fran\xe7ois' François
> > <type 'unicode'> u'Fran\xe7ois'
> > Traceback (most recent call last):
> >   File "question.py", line 4, in ?
> >     print type(texte), repr(texte), str(texte)
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 4: ordinal not in range(128)
> > [...] why does the first `print' handle its third argument, while
> > the second does not? How does `print' convert that Unicode string to
> > an 8-bit string for output, if not through `str()'? What is missing
> > from the documentation, or from my understanding of it?
> AFAIK, print uses sys.stdout.encoding to encode the unicode string.
Many thanks for this information.
I was not aware of this file attribute. Looking around, I found a
quick description in the Library Reference, under "2.3.8 File Objects".
However, I did not find in the documentation the rules stating how
or when this attribute receives a value, and in particular here, for
the case of `sys.stdout'. The Reference Manual, under "6.6 The print
statement", is silent about how Unicode strings are handled.
Am I looking in the wrong places, or should the standard
documentation explain such things more readily?
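
For reference, the difference between the two `print' lines can be
sketched as follows. This is only an illustration under assumptions,
not a statement of how the interpreter is implemented: it assumes that
`print' encodes with `sys.stdout.encoding' when that attribute is set,
and that `str()' goes through the 'ascii' codec named in the traceback.
The helper names `encode_for_stdout' and `encode_as_ascii', the
`fallback' parameter and the 'replace' error handler are hypothetical,
chosen only so the sketch never raises.

```python
# -*- coding: UTF-8 -*-
import sys

texte = u"Fran\xe7ois"


def encode_for_stdout(text, fallback='ascii'):
    """Encode `text` with the encoding reported by sys.stdout.

    Assumption: this approximates what `print' does with a Unicode
    string.  `fallback' is hypothetical and is used only when the
    stream reports no encoding (for example when output is piped).
    """
    encoding = getattr(sys.stdout, 'encoding', None) or fallback
    # 'replace' keeps the sketch from raising on unencodable characters.
    return text.encode(encoding, 'replace')


def encode_as_ascii(text):
    """Encode with the 'ascii' codec, the one named in the traceback.

    Returns None instead of raising, so both cases can be compared.
    """
    try:
        return text.encode('ascii')
    except UnicodeEncodeError:
        return None


# The encoding attached to standard output; may be None.
print(repr(sys.stdout.encoding))
# Succeeds whatever the stream encoding is.
print(repr(encode_for_stdout(texte)))
# None here: u'\xe7' is outside range(128).
print(repr(encode_as_ascii(texte)))
```

On a terminal whose encoding can represent `ç', the second line shows
the encoded bytes intact, while the third shows that the 'ascii' codec
cannot encode the same string.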
--
François Pinard http://pinard.progiciels-bpi.ca