Q: The `print' statement over Unicode
François Pinard
pinard at iro.umontreal.ca
Wed May 4 11:45:59 EDT 2005
Hi, people. I hope someone would like to enlighten me.
For any application handling Unicode internally, I'm usually careful
to convert those Unicode strings properly into 8-bit strings before
writing them out.
However, this morning, I mistakenly forgot to do so before using one
Unicode string (containing a non-ASCII character) as an argument to
the `print' statement, and I did _not_ get an error. This is rather
surprising to me. I reread the relevant section of the Python reference
manual (version 2.3.4; this machine currently runs 2.3.3), and it does
not say anything about special processing for Unicode strings.
In my understanding, when `print' is given an argument which is not
already a string (I read: 8-bit string), it first gets converted into
a string (I read: calling __str__). But if I call `str()' explicitly,
_then_ I get an error as expected. The question is, why is there no
error if I do not call `str()' explicitly?
For example, given file `question.py' with these contents:
    # -*- coding: UTF-8 -*-
    texte = unicode("Fran\xe7ois", 'latin1')
    print type(texte), repr(texte), texte
    print type(texte), repr(texte), str(texte)
doing `python question.py' yields:
    <type 'unicode'> u'Fran\xe7ois' François
    <type 'unicode'> u'Fran\xe7ois'
    Traceback (most recent call last):
      File "question.py", line 4, in ?
        print type(texte), repr(texte), str(texte)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' \
        in position 4: ordinal not in range(128)
(last line wrapped for legibility).
So (trying to be crystal clear), why does the first `print' succeed
on its third argument while the second one fails? How does `print'
convert that Unicode string to an 8-bit string for output, if not
through `str()'? What is missing from the documentation, or from my
understanding of it?
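To make the two conversion paths concrete, here is a minimal sketch of
one possible explanation. It assumes (the post itself does not confirm
this) that `print' encodes a Unicode argument with the output stream's
encoding, whereas `str()' falls back to the ASCII default codec. The
helper `try_encode' and the variable names are hypothetical, introduced
only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: contrasts explicit conversions of the same Unicode string.
# Assumption: `print' may use the stream's encoding instead of str().
import sys

texte = u"Fran\xe7ois"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot encode text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# The ASCII codec cannot represent u'\xe7', so this conversion fails.
ascii_result = try_encode(texte, "ascii")

# Latin-1 can represent it, so this conversion succeeds.
latin1_result = try_encode(texte, "latin-1")

# Hypothetical stand-in for what the stream might use; the attribute can
# be missing or None (for example when output is piped), hence the fallback.
stream_encoding = getattr(sys.stdout, "encoding", None) or "ascii"
stream_result = try_encode(texte, stream_encoding)
```

If the assumption holds, `stream_result' would be None only when the
stream's encoding cannot represent the character, which would match an
error from `str()' but none from a bare `print' on a Latin-1 or UTF-8
terminal.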
--
François Pinard http://pinard.progiciels-bpi.ca