print vs sys.stdout.write, and UnicodeError

Thu Oct 25 15:32:51 EDT 2007

Martin Marcher <martin at marcher.name> wrote:
> 25 Oct 2007 17:37:01 GMT, Brent Lievers <3wbl at qlink.queensu.ca>:
>> Greetings,
>>
>> I have observed the following (python 2.5.1):
>>
>> >>> import sys
>> >>> print sys.stdout.encoding
>> UTF-8
>> >>> print(u'\u00e9')
>> é
>> >>> sys.stdout.write(u'\u00e9\n')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
>> position 0: ordinal not in range(128)
> 
> >>> sys.stdout.write(u'\u00e9\n'.encode("UTF-8"))
> é
> 
>> Is this correct?  My understanding is that print ultimately calls
>> sys.stdout.write anyway, so I'm confused as to why the Unicode error
>> occurs in the second case.  Can someone explain?
> 
> You forgot to encode what you are going to "print" :)

Thanks.  I obviously have a lot to learn about both Python and Unicode ;-)

So does print do this encoding step based on the value of 
sys.stdout.encoding?  In other words, something like:

  sys.stdout.write(textstr.encode(sys.stdout.encoding))

I'm just trying to understand why encode() is needed in the one case but 
not the other.
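
Spelled out, my guess looks roughly like the sketch below. The helper
name write_encoded is made up for illustration (it is not a real Python
API), and a BytesIO buffer stands in for the terminal. This only shows
the explicit encode-then-write step; I have not confirmed that print
really works this way internally.

```python
import io

def write_encoded(stream, text, encoding):
    # Hypothetical helper (not a real Python API): encode the text
    # explicitly, then hand the resulting bytes to the stream.
    stream.write(text.encode(encoding))

buf = io.BytesIO()  # stand-in for sys.stdout
write_encoded(buf, u'\u00e9\n', 'UTF-8')
utf8_bytes = buf.getvalue()  # the UTF-8 bytes for the character plus newline

# Encoding the same text as ASCII fails, much like the traceback above.
try:
    write_encoded(io.BytesIO(), u'\u00e9\n', 'ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

If that guess is right, the 'ascii' in the traceback would mean that a
bare sys.stdout.write falls back to some default codec instead of
consulting sys.stdout.encoding.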

Cheers,

Brent