q: how to output a unicode string?
Diez B. Roggisch
deets at nospam.web.de
Tue Apr 24 12:43:16 EDT 2007
Frank Stajano wrote:
> A simple unicode question. How do I print?
>
> Sample code:
>
> # -*- coding: utf-8 -*-
> s1 = u"héllô wórld"
> print s1
> # Gives UnicodeEncodeError: 'ascii' codec can't encode character
> # u'\xe9' in position 1: ordinal not in range(128)
>
>
> What I actually want to do is slightly more elaborate: read from a text
> file which is in utf-8, do some manipulations of the text and print the
> result on stdout. I understand I must open the file with
>
> f = codecs.open("input.txt", "r", "utf-8")
>
> but then I get stuck as above.
>
> I tried
>
> s2 = s1.encode("utf-8")
> print s2
>
> but got
>
> hÃ©llÃ´ wÃ³rld
That output is perfectly all right: s2 holds valid UTF-8 bytes. It's just that
your terminal isn't set up to decode UTF-8; it uses some other encoding, such as latin1.
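A minimal sketch of what such a terminal does, assuming it decodes incoming bytes as Latin-1. Encoding s1 to UTF-8 and then decoding those bytes as Latin-1 reproduces the garbled text, while decoding them as UTF-8 restores the original. The names `garbled` and `roundtrip` are illustrative only, and the string is written with escapes so the example does not depend on how the source file is saved.

```python
# -*- coding: utf-8 -*-
# Assumption: the terminal interprets every incoming byte as Latin-1.
s1 = u"h\u00e9ll\u00f4 w\u00f3rld"   # u"héllô wórld"
s2 = s1.encode("utf-8")              # byte string; é becomes the two bytes 0xc3 0xa9

# What a Latin-1 terminal displays: each UTF-8 byte shown as its own character.
garbled = s2.decode("latin-1")       # u"hÃ©llÃ´ wÃ³rld"

# Decoding with the encoding that was actually used gives the text back.
roundtrip = s2.decode("utf-8")
```

The bytes themselves are fine; only the decoding step on the display side differs.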
> Then, in the hope of being able to write the string to a file if not to
> stdout, I also tried
>
>
> import codecs
> f = codecs.open("out.txt", "w", "utf-8")
> f.write(s2)
>
> but got
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)
Instead of writing s2 (which is a byte string!!!), write s1. That will work.
The error comes from f.write expecting a unicode object, but s2 is a byte
string (you explicitly encoded it before), so Python first tries to decode
the byte string into unicode using the default encoding, ascii. That fails
on the first non-ASCII byte, 0xc3.
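A minimal sketch of both points, using a temporary file instead of out.txt (the temporary path and the names `error_position`, `raw_bytes` and `text` are illustrative). It first performs the implicit ASCII decode explicitly to show where it fails, then writes s1 through `codecs.open` as suggested and reads it back with the same codec, as in the input-file step from the question. The syntax is kept version-neutral, so it should run under Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

s1 = u"h\u00e9ll\u00f4 w\u00f3rld"   # unicode object
s2 = s1.encode("utf-8")              # byte string

# The hidden step made explicit: decoding the UTF-8 bytes as ASCII
# fails at the first non-ASCII byte (0xc3, right after the "h").
error_position = None
try:
    s2.decode("ascii")
except UnicodeDecodeError as exc:
    error_position = exc.start       # index of the offending byte

# The fix: give the codecs writer the unicode object, not the bytes.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with codecs.open(path, "w", "utf-8") as f:
    f.write(s1)

# On disk the file holds exactly the UTF-8 bytes of s1.
with open(path, "rb") as raw:
    raw_bytes = raw.read()

# Reading back with the same codec returns the original unicode text.
with codecs.open(path, "r", "utf-8") as f:
    text = f.read()
```

The conversion to bytes happens once, inside the writer, so the program only ever manipulates unicode objects.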
Diez