error when printing a UTF-8 string (python 2.6.2)

Peter Otten __peter__ at web.de
Wed Apr 21 04:38:21 EDT 2010


fab at slick.airforce-one.org wrote:

> Hello.
> 
> I read a string from a UTF-8 file:
> 
> fichierLaTeX = codecs.open(sys.argv[1], "r", "utf-8")
> s = fichierLaTeX.read()
> fichierLaTeX.close()
> 
> I can then print the string without error with 'print s'.
> 
> Next I parse this string:
> 
> def parser(s):
>     i = 0
>     while i < len(s):
>         if s[i:i+1] == '\\':
>             i += 1
>             if s[i:i+1] == '\\':
>                 print "backslash"
>             elif s[i:i+1] == '%':
>                 print "pourcentage"
>             else:
>                 if estUnCaractere(s[i:i+1]):
>                     motcle = ""
>                     while estUnCaractere(s[i:i+1]):
>                         motcle += s[i:i+1]
>                         i += 1
>                     print "mot-clé '"+motcle+"'"
> 
> but when I run this code, I get this error:
> 
> Traceback (most recent call last):
>   File "./versOO.py", line 115, in <module>
>     parser(s)
>   File "./versOO.py", line 105, in parser
>     print "mot-clé '"+motcle+"'"
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
> 
> What must I do to solve this?

>>> "mot-clé" + "mot-clé"
'mot-cl\xc3\xa9mot-cl\xc3\xa9'

>>> u"mot-clé" + u"mot-clé"
u'mot-cl\xe9mot-cl\xe9'

>>> "mot-clé" + u"mot-clé"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

codecs.open().read() returns unicode, but your literals are all bytestrings.
When you mix unicode and str, Python implicitly converts the bytestring to
unicode using the ascii codec, and that conversion fails for any non-ASCII
character such as the "é" in "mot-clé".
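
To make the implicit step visible, here is a rough sketch of what the
concatenation amounts to. The names raw, erreur and texte are only
illustrative, and I spell the bytestring with a b-prefix so the snippet
behaves the same on Python 2.6+ and on Python 3.

```python
# The UTF-8 bytes that a Python 2 str literal "mot-clé" holds
# when the source file is saved as UTF-8.
raw = b"mot-cl\xc3\xa9"

# Mixing str and unicode in Python 2 behaves roughly like decoding
# the bytestring with the ascii codec, which cannot handle byte 0xc3.
try:
    raw.decode("ascii")
    erreur = None
except UnicodeDecodeError as exc:
    erreur = exc

# Decoding explicitly with the right codec works, and the result
# can be concatenated with other unicode strings.
texte = raw.decode("utf-8") + u"mot-cl\xe9"
```

Here erreur ends up holding the same kind of UnicodeDecodeError as in your
traceback, while texte is an ordinary unicode string.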

Change your string literals to unicode by adding the u-prefix and you should 
be OK.
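
Applied to the keyword part of your parser, that could look roughly like the
sketch below. It is not your original code: est_un_caractere() is a
hypothetical stand-in for your estUnCaractere(), which you did not show, and
the function (here called mots_cles, also my own name) returns the messages in
a list instead of printing them. The coding line at the top assumes the script
is saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# The coding line above tells Python 2 how to read the non-ASCII
# characters inside the u"..." literals (assumes a UTF-8 source file).


def est_un_caractere(c):
    # Hypothetical stand-in for estUnCaractere(): one letter counts.
    return c.isalpha()


def mots_cles(s):
    """Collect a message for each keyword that follows a backslash in s."""
    resultat = []
    i = 0
    while i < len(s):
        if s[i:i+1] == u'\\':
            i += 1
            if s[i:i+1] in (u'\\', u'%'):
                i += 1  # escaped backslash or percent sign, not a keyword
                continue
            motcle = u""
            while est_un_caractere(s[i:i+1]):
                motcle += s[i:i+1]
                i += 1
            if motcle:
                # Every literal is unicode, so no implicit ascii decoding.
                resultat.append(u"mot-clé '" + motcle + u"'")
        else:
            i += 1
    return resultat


messages = mots_cles(u"\\section{é} \\étiquette \\\\ \\%")
```

With all operands being unicode, the concatenation never has to guess an
encoding; any encoding happens only at the point where you print or write the
result.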

Peter


