bruno.42.desthuilliers at websiteburo.invalid
Fri Dec 19 13:22:33 CET 2008
digisatori at gmail.com wrote:
> The below snippet code generates UnicodeDecodeError.
> #!/usr/bin/env python
> #--*-- coding: utf-8 --*--
> s = 'äöü'
> u = unicode(s)
> It seems that the system uses the default encoding (ASCII) to decode the
> utf-8 encoded string literal, and that is what raises the error.
Indeed. You want:
u = unicode(s, 'utf-8')  # or: u = s.decode('utf-8')
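The same point can be sketched in Python 3. This assumes Python 3, where `bytes.decode(encoding)` plays the role of Python 2's `unicode(s, encoding)` and `s.decode(encoding)`. Decoding the UTF-8 bytes as ASCII fails, and naming the codec explicitly recovers the text:

```python
# Python 3 sketch: bytes.decode() stands in for Python 2's unicode(s, encoding).
raw = 'äöü'.encode('utf-8')  # the bytes a utf-8 source file stores for the literal

# Decoding with ASCII (Python 2's default codec) fails on the non-ASCII bytes.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the encoding explicitly recovers the original three characters.
text = raw.decode('utf-8')
print(ascii_failed, len(raw), len(text))  # True 6 3
```

Six bytes go in and three characters come out. Only the explicit 'utf-8' argument tells the decoder how to group those bytes.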
> The question is why the Python interpreter uses the default encoding
> instead of "utf-8", which I explicitly declared in the source.
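The coding declaration only tells the parser how to read the source file.
's' is still a plain byte string, and there's no reliable way for the
interpreter to guess how the bytes passed to unicode() were encoded.
Consider: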
s = s.decode("utf-8").encode("latin1")
# should unicode try to use utf-8 here? (these are now latin-1 bytes)
u = unicode(s)  # raises UnicodeDecodeError
# would have worked better with: u = unicode(s, 'latin1')
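A Python 3 sketch of why guessing is unreliable follows. It again assumes `bytes.decode()` stands in for `unicode()`. The same three characters become different bytes under UTF-8 and latin-1. A wrong guess either raises an error or silently produces the wrong text:

```python
# Python 3 sketch: the same three characters under two different encodings.
utf8_bytes = 'äöü'.encode('utf-8')      # b'\xc3\xa4\xc3\xb6\xc3\xbc'
latin1_bytes = 'äöü'.encode('latin-1')  # b'\xe4\xf6\xfc'

# Latin-1 bytes are not valid UTF-8, so guessing utf-8 raises an error...
try:
    latin1_bytes.decode('utf-8')
    utf8_guess_failed = False
except UnicodeDecodeError:
    utf8_guess_failed = True

# ...while UTF-8 bytes decode without error as latin-1, but into the wrong text.
mojibake = utf8_bytes.decode('latin-1')

# Only the matching codec gives back the original characters.
round_trip_ok = latin1_bytes.decode('latin-1') == 'äöü'
print(utf8_guess_failed, mojibake == 'äöü', round_trip_ok)  # True False True
```

The silent case is the worse one. Latin-1 accepts every byte value, so a wrong latin-1 guess never raises an error. It just yields garbage.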
NB: IIRC, the ASCII subset is encoded identically in any ASCII-compatible
encoding (latin-1, utf-8, cp1252...), so I'd say ASCII is a sensible default...
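A quick Python 3 check of that claim follows. The list of codecs is an illustrative choice, not an exhaustive one. ASCII-only text yields the same bytes in each ASCII-compatible codec, though not in an encoding such as UTF-16:

```python
# Python 3 sketch: ASCII-only text yields identical bytes in ASCII-compatible codecs.
sample = 'hello, world'
codecs_to_check = ['ascii', 'latin-1', 'cp1252', 'utf-8']  # illustrative choice

encoded = {name: sample.encode(name) for name in codecs_to_check}
all_same = len(set(encoded.values())) == 1
print(all_same)  # True

# Not every encoding is ASCII-compatible: UTF-16 uses two bytes per character here.
print(len(sample.encode('utf-16-le')))  # 24
```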