UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

Thu Oct 19 06:12:44 EDT 2006

Duncan Booth wrote:
> NoelByron at gmx.net wrote:
>
> > 'K\xc3\xb6ni'.decode('utf-8')     # 'K\xc3\xb6ni' should decode to 'Köni' (as in 'König'),
> > which contains a German umlaut
> >
> > but it failed. Does Python assume that every string being decoded is ASCII?
>
> No, in this case Python treats the string as UTF-8 encoded, since that is the codec you named:
>
> >>> 'K\xc3\xb6ni'.decode('utf-8').encode('latin1')
> 'K\xf6ni'
>
> Your code must have failed somewhere else. Try posting the actual failing code
> and the actual traceback.
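For readers on Python 3, where byte strings take a b'' prefix and decoded text is a str, the session above translates roughly as follows. This is a sketch, not code from the thread: decoding names the UTF-8 codec explicitly, and re-encoding to Latin-1 turns the two-byte sequence C3 B6 into the single byte F6.

```python
# Python 3 translation of the Python 2 session above (not from the thread).
raw = b'K\xc3\xb6ni'               # UTF-8 bytes: 'ö' is the two-byte sequence C3 B6
text = raw.decode('utf-8')         # decodes with the codec we name, not ASCII
latin1 = text.encode('latin-1')    # Latin-1 stores 'ö' as the single byte F6

print(len(raw), len(text))         # 5 4
print(latin1)                      # b'K\xf6ni'
```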

You are right. My test code was:

print 'K\xc3\xb6ni'.decode('utf-8')

and this line raised a UnicodeDecode exception. I didn't realize that
the exception was actually raised by print, not by decode. That also
explains why passing 'ignore' to decode had no effect at all.
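The same diagnosis can be sketched in Python 3. The assumption here, which the thread does not state, is that the output stream only accepted ASCII; in that case printing means strictly encoding the text to ASCII, and a strict ASCII encode of 'ö' raises UnicodeEncodeError. The helper fails_to_encode is hypothetical, written only for this illustration. The 'ignore' handler on decode changes nothing because the bytes are already valid UTF-8; an error handler only helps on the step that actually fails.

```python
raw = b'K\xc3\xb6ni'

# 'ignore' only affects *decoding* errors; these bytes are valid UTF-8,
# so the result is identical with or without it.
strict = raw.decode('utf-8')
lenient = raw.decode('utf-8', 'ignore')

def fails_to_encode(text, encoding):
    """Hypothetical helper: True if `text` cannot be strictly encoded in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return True
    return False

# Encoding to ASCII (assumed here to be what print did with an ASCII-only
# stdout) is the step that raises; Latin-1 can represent 'ö'.
print(strict == lenient)                    # True
print(fails_to_encode(strict, 'ascii'))     # True
print(fails_to_encode(strict, 'latin-1'))   # False

# One way around it: choose an error handler for the *encode* step.
print(strict.encode('ascii', 'replace'))    # b'K?ni'
```

Alternatively, encode to an encoding the output actually supports (for example UTF-8 or Latin-1) before writing.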

Thank you for helping!

Best regards,
Noel