How to print first(national) char from unicode string encoded in utf-8?

Marco Bizzarri marco.bizzarri at gmail.com
Mon Sep 1 15:10:41 CEST 2008


2008/9/1  <sniipe at gmail.com>:
> Hi,
>
> I have a problem with unicode string in Pylons templates(Mako). I will
> print first char from my string encoded in UTF-8 and urllib.quote(),
> for example string 'Łukasz':
>
> ${urllib.unquote(c.user.firstName).encode('latin-1')[0:1]}
>
> and I received this information:
>
> <type 'exceptions.UnicodeDecodeError'>: 'utf8' codec can't decode byte
> 0xc5 in position 0: unexpected end of data
>
> When I change from [0:1] to [0:2] everything is ok. I think it is
> because of unicode and encoding utf-8(2 bytes).
>
> How to resolve this problem?
>
> Best regards
> --
> http://mail.python.org/mailman/listinfo/python-list
>

First: you say the string is encoded in utf8, but your code encodes it
as latin1. I do not know Mako templates, but as far as I can tell
your snippet should work without problems if the encoding really were
latin1.
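
Since the error message names the utf8 codec, the unquoted value is
presumably a UTF-8 byte string. Below is a minimal sketch of what happens
when such bytes are sliced, and of decoding before slicing. It is written
for Python 3 (the code in the question is Python 2, where urllib.unquote
lives in a different module), and the variable names are only
illustrative, not taken from the original application.

```python
from urllib.parse import quote, unquote_to_bytes

name = 'Łukasz'
quoted = quote(name)              # percent-encoded UTF-8: '%C5%81ukasz'
raw = unquote_to_bytes(quoted)    # the underlying UTF-8 bytes

# Slicing the bytes keeps only the first byte of the two-byte
# sequence for 'Ł', which is not valid UTF-8 on its own.
error = None
try:
    raw[0:1].decode('utf-8')
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding first and slicing afterwards works on characters, not bytes.
first_char = raw.decode('utf-8')[0:1]

print(error)
print(first_char)
```

The point of the sketch is only the order of operations: decode the bytes
to text first, then take the first character.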

Second: do not assume utf8 is a two-byte encoding; utf8 is a
variable-length encoding. For example:

'a' encoded as utf8 is 'a' (one byte)

'à' encoded as utf8 is '\xc3\xa0' (two bytes).
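
The byte counts above can be checked directly. This is a small Python 3
sketch; the sample characters other than 'a' and 'à' are my own additions
for illustration.

```python
# Number of bytes each character occupies once encoded as UTF-8.
samples = ['a', 'à', 'Ł', '€']

byte_lengths = {}
for ch in samples:
    encoded = ch.encode('utf-8')
    byte_lengths[ch] = len(encoded)
    print(repr(ch), len(encoded), encoded)
```

So taking a fixed number of bytes from the front of a utf8 byte string
does not reliably yield one whole character.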


Can you explain what you're trying to accomplish (rather than how
you're trying to accomplish it)?



Regards
Marco



-- 
Marco Bizzarri
http://notenotturne.blogspot.com/
http://iliveinpisa.blogspot.com/

