How to print first(national) char from unicode string encoded in utf-8?

Mark Tolonen M8R-yfto6h at
Tue Sep 2 06:05:28 CEST 2008

"Marco Bizzarri" <marco.bizzarri at> wrote in message 
news:mailman.331.1220276398.3487.python-list at
> On Mon, Sep 1, 2008 at 3:25 PM,  <sniipe at> wrote:
>> When I do ${urllib.unquote(c.user.firstName)} without encoding to
>> latin-1 I got different chars than I will get: no Łukasz but Å ukasz
>> --
> That's crazy. "string".encode('latin1') gives you a latin1 encoded
> string; latin1 is a single byte encoding, therefore taking the first
> byte should be no problem.
> Have you tried:
> urlib.unquote(c.user.firstName)[0].encode('latin1') or
> urlib.unquote(c.user.firstName)[0].encode('utf8')
> I'm assuming here that the urlib.unquote(c.user.firstName) returns an
> encodable string (which I'm absolutely not sure), but if it does, this
> should take the first 'character'.

The OP stated that the original string was "encoded in UTF-8 and 
urllib.quote()", so after urllib.unquote the string is in UTF-8 format. 
This must be decoded into a Unicode string before removing the first 
character.
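
To make the decode-before-indexing point concrete, here is a minimal sketch. It assumes Python 3, where the quoting helpers live in urllib.parse and unquote_to_bytes plays the role of Python 2's urllib.unquote on a byte string; the variable names are only illustrative and are not from the OP's code.

```python
from urllib.parse import quote, unquote_to_bytes

# Percent-encoded UTF-8, as the OP described the stored value.
quoted = quote('Łukasz'.encode('utf-8'))

# Undo the percent-encoding; what comes back is still UTF-8 bytes.
raw = unquote_to_bytes(quoted)

# Slicing the bytes takes only the first byte of the two-byte 'Ł'.
first_byte = raw[:1]

# Decoding first and indexing afterwards takes the whole character.
first_char = raw.decode('utf-8')[0]

print(repr(first_byte), first_char)
```

Indexing the byte string cuts the character in half, while indexing the decoded string returns 'Ł' intact.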


The next problem is that the character in the OP's example string 'Ł' is not 
present in the latin-1 encoding, but using utf-8 encoding demonstrates that 
the full two-byte UTF-8 encoded character is collected:

    >>> import urllib
    >>> name = urllib.quote(u'Łukasz'.encode('utf-8'))
    >>> name
    '%C5%81ukasz'
    >>> urllib.unquote(name).decode('utf-8')[0].encode('utf-8')
    '\xc5\x81'
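
The latin-1 point can be checked the same way. This is a rough sketch, again assuming Python 3 string semantics rather than the Python 2 session above: it tries to encode 'Ł' as latin-1, and it also shows that reading the UTF-8 bytes as if they were latin-1 would produce the kind of "Å ukasz" output the OP reported.

```python
first = '\u0141'  # 'Ł', LATIN CAPITAL LETTER L WITH STROKE

# UTF-8 needs two bytes for this character.
utf8_bytes = first.encode('utf-8')

# latin-1 has no code point for it, so encoding raises an error.
try:
    first.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Misreading the UTF-8 bytes as latin-1 gives 'Å' followed by an
# invisible control character (U+0081) in place of 'Ł'.
mojibake = 'Łukasz'.encode('utf-8').decode('latin-1')

print(repr(utf8_bytes), latin1_ok, repr(mojibake))
```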

