Unicode and UrlEncode!

Jeff Epler jepler at unpythonic.net
Wed Mar 17 18:39:31 EST 2004


You should convert the byte string 'french' from whatever encoding
it's in (latin-1, according to your coding: directive) to a Unicode
encoding such as UTF-8 if you are going to tell Google that it's in
Unicode.

Example:
    "s\xe9dimentation".decode("latin-1").encode("utf-8")

Or, you can tell Python that the string is a Unicode literal, and it
will do the .decode() step for you, using the encoding named in the
file's coding: directive:
    u"sédimentation".encode("utf-8")
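
Since the subject is URL-encoding, here is a rough sketch of the whole
round trip: latin-1 bytes, to Unicode, to UTF-8, to a percent-encoded
query value. It assumes a Python where byte strings can be written
b"..." (2.6 or later, including 3.x) and uses the standard library's
quote() function for the percent-encoding. The variable names are made
up for illustration, and the original script may well have used a
different call.

```python
try:
    from urllib.parse import quote   # Python 3 location
except ImportError:
    from urllib import quote         # Python 2 location

# The word as latin-1 bytes: 0xE9 is e-acute in latin-1.
latin1_bytes = b"s\xe9dimentation"

# Step 1: bytes -> Unicode text, using the encoding the bytes are really in.
text = latin1_bytes.decode("latin-1")

# Step 2: Unicode text -> UTF-8 bytes, the encoding promised to the server.
utf8_bytes = text.encode("utf-8")

# Step 3: percent-encode the UTF-8 bytes for use in a query string.
query_value = quote(utf8_bytes)
print(query_value)
```

The one latin-1 byte becomes two UTF-8 bytes, so it shows up as two
percent-escapes in the URL.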

"" is always a byte string literal, and u"" is always a Unicode string
literal.  If you have "<sequence of bytes>" then the string's value at
runtime is "<sequence of bytes>", and if you have u"<sequence of bytes>"
then the string's value is "<sequence of bytes>".decode(<file encoding>).

Jeff



