urllib.unquote + unicode
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Wed Nov 14 01:20:43 EST 2007
On Tue, 13 Nov 2007 13:14:18 -0300, koara <koara at atlas.cz> wrote:
> i am using urllib.unquote_plus to unquote a string. Sometimes i get a
> strange string like for example "spolu%u017E%E1ci.cz" to unquote. Here
> the problem is that some application decided to quote a non-ascii
> character as %uxxxx directly, instead of using an encoding and quoting
> byte per byte.
>
> Python (2.4.1) simply returns "spolu%u017E\xe1ci.cz", which is likely
> not what the application meant.
>
> My question is, is this %u quoting a standard (i.e., urllib is in the
> wrong),
Not that I know of (and that doesn't prove anything).
> is it not (i.e., the application is in the wrong and urllib
> silently ignores the '%u0' - why?), and most importantly, is there a
> simple workaround to get it working as expected?
Try this (untested):
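from urllib import unquote_plus

def unquote_plus_u(source):
    # Decode the standard %XX escapes first; %uXXXX is left untouched.
    result = unquote_plus(source)
    if '%u' in result:
        # Turn %uXXXX into \uXXXX and let the codec build a unicode string.
        result = result.replace('%u', '\\u').decode('unicode_escape')
    return result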
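
For anyone on Python 3, where str has no .decode(), here is a rough sketch of the same idea. It is not a library API: the regular expression and the encoding parameter are my own assumptions. In particular it assumes the plain %XX escapes are Latin-1 bytes, which is what JavaScript's legacy escape() function emits alongside %uXXXX; whether your application does the same is something to check.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical helper pattern: exactly four hex digits after "%u".
_PERCENT_U = re.compile(r'%u([0-9a-fA-F]{4})')

def unquote_plus_u(source, encoding='latin-1'):
    # Replace each %uXXXX with the character it names. This runs first so
    # that a quoted percent sign such as "%25u0041" is not decoded twice.
    expanded = _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), source)
    # Then decode "+" and the ordinary %XX escapes. The encoding is an
    # assumption (Latin-1 by default), not something urllib can detect.
    return unquote_plus(expanded, encoding=encoding)

print(unquote_plus_u('spolu%u017E%E1ci.cz'))
```

Handling %uXXXX before the normal unquoting also avoids touching backslashes that happen to be in the input, which the unicode_escape trick above would interpret.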
--
Gabriel Genellina