URL Character Decoding

Sun Jan 29 22:38:34 EST 2006

Kirk McDonald wrote:
> If you have a link such as:
> 
> <a href="index.py?title=Main Menu">Main menu!</a>
> 
> The space will be translated to the character code '%20' when you later 
> retrieve the GET data. Not knowing if there was a library function that 
> would convert these back to their actual characters, I've written the 
> following:
> 
> import re
> 
> def sub_func(m):
>     return chr(int(m.group()[1:], 16))
> 
> def parse_title(title):
>     p = re.compile(r'%[0-9][0-9]')
>     return re.sub(p, sub_func, title)
> 
> (I know I could probably use a lambda function instead of sub_func, but 
> I come to Python via C++ and am still not entirely used to them. This is 
> clearer to me, at least.)
> 
> I guess what I'm asking is: Is there a library function (in Python or 
> mod_python) that knows how to do this? Or, failing that, is there a 
> different regex I could use to get rid of the substitution function?
> 
> -Kirk McDonald

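Actually, I just noticed this doesn't really work. The URL character 
codes are in hex, but the regex only accepts the digits 0-9, so it 
misses any code containing A-F (such as '%2F') and leaves it undecoded. 
sub_func already reads the digits as base 16, so the fault is in the 
pattern. See why I wanted a library function?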
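
For the record, here is a rough sketch of what I think a working version 
looks like, next to the library call that appears to do the same job. It 
assumes a current Python, where the function lives at 
urllib.parse.unquote (in Python 2 it was urllib.unquote). The name 
HEX_ESCAPE is just my own label for the compiled pattern, not anything 
from a library.

```python
import re
from urllib.parse import unquote

# Hypothetical name: a '%' followed by exactly two hexadecimal digits.
HEX_ESCAPE = re.compile(r'%[0-9A-Fa-f]{2}')

def sub_func(m):
    # Drop the leading '%' and read the two remaining digits as base 16.
    return chr(int(m.group()[1:], 16))

def parse_title(title):
    # Replace every %XX escape with the single character it encodes.
    return HEX_ESCAPE.sub(sub_func, title)

print(parse_title('Main%20Menu'))  # prints: Main Menu
print(parse_title('a%2Fb'))        # prints: a/b
print(unquote('Main%20Menu'))      # prints: Main Menu
```

The hand-rolled version treats each escape as one character, so it will 
mangle multi-byte UTF-8 sequences such as '%C3%A9'. unquote decodes 
those as UTF-8 by default, which is one more reason to prefer it. If the 
data uses '+' for spaces, urllib.parse.unquote_plus is probably the 
better fit.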

-Kirk McDonald