HTML Encoded Translation

Fredrik Lundh fredrik at
Tue Oct 17 20:26:17 CEST 2006

Dave wrote:

> How can I translate this:
> &#103;&#105;
> to this:
> "gi"

the easiest way is to run it through an HTML or XML parser (depending on 
what the source is).  or you could use something like this:

     import re

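     def fix_charrefs(text):
         def fixup(m):
             text = m.group(0) # the matched reference, e.g. "&#103;"
             try:
                 if text[:3] == "&#x":
                     # hexadecimal character reference
                     return unichr(int(text[3:-1], 16))
                 else:
                     # decimal character reference
                     return unichr(int(text[2:-1]))
             except ValueError:
                 pass
             return text # leave as is
         return re.sub("&#?\w+;", fixup, text)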

     >>> fix_charrefs("&#103;&#105;")
     u'gi'
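on Python 3, where unichr is gone, roughly the same thing might look like the sketch below.  this is an assumption-laden port, not part of the original post: it uses chr instead of unichr, catches OverflowError in addition to ValueError, and shows the standard library's html.unescape (available in Python 3.4 and later) next to the hand-written function for comparison.

```python
import html
import re

def fix_charrefs(text):
    """Replace numeric character references with the characters they name."""
    def fixup(m):
        ref = m.group(0)
        try:
            if ref[:3] == "&#x":
                # hexadecimal reference, e.g. "&#x67;"
                return chr(int(ref[3:-1], 16))
            # decimal reference, e.g. "&#103;"
            return chr(int(ref[2:-1]))
        except (ValueError, OverflowError):
            # not a valid numeric reference (e.g. "&amp;"): leave as is
            return ref
    return re.sub(r"&#?\w+;", fixup, text)

decoded = fix_charrefs("&#103;&#105;")
also_decoded = html.unescape("&#x67;&#x69;")
print(decoded, also_decoded)
```

note that html.unescape also handles named entities such as "&amp;amp;", which the hand-written function deliberately leaves untouched.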

also see:

> I've tried urllib.unencode and it doesn't work.

those are HTML/XML character references, not encoded URL characters.
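to make the difference concrete, here is a small sketch (Python 3, not from the original post) that feeds both kinds of input to both kinds of decoder.  the sample strings are made up for illustration; urllib.parse.unquote and html.unescape are the standard library functions for percent-encoded URLs and character references respectively.

```python
import html
from urllib.parse import unquote

url_encoded = "%67%69"        # percent-encoded URL characters
char_refs = "&#103;&#105;"    # HTML/XML character references

# each decoder only understands its own notation
from_url = unquote(url_encoded)
from_refs = html.unescape(char_refs)

# the wrong decoder leaves the input unchanged
refs_via_unquote = unquote(char_refs)
url_via_unescape = html.unescape(url_encoded)

print(from_url, from_refs, refs_via_unquote, url_via_unescape)
```

so a URL decoder silently passing the text through, as described above, is the expected behaviour rather than a bug.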

