HTML Encoded Translation
Fredrik Lundh
fredrik at pythonware.com
Tue Oct 17 14:26:17 EDT 2006
Dave wrote:
> How can I translate this:
>
> &#103;&#105;
>
> to this:
>
> "gi"
the easiest way is to run it through an HTML or XML parser (depending on
what the source is). or you could use something like this:

    import re

    def fix_charrefs(text):
        def fixup(m):
            text = m.group(0)
            try:
                if text[:3] == "&#x":
                    return unichr(int(text[3:-1], 16))
                else:
                    return unichr(int(text[2:-1]))
            except ValueError:
                pass
            return text # leave as is
        return re.sub("&#?\w+;", fixup, text)

    >>> fix_charrefs("&#103;&#105;")
    'gi'
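
For the parser route mentioned at the top of this reply, here is a minimal sketch. It assumes Python 3.4 or later, where the standard library provides html.unescape; the two helper names are made up for illustration, and the XML variant assumes the input is a well-formed XML fragment.

```python
import html
import xml.etree.ElementTree as ET


def decode_html_charrefs(text):
    """Decode numeric and named HTML character references (hypothetical helper)."""
    return html.unescape(text)


def decode_xml_charrefs(text):
    """Decode references by parsing the text as the body of a dummy element.

    Assumption: text is well-formed XML once wrapped in <x>...</x>.
    """
    root = ET.fromstring("<x>" + text + "</x>")
    # itertext() yields the decoded text of the element and its children
    return "".join(root.itertext())


print(decode_html_charrefs("&#103;&#105;"))  # decimal references
print(decode_xml_charrefs("&#x67;&#x69;"))   # hexadecimal references
```

Both calls should print gi. The HTML variant also handles named entities such as &amp;amp;, while the XML variant only knows the five predefined XML entities and raises a parse error on anything else.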
also see:
http://effbot.org/zone/re-sub.htm#strip-html
> I've tried urllib.unencode and it doesn't work.
those are HTML/XML character references, not encoded URL characters.
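
To make the difference concrete, here is a small sketch comparing the two notations. It assumes Python 3, where URL unquoting lives in urllib.parse and html.unescape handles character references; the variable names are only for illustration.

```python
import html
from urllib.parse import unquote

url_encoded = "%67%69"        # percent-encoding, as used in URLs
char_refs = "&#103;&#105;"    # HTML/XML character references

# each decoder only understands its own notation
from_url = unquote(url_encoded)
from_refs = html.unescape(char_refs)

# feeding a decoder the other notation leaves the text untouched
refs_through_unquote = unquote(char_refs)
url_through_unescape = html.unescape(url_encoded)

print(from_url, from_refs)
print(refs_through_unquote, url_through_unescape)
```

The first print shows gi twice; the second shows both inputs unchanged, which is why a URL decoder appears to "not work" on this kind of text.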
</F>