unescape HTML entities
Fredrik Lundh
fredrik at pythonware.com
Sat Oct 28 22:04:00 EDT 2006
Rares Vernica wrote:
> How can I unescape HTML entities like "&nbsp;"?
run it through an HTML parser.
or use something like this:
http://effbot.org/zone/re-sub.htm#strip-html
(if you want to keep elements, change the regular expression in the
re.sub call to "(?s)&#?\w+;")
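
a minimal sketch of that variant, assuming a current Python 3, where the old htmlentitydefs module lives on as html.entities. the names unescape and fixup are illustrative only, and anything that cannot be resolved is left untouched:

```python
import re
from html.entities import name2codepoint  # Python 2: htmlentitydefs.name2codepoint

def unescape(text):
    """Replace character references and named entities; leave elements alone."""
    def fixup(match):
        ref = match.group(0)
        if ref.startswith("&#"):
            # numeric character reference, decimal or hexadecimal
            try:
                if ref[2:3] in ("x", "X"):
                    return chr(int(ref[3:-1], 16))
                return chr(int(ref[2:-1]))
            except (ValueError, OverflowError):
                return ref  # malformed reference: keep the original text
        try:
            # named entity, e.g. &nbsp; or &amp;
            return chr(name2codepoint[ref[1:-1]])
        except KeyError:
            return ref  # unknown name: keep the original text
    return re.sub(r"(?s)&#?\w+;", fixup, text)
```

since the pattern only matches "&...;" sequences, tags such as <p> pass through unchanged.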
> I know about xml.sax.saxutils.unescape() but it only deals with "&amp;",
> "&lt;", and "&gt;".
>
> Also, I know about htmlentitydefs.entitydefs, but not only this
> dictionary is the opposite of what I need, it does not have "&nbsp;".
>>> import htmlentitydefs
>>> htmlentitydefs.entitydefs.get("nbsp")
'\xa0'
</F>