unescape HTML entities

Fredrik Lundh fredrik at pythonware.com
Sat Oct 28 22:04:00 EDT 2006


Rares Vernica wrote:

> How can I unescape HTML entities like "&nbsp;"?

run it through an HTML parser.

or use something like this:

     http://effbot.org/zone/re-sub.htm#strip-html

(if you want to keep elements, change the regular expression in the 
re.sub call to "(?s)&#?\w+;")
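
As a rough sketch of that approach (not the code from the linked page), the
following uses the regular expression above with a replacement callback. It
assumes Python 3, where htmlentitydefs has been renamed html.entities; the
function name unescape_entities is made up for this example.

```python
import re
from html.entities import name2codepoint

def unescape_entities(text):
    """Replace character references and named entities; leave tags alone."""
    def fixup(match):
        ref = match.group(0)
        try:
            if ref[:3] in ("&#x", "&#X"):
                # hexadecimal character reference, e.g. &#xa0;
                return chr(int(ref[3:-1], 16))
            if ref.startswith("&#"):
                # decimal character reference, e.g. &#160;
                return chr(int(ref[2:-1]))
            # named entity, e.g. &nbsp;
            return chr(name2codepoint[ref[1:-1]])
        except (KeyError, ValueError, OverflowError):
            # unknown name or malformed number: leave the text as it was
            return ref
    return re.sub(r"(?s)&#?\w+;", fixup, text)
```

Anything the callback cannot resolve is returned unchanged, so a stray
"&bogus;" survives instead of raising an exception.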

> I know about xml.sax.saxutils.unescape() but it only deals with "&amp;", 
> "&lt;", and "&gt;".
> 
> Also, I know about htmlentitydefs.entitydefs, but not only this 
> dictionary is the opposite of what I need, it does not have "&nbsp;".

 >>> import htmlentitydefs
 >>> htmlentitydefs.entitydefs.get("nbsp")
'\xa0'
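
For readers on Python 3, a small sketch of the same lookup, assuming the
renamed module html.entities. The dictionary keys are entity names without
the "&" and ";", which is the direction needed for unescaping. html.unescape
(available from Python 3.4) is shown as one stdlib alternative; the variable
names are only illustrative.

```python
import html
from html.entities import entitydefs, name2codepoint

# entitydefs maps an entity name (no "&" or ";") to its replacement text
nbsp = entitydefs.get("nbsp")

# name2codepoint holds the same information as an integer code point
nbsp_codepoint = name2codepoint["nbsp"]

# html.unescape handles named and numeric references in one call
decoded = html.unescape("a&nbsp;b &amp; c")
```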

</F>



