unescape HTML entities

Rares Vernica rvernica at gmail.com
Wed Nov 1 05:46:02 CET 2006


Hi,

How does your code deal with entities like &#39;?

Thanks,
Ray

Klaus Alexander Seistrup wrote:
> Rares Vernica wrote:
> 
>> How can I unescape HTML entities like "&nbsp;"?
>>
>> I know about xml.sax.saxutils.unescape() but it only deals with
>> "&amp;", "&lt;", and "&gt;".
>>
>> Also, I know about htmlentitydefs.entitydefs, but not only is this
>> dictionary the opposite of what I need, it also does not have
>> "&nbsp;".
> 
> How about something like:
> 
> #v+
> #!/usr/bin/env python
> '''dehtml.py'''
> 
> import re
> import htmlentitydefs
> 
> myrx = re.compile('&(' + '|'.join(htmlentitydefs.name2codepoint.keys()) + ');')
> 
> def dehtml(s):
>     return re.sub(
>         myrx,
>         lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]),
>         s
>     )
> # end def dehtml
> 
> if __name__ == '__main__':
>     import sys
>     print dehtml(sys.stdin.read()).encode('utf-8')
> # end if
> 
> #v-
> 
> E.g.:
> 
> #v+
> 
> $ echo 'fr&aelig;kke fr&oslash;l&aring;r' | ./dehtml.py
> frække frølår
> $ 
> 
> #v-
> 
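
For numeric character references such as &#39; or &#x27;, a rough sketch
along the same lines as the quoted script might look like the following.
It is only an illustration, not the quoted author's code: it assumes
Python 3 (html.entities and chr() in place of htmlentitydefs and
unichr()), and the names _entity_rx and _replace are made up here.

```python
import re
from html.entities import name2codepoint

# Matches named (&nbsp;), decimal (&#39;) and hexadecimal (&#x27;) references.
_entity_rx = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')

def _replace(m):
    """Return the character for one matched reference, or the text unchanged."""
    ref = m.group(1)
    try:
        if ref[:2] in ('#x', '#X'):
            return chr(int(ref[2:], 16))   # hexadecimal reference
        if ref.startswith('#'):
            return chr(int(ref[1:]))       # decimal reference
        return chr(name2codepoint[ref])    # named entity
    except (KeyError, ValueError, OverflowError):
        # Unknown name or out-of-range code point: leave the text as it was.
        return m.group(0)

def dehtml(s):
    """Replace HTML character references in s in a single pass."""
    return _entity_rx.sub(_replace, s)

if __name__ == '__main__':
    import sys
    sys.stdout.write(dehtml(sys.stdin.read()))
```

Unknown names are passed through untouched, and the substitution is
single-pass, so "&amp;lt;" becomes "&lt;" rather than "<". On Python 3.4
and later the standard library's html.unescape() covers the same ground
and is probably the simpler choice.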



