unescape HTML entities
Rares Vernica
rvernica at gmail.com
Tue Oct 31 23:46:02 EST 2006
Hi,
How does your code deal with entities like &#39;?
Thanks,
Ray
Klaus Alexander Seistrup wrote:
> Rares Vernica wrote:
>
>> How can I unescape HTML entities like "&nbsp;"?
>>
>> I know about xml.sax.saxutils.unescape() but it only deals with
>> "&amp;", "&lt;", and "&gt;".
>>
>> Also, I know about htmlentitydefs.entitydefs, but not only is this
>> dictionary the opposite of what I need, it does not have
>> "&nbsp;".
>
> How about something like:
>
> #v+
> #!/usr/bin/env python
> '''dehtml.py'''
>
> import re
> import htmlentitydefs
>
> # Match any named entity known to htmlentitydefs, e.g. "&nbsp;".
> myrx = re.compile('&(' + '|'.join(htmlentitydefs.name2codepoint.keys()) + ');')
>
> def dehtml(s):
>     # Replace each named entity with its Unicode character.
>     return re.sub(
>         myrx,
>         lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]),
>         s
>     )
> # end def dehtml
>
> if __name__ == '__main__':
>     import sys
>     print dehtml(sys.stdin.read()).encode('utf-8')
> # end if
>
> #v-
>
> E.g.:
>
> #v+
>
> $ echo 'fr&aelig;kke fr&oslash;l&aring;r' | ./dehtml.py
> frække frølår
> $
>
> #v-
>
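
The regex quoted above only matches names found in name2codepoint, so
numeric character references such as &#39; or &#x27; pass through
unchanged. Below is a minimal sketch of one way the same idea could be
extended to cover them. It assumes Python 3, where html.entities and
chr() stand in for htmlentitydefs and unichr(); the names ENTITY_RX and
_replace are illustrative and do not come from the thread.

```python
import re
from html.entities import name2codepoint

# Matches decimal (&#39;), hexadecimal (&#x27;) and named (&nbsp;) references.
ENTITY_RX = re.compile(r'&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')

def _replace(match):
    """Return the character for one reference, or the original text if unknown."""
    body = match.group(1)
    if body.startswith('#'):
        digits = body[1:]
        try:
            if digits[0] in 'xX':
                return chr(int(digits[1:], 16))
            return chr(int(digits))
        except (ValueError, OverflowError):
            # Code point outside the Unicode range: leave the text alone.
            return match.group(0)
    codepoint = name2codepoint.get(body)
    # Unknown names are left untouched rather than raising KeyError.
    return chr(codepoint) if codepoint is not None else match.group(0)

def dehtml(s):
    """Replace named and numeric character references in s."""
    return ENTITY_RX.sub(_replace, s)

if __name__ == '__main__':
    import sys
    sys.stdout.write(dehtml(sys.stdin.read()))
```

On Python 3.4 and later, html.unescape() in the standard library also
handles both named and numeric references, so a hand-written regex like
this is mainly useful when custom handling of unknown entities is
needed.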