[Patches] [ python-Patches-912410 ] HTMLParser should support
entities in attributes
SourceForge.net
noreply at sourceforge.net
Mon Mar 8 20:21:04 EST 2004
Patches item #912410, was opened at 2004-03-08 19:20
Message generated for change (Comment added) made by aaronsw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=912410&group_id=5470
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Aaron Swartz (aaronsw)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser should support entities in attributes
Initial Comment:
HTMLParser doesn't currently support entities in attributes,
like this:
<span title="&#8221; is a nice character">foo</span>
This patch fixes that. Simply replace the unescape method in
HTMLParser.py with:
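    import htmlentitydefs
    # re is already imported at the top of HTMLParser.py

    def unescape(self, s):
        def replaceEntities(s):
            # s is the match object; group 1 is the text between & and ;
            s = s.groups()[0]
            if s[0] == "#":
                # numeric reference: &#8221; or &#x201D;
                s = s[1:]
                if s[0] in ['x','X']:
                    c = int(s[1:], 16)
                else:
                    c = int(s)
                return unichr(c)
            else:
                # named reference: look the name up, e.g. &rdquo;
                return unichr(htmlentitydefs.name2codepoint[s])
        return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));",
                      replaceEntities, s)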
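For anyone reading this on Python 3, where unichr and the htmlentitydefs
module no longer exist under those names, the same approach might look
roughly like the sketch below. It is an illustration only, not the attached
patch: it is written as a standalone function rather than a method, the
names ENTITY_RE and replace_entity are made up here, and leaving
unrecognized or malformed references untouched is an assumption about the
desired behavior rather than something the patch does.

```python
import re
from html.entities import name2codepoint

# Same pattern as in the patch: text between "&" and ";" is captured.
ENTITY_RE = re.compile(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));")

def unescape(s):
    def replace_entity(match):
        ref = match.group(1)
        try:
            if ref[0] == "#":
                ref = ref[1:]
                if ref[0] in "xX":
                    # hexadecimal reference such as &#x201D;
                    return chr(int(ref[1:], 16))
                # decimal reference such as &#8221;
                return chr(int(ref))
            # named reference such as &rdquo;
            return chr(name2codepoint[ref])
        except (KeyError, ValueError, OverflowError):
            # Assumption: unknown names and bad numbers are left as-is.
            return match.group(0)
    return ENTITY_RE.sub(replace_entity, s)

print(unescape("&#8221; is a nice character"))
```

Catching the lookup and conversion errors is a design choice for this
sketch; the function in the patch would raise on an unknown entity name.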
----------------------------------------------------------------------
>Comment By: Aaron Swartz (aaronsw)
Date: 2004-03-08 19:21
Message:
Logged In: YES
user_id=122141
Oops. The replacement function is attached.
----------------------------------------------------------------------