[Patches] [ python-Patches-912410 ] HTMLParser should support entities in attributes

SourceForge.net noreply at sourceforge.net
Mon Mar 8 20:21:04 EST 2004


Patches item #912410, was opened at 2004-03-08 19:20
Message generated for change (Comment added) made by aaronsw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=912410&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Aaron Swartz (aaronsw)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser should support entities in attributes

Initial Comment:
HTMLParser doesn't currently support entities in attributes, 
like this:

<span title="&#8221; is a nice character">foo</span>

This patch fixes that. Simply replace the unescape in 
HTMLParser.py with:


import re
import htmlentitydefs

def unescape(self, s):

	def replaceEntities(m):
		# m is the match object; group 1 is the text between "&" and ";".
		s = m.group(1)
		if s[0] == "#":
			# Numeric character reference, decimal or hexadecimal.
			s = s[1:]
			if s[0] in ['x','X']:
				c = int(s[1:], 16)
			else:
				c = int(s)
			return unichr(c)
		else:
			# Named entity: look up its code point by name.
			return unichr(htmlentitydefs.name2codepoint[s])

	return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
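
For anyone trying the idea outside HTMLParser.py, here is a rough standalone sketch of the same substitution. It is not part of the patch. It assumes Python 3, where htmlentitydefs became html.entities and unichr became chr. The name unescape_attr, the stricter regular expression, and the choice to leave unknown references untouched are all illustrative assumptions.

```python
import re
from html.entities import name2codepoint

# Hypothetical pattern: decimal (&#8221;), hex (&#x201D;) and named (&rdquo;) references.
_ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def unescape_attr(value):
    """Replace character and entity references in an attribute value."""
    def replace_entity(match):
        ref = match.group(1)
        try:
            if ref.startswith("#"):
                if ref[1] in "xX":
                    # Hexadecimal numeric reference.
                    return chr(int(ref[2:], 16))
                # Decimal numeric reference.
                return chr(int(ref[1:]))
            # Named entity, looked up by name.
            return chr(name2codepoint[ref])
        except (KeyError, ValueError, OverflowError):
            # Unknown name or out-of-range code point: keep the original text.
            return match.group(0)
    return _ENTITY_RE.sub(replace_entity, value)

# ascii() keeps the output printable on any terminal encoding.
print(ascii(unescape_attr("&#8221; is a nice character")))
```

Leaving unrecognised references as they are, rather than raising, is a design choice made for this sketch only; the function in the patch raises KeyError on an unknown entity name.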



----------------------------------------------------------------------

>Comment By: Aaron Swartz (aaronsw)
Date: 2004-03-08 19:21

Message:
Logged In: YES 
user_id=122141

Oops. The replacement function is attached.

----------------------------------------------------------------------
