[Python-bugs-list] [ python-Bugs-803422 ] sgmllib doesn't support hex or Unicode character references

SourceForge.net noreply at sourceforge.net
Wed Sep 10 16:42:42 EDT 2003


Bugs item #803422, was opened at 2003-09-09 15:53
Message generated for change (Comment added) made by aaronsw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=803422&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Aaron Swartz (aaronsw)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib doesn't support hex or Unicode character references

Initial Comment:
sgmllib doesn't support hexadecimal character references or
Unicode characters, both of which are commonly seen on web pages.
The following replacements fix both problems.

charref = re.compile('&#([0-9a-fA-F]+)[^0-9a-fA-F]')

	def handle_charref(self, ref):
		try:
			# A leading 'x' or 'X' marks a hexadecimal reference.
			if ref[0] == 'x' or ref[0] == 'X':
				m = int(ref[1:], 16)
			else:
				m = int(ref)
			# Pass the character on as UTF-8 encoded data.
			self.handle_data(unichr(m).encode('utf-8'))
		except ValueError:
			self.unknown_charref(ref)
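
For readers on Python 3, where unichr no longer exists, the decoding step in the patch above might be sketched as a standalone function. The name decode_charref is hypothetical and not part of sgmllib; the sketch returns text instead of UTF-8 bytes and returns None where the patch would call unknown_charref.

```python
def decode_charref(ref):
    """Decode the body of a character reference such as '65' or 'x41'.

    Hypothetical helper (not an sgmllib API). Returns the character,
    or None if the reference is malformed or out of range.
    """
    try:
        if ref[:1] in ('x', 'X'):
            # Hexadecimal form: drop the marker and parse base 16.
            code = int(ref[1:], 16)
        else:
            # Decimal form.
            code = int(ref)
        # chr() rejects code points outside the Unicode range.
        return chr(code)
    except (ValueError, OverflowError):
        return None
```

Using ref[:1] rather than ref[0] avoids an IndexError on an empty reference, which the patch's ValueError handler would not catch.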



----------------------------------------------------------------------

>Comment By: Aaron Swartz (aaronsw)
Date: 2003-09-10 17:42

Message:
Logged In: YES 
user_id=122141

I don't have the money to shell out for the XML spec, but according to
http://developers.omnimark.com/documentation/concept/764.htm they were
added in SGML TC 2.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-09-10 11:58

Message:
Logged In: YES 
user_id=21627

Are you sure hexadecimal character references are part of
the SGML standard?

----------------------------------------------------------------------

Comment By: Aaron Swartz (aaronsw)
Date: 2003-09-09 16:00

Message:
Logged In: YES 
user_id=122141

Oops, that should be:

charref = re.compile('&#([0-9a-fA-FxX][0-9a-fA-F]*)[^0-9a-fA-F]')
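
A quick way to see why the correction matters is to run both patterns over a decimal and a hexadecimal reference. This is only an illustrative sketch; the names original, corrected, and captured are made up for the example.

```python
import re

# The pattern from the initial comment and the corrected one.
original = re.compile('&#([0-9a-fA-F]+)[^0-9a-fA-F]')
corrected = re.compile('&#([0-9a-fA-FxX][0-9a-fA-F]*)[^0-9a-fA-F]')


def captured(pattern, text):
    """Return the captured reference body, or None if there is no match."""
    m = pattern.match(text)
    return m.group(1) if m else None


for text in ('&#65;', '&#x41;'):
    print(text, captured(original, text), captured(corrected, text))
```

The original pattern never lets an 'x' through, so the hexadecimal branch of handle_charref could not be reached. The corrected pattern also accepts a bare '&#x;', in which case int('', 16) raises ValueError and the patch falls back to unknown_charref.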

----------------------------------------------------------------------
