[Python-bugs-list] [ python-Bugs-803422 ] sgmllib doesn't support hex or Unicode character references
SourceForge.net
noreply at sourceforge.net
Wed Sep 10 16:42:42 EDT 2003
Bugs item #803422, was opened at 2003-09-09 15:53
Message generated for change (Comment added) made by aaronsw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=803422&group_id=5470
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Aaron Swartz (aaronsw)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib doesn't support hex or Unicode character references
Initial Comment:
sgmllib supports neither hexadecimal character references nor
Unicode characters, both of which are commonly seen on web pages.
The following replacements fix both problems.
charref = re.compile('&#([0-9a-fA-F]+)[^0-9a-fA-F]')

def handle_charref(self, ref):
    try:
        if ref[0] == 'x' or ref[0] == 'X':
            m = int(ref[1:], 16)
        else:
            m = int(ref)
        self.handle_data(unichr(m).encode('utf-8'))
    except ValueError:
        self.unknown_charref(ref)
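
For readers who want to try the decoding logic outside sgmllib, here is a minimal standalone sketch of the same idea. It is not part of the proposed patch: the function name decode_charref is hypothetical, and it assumes Python 3, where chr() takes the place of Python 2's unichr().

```python
def decode_charref(ref):
    """Decode the body of a numeric character reference to UTF-8 bytes.

    `ref` is the text between '&#' and the terminator, e.g. '65' or 'x41'.
    Hypothetical helper for illustration; raises ValueError on bad input,
    which is the case the patch routes to unknown_charref().
    """
    # A leading 'x' or 'X' marks a hexadecimal reference.
    if ref[:1] in ('x', 'X'):
        code_point = int(ref[1:], 16)
    else:
        code_point = int(ref)
    # chr() is the Python 3 counterpart of Python 2's unichr().
    return chr(code_point).encode('utf-8')
```

For example, decode_charref('x41') and decode_charref('65') both yield the single byte for "A", while a malformed body such as 'zz' raises ValueError.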
----------------------------------------------------------------------
>Comment By: Aaron Swartz (aaronsw)
Date: 2003-09-10 17:42
Message:
Logged In: YES
user_id=122141
I don't have the money to shell out for the XML spec, but according to
http://developers.omnimark.com/documentation/concept/764.htm they were
added in SGML TC 2.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2003-09-10 11:58
Message:
Logged In: YES
user_id=21627
Are you sure hexadecimal character references are part of
the SGML standard?
----------------------------------------------------------------------
Comment By: Aaron Swartz (aaronsw)
Date: 2003-09-09 16:00
Message:
Logged In: YES
user_id=122141
Oops, that should be:
charref = re.compile('&#([0-9a-fA-FxX][0-9a-fA-F]*)[^0-9a-fA-F]')
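
The difference between the two patterns can be checked with a short sketch. The names original, corrected, and first_ref are hypothetical and used only for this illustration; the two regular expressions are copied from the comments in this report.

```python
import re

# Pattern from the initial comment: the captured group must start with a hex digit.
original = re.compile('&#([0-9a-fA-F]+)[^0-9a-fA-F]')
# Pattern from this correction: the first captured character may also be 'x' or 'X'.
corrected = re.compile('&#([0-9a-fA-FxX][0-9a-fA-F]*)[^0-9a-fA-F]')

def first_ref(pattern, text):
    """Return the captured reference body of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None
```

With these definitions, first_ref(original, '&#x41;') finds nothing because 'x' is not a hex digit, whereas first_ref(corrected, '&#x41;') captures 'x41'. Both patterns capture '65' from '&#65;'.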
----------------------------------------------------------------------