[XML-SIG] [ pyxml-Bugs-497322 ] sgmlop and unicode charrefs
noreply@sourceforge.net
Fri, 28 Dec 2001 05:47:17 -0800
Bugs item #497322, was opened at 2001-12-28 05:47
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=497322&group_id=6473
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmlop and unicode charrefs
Initial Comment:
sgmlop mishandles Unicode character references when the
handler doesn't implement handle_charref (in which case
sgmlop forwards the call to handle_data). The following
test script shows the problem:
----
from xml.parsers import sgmlop

class Handler:
    def handle_data(self, data):
        print "handle_data", repr(data)

class Handler2(Handler):
    def handle_charref(self, data):
        print "handle_charref", repr(data)

p = sgmlop.XMLParser()
p.register(Handler())
p.parse("&#8364;")
p.register(Handler2())
p.parse("&#8364;")
----
The output is the following:
----
handle_data '\xac'
handle_charref '8364'
----
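i.e. parsing works with Handler2, but not with Handler:
the character code 8364 (0x20AC, the euro sign) reaches
handle_data truncated to its low byte, 0xAC.
To fix this bug, sgmlop has to return unicode objects.
(There's already a patch for that, see #412237 "sgmlop
returns Unicode".)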
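
The arithmetic behind the '\xac' result can be checked with a short sketch. It is not sgmlop's actual code: the helper names charref_as_byte and charref_as_text are hypothetical, and it is written for current Python, where chr() returns Unicode text. It only assumes that an 8-bit string can hold the low byte of the character code and nothing more.

```python
def charref_as_byte(digits):
    """Keep only the low 8 bits of the character code.

    Hypothetical helper: models what survives when the code
    is squeezed into a single 8-bit character.
    """
    return int(digits) & 0xFF


def charref_as_text(digits):
    """Resolve the reference to the full Unicode character.

    Hypothetical helper: models the result once Unicode
    objects are returned instead of 8-bit strings.
    """
    return chr(int(digits))


print(hex(int("8364")))               # 0x20ac
print(hex(charref_as_byte("8364")))   # 0xac
print(ascii(charref_as_text("8364"))) # '\u20ac'
```

The low byte 0xAC matches the handle_data output in the report, which is consistent with the character code being cut down to 8 bits on the handle_data path.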
----------------------------------------------------------------------