[XML-SIG] Exceptions on undefined character entities
Frank McIngvale
frankm@HiWAAY.net
Fri, 1 Feb 2002 08:37:12 -0600 (CST)
Hi, I stumbled across this yesterday while fetching my usual
RDF/RSS files, and am hoping someone can
explain what is happening:
newsforge.com gave me a file containing this line:
<title>University of Osnabr&uuml;ck, Germany</title>
minidom doesn't like it:
Python 2.1.1 (#1, Jan 21 2002, 22:52:28)
[GCC 2.95.3 20010315 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from xml.dom import minidom
>>> s = "<title>University of Osnabr&uuml;ck, Germany</title>"
>>> minidom.parseString(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.1/xml/dom/minidom.py", line 915, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "/usr/lib/python2.1/xml/dom/minidom.py", line 902, in _doparse
    toktype, rootNode = events.getEvent()
  File "/usr/lib/python2.1/xml/dom/pulldom.py", line 234, in getEvent
    self.parser.feed(buf)
  File "/usr/lib/python2.1/xml/sax/expatreader.py", line 92, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.1/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:27: undefined entity
>>>
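For comparison, here is a minimal sketch on a current Python 3 (not the 2.1 session above) of which entity references a bare XML parser accepts. XML 1.0 predefines only five named entities (amp, lt, gt, apos, quot), and numeric character references such as &#252; need no declaration; &uuml; comes from the HTML DTDs, so an undeclared use of it is a well-formedness error. On Python 3, minidom.parseString goes through xml.dom.expatbuilder and reports the failure as xml.parsers.expat.ExpatError rather than a SAXParseException. The helper name text_of is hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def text_of(xml_string):
    """Hypothetical helper: parse xml_string with minidom and return the
    concatenated text nodes directly under the root element."""
    root = minidom.parseString(xml_string).documentElement
    return "".join(n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE)


# The five predefined XML entities parse without any DTD.
print(text_of("<t>&amp; &lt; &gt; &apos; &quot;</t>"))

# Numeric character references also need no declaration.
print(text_of("<title>University of Osnabr&#252;ck, Germany</title>"))  # decimal
print(text_of("<title>University of Osnabr&#xFC;ck, Germany</title>"))  # hex

# &uuml; is an HTML entity; with no declaration the parser rejects it.
try:
    text_of("<title>University of Osnabr&uuml;ck, Germany</title>")
except ExpatError as err:
    print("rejected:", err)
```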
Dr. David Mertz pointed out that this works:
>>> s = "<!DOCTYPE title [<!ENTITY uuml '[fakechar]'>]><title>University of Osnabr&uuml;ck, Germany</title>"
>>> minidom.parseString(s)
<xml.dom.minidom.Document instance at 0x81571c4>
>>>
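The DOCTYPE trick can be generalized so that the caller supplies the entities: declare each missing name in an internal DTD subset, using its numeric character reference (uuml as &#252;) instead of a placeholder so the real character ends up in the DOM. A minimal sketch for Python 3, assuming the input has no XML declaration or DOCTYPE of its own (the generated DOCTYPE is simply prepended, which would otherwise be malformed). It uses the standard library's html.entities.name2codepoint table of HTML 4 names and skips the five names XML already defines. html_entity_doctype and parse_with_html_entities are hypothetical helper names.

```python
from html.entities import name2codepoint
from xml.dom import minidom

# XML already defines these five; they are left out of the subset.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}


def html_entity_doctype(root_name):
    """Hypothetical helper: build a DOCTYPE whose internal subset declares
    every HTML 4 named entity as its numeric character reference,
    e.g. <!ENTITY uuml "&#252;">."""
    decls = "".join(
        f'<!ENTITY {name} "&#{codepoint};">'
        for name, codepoint in sorted(name2codepoint.items())
        if name not in XML_PREDEFINED
    )
    return f"<!DOCTYPE {root_name} [{decls}]>"


def parse_with_html_entities(body, root_name):
    """Hypothetical helper. Assumes `body` has no XML declaration or
    DOCTYPE of its own, since the generated one is simply prepended."""
    return minidom.parseString(html_entity_doctype(root_name) + body)


doc = parse_with_html_entities(
    "<title>University of Osnabr&uuml;ck, Germany</title>", "title")
print(doc.documentElement.firstChild.data)
```

Letting the parser expand declared entities keeps the document text untouched; the cost is that this only works when the caller controls the prolog, which is why the no-existing-DOCTYPE assumption matters.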
So my question is, what is the correct way to handle this? Is
minidom supposed to handle it, is the caller supposed to provide
the entities, or is it a bug in the XML file?
thanks!
frank (please cc: me on replies, I'm not subscribed yet)