[XML-SIG] well-formed xml
Mark McEahern
mark@mceahern.com
Thu, 26 Sep 2002 19:08:48 -0500
I'm obviously missing something, because this seemingly innocent chunk of
xhtml fails to parse:
from xml.dom import minidom
s = "<a href='http://google.com/search?hl=en&q=foobar'>search</a>"
#                                           ^
#                                           - seems to be the problem
#
# maybe it thinks I'm trying to reference the &q entity?
doc = minidom.parseString(s)
Exception traceback follows.
Is there a way for me to tell it to ignore apparent entity references inside
attribute values?
// m
$ python junk.py
Traceback (most recent call last):
  File "junk.py", line 5, in ?
    doc = minidom.parseString(s)
  File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 965, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 952, in _doparse
    toktype, rootNode = events.getEvent()
  File "/usr/lib/python2.2/site-packages/_xmlplus/dom/pulldom.py", line 256, in getEvent
    self.parser.feed(buf)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 148, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:41: not well-formed (invalid token)
This is with Python 2.2.1 without PyXML installed separately. The same
thing happens with PyXML 0.8.1:
$ python junk.py
Traceback (most recent call last):
  File "junk.py", line 5, in ?
    doc = minidom.parseString(s)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 1605, in parseString
    return expatbuilder.parseString(string)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/expatbuilder.py", line 943, in parseString
    return builder.parseString(string)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/expatbuilder.py", line 189, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 41
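
On the question above about ignoring apparent entity references in
attribute values: as far as I can tell there is no parser switch for
that, because XML requires a literal & in an attribute value to be
written as &amp;. Here is a minimal sketch of that workaround, assuming
a current Python standard library (where minidom raises ExpatError, as
in the second traceback). The helper name parses() and the variable
names are mine, for illustration only.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import quoteattr

url = "http://google.com/search?hl=en&q=foobar"

# The original markup: a bare '&' inside the attribute value.
raw = "<a href='http://google.com/search?hl=en&q=foobar'>search</a>"

# The same markup with the ampersand escaped by hand.
escaped = "<a href='http://google.com/search?hl=en&amp;q=foobar'>search</a>"

# Alternative: let quoteattr() escape and quote the value for us.
built = "<a href=%s>search</a>" % quoteattr(url)


def parses(markup):
    """Return True if minidom accepts the markup as well-formed XML."""
    try:
        minidom.parseString(markup)
    except ExpatError:
        return False
    return True


# The parser turns &amp; back into '&', so the attribute value read
# from the DOM is the plain URL again.
doc = minidom.parseString(escaped)
href = doc.documentElement.getAttribute("href")
```

With this, parses(raw) is false while parses(escaped) and parses(built)
are both true, and href comes back as the original URL with its plain
ampersand. On the older stacks shown in the first traceback the
exception class would be SAXParseException instead, so the except
clause would need adjusting there.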