well-formed xml

Mark McEahern mark at mceahern.com
Fri Sep 27 02:08:48 CEST 2002


I'm obviously missing something, because this seemingly innocent chunk of
XHTML fails to parse:

  from xml.dom import minidom

  s = "<a href='http://google.com/search?hl=en&q=foobar'>search</a>"
  #                                             ^
  #                                             - seems to be the problem
  #
  # maybe it thinks I'm trying to reference the &q entity?

  doc = minidom.parseString(s)

Exception traceback follows.

Is there a way for me to tell it to ignore apparent entity references inside
attribute values?

// m

$ python junk.py
Traceback (most recent call last):
  File "junk.py", line 5, in ?
    doc = minidom.parseString(s)
  File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 965, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 952, in _doparse
    toktype, rootNode = events.getEvent()
  File "/usr/lib/python2.2/site-packages/_xmlplus/dom/pulldom.py", line 256, in getEvent
    self.parser.feed(buf)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 148, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:41: not well-formed (invalid token)

This is with Python 2.2.1 without PyXML installed separately.  The same
thing happens with PyXML 0.8.1:

$ python junk.py
Traceback (most recent call last):
  File "junk.py", line 5, in ?
    doc = minidom.parseString(s)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 1605, in parseString
    return expatbuilder.parseString(string)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/expatbuilder.py", line 943, in parseString
    return builder.parseString(string)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/expatbuilder.py", line 189, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 41
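
For reference, here is a minimal sketch of one way around the error. It
assumes the markup can be regenerated rather than parsed as-is. XML requires
a literal & inside an attribute value to be written as &amp;, so the parser
is not being asked to ignore anything; the input itself has to change. The
variable names (url, bad, good, raw_parsed, href) are only illustrative, and
the sketch targets a current Python rather than 2.2.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import quoteattr

url = "http://google.com/search?hl=en&q=foobar"

# The raw ampersand is what the parser rejects as "not well-formed".
bad = "<a href='%s'>search</a>" % url
try:
    minidom.parseString(bad)
    raw_parsed = True
except ExpatError:
    raw_parsed = False

# quoteattr() escapes & as &amp; and supplies the surrounding quotes.
good = "<a href=%s>search</a>" % quoteattr(url)
doc = minidom.parseString(good)

# The parser unescapes the attribute value again when it is read back.
href = doc.documentElement.getAttribute("href")
```

If the string is written by hand instead of generated, typing &amp; in place
of & inside the href should have the same effect.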
