Help parsing XML

Skip Montanaro skip at pobox.com
Mon Jul 2 03:53:20 EDT 2001


Warning: I am a complete XML novice trying to parse some XML I dreamed up
with xml.sax.parse.

I wrote a simple subclass of xml.sax.ContentHandler to parse some XML.  It
defines startElement, endElement and characters methods.  If I feed it
something simple like so:

    <events>
      <event>
	<performers>
	  <performer>James Taylor</performer>
	  <performer>Carly Simon</performer>
	</performers>
	<keywords>
	  <keyword>pop</keyword>
	  <keyword>rock</keyword>
	  <keyword>vocals</keyword>
	</keywords>
	<start-date>20010701T20:00</start-date>
	<end-date>20010701T23:00</end-date>
	<admission-price>$30</admission-price>
	<venue-name>Pepsi Arena</venue-name>
	<city>Albany</city>
	<state>CA</state>
	<country>US</country>
	<submitter-name>Skip Montanaro</submitter-name>
	<submitter-email>skip at mojam.com</submitter-email>
      </event>
    </events>

it works fine.  However, note the absence of the <!DOCTYPE ...> and <?xml ...?>
declarations at the start.  If I add them at the start of the XML file like so:

    <!DOCTYPE events SYSTEM "xml-concerts.dtd">
    <?xml version="1.0" encoding-"ISO-8859-1"?>
    <events>
      <event>
      ...

Python squawks:

    Traceback (most recent call last):
      File "parsexml.py", line 72, in ?
	xml.sax.parse("concert.xml", h)
      File "/usr/local/lib/python2.1/xml/sax/__init__.py", line 33, in parse
	parser.parse(source)
      File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 43, in parse
	xmlreader.IncrementalParser.parse(self, source)
      File "/usr/local/lib/python2.1/xml/sax/xmlreader.py", line 123, in parse
	self.feed(buffer)
      File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 92, in feed
	self._err_handler.fatalError(exc)
      File "/usr/local/lib/python2.1/xml/sax/handler.py", line 38, in fatalError
	raise exception
    xml.sax._exceptions.SAXParseException: concert.xml:2:0: xml processing instruction not at start of external entity
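The first failure does not depend on the DTD or on the content handler. The
XML declaration (<?xml ...?>) is only allowed as the very first thing in a
document, which is what expat's "not at start of external entity" message is
getting at. A minimal sketch, assuming Python 3 and only the standard library,
feeds a cut-down copy of the same prolog to xml.sax.parseString with a
do-nothing ContentHandler. The encoding pseudo-attribute is dropped so that the
ordering is the only problem, and the helper name first_error is made up for
illustration:

```python
import xml.sax
from xml.sax.handler import ContentHandler

# DOCTYPE first, XML declaration second: the ordering used above.
MISORDERED = (
    b'<!DOCTYPE events SYSTEM "xml-concerts.dtd">\n'
    b'<?xml version="1.0"?>\n'
    b'<events/>\n'
)


def first_error(document):
    """Hypothetical helper: return (line, column, message) for the first
    SAXParseException raised while parsing `document` (bytes), or None
    if the document parses cleanly."""
    try:
        xml.sax.parseString(document, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None


if __name__ == "__main__":
    # expat stops on line 2, where the misplaced <?xml ...?> sits.
    print(first_error(MISORDERED))
    # A document with no prolog at all parses fine.
    print(first_error(b"<events/>"))
```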

If I zap the <!DOCTYPE ...> declaration, I get this:

    Traceback (most recent call last):
      File "parsexml.py", line 72, in ?
	xml.sax.parse("concert.xml", h)
      File "/usr/local/lib/python2.1/xml/sax/__init__.py", line 33, in parse
	parser.parse(source)
      File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 43, in parse
	xmlreader.IncrementalParser.parse(self, source)
      File "/usr/local/lib/python2.1/xml/sax/xmlreader.py", line 123, in parse
	self.feed(buffer)
      File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 92, in feed
	self._err_handler.fatalError(exc)
      File "/usr/local/lib/python2.1/xml/sax/handler.py", line 38, in fatalError
	raise exception
    xml.sax._exceptions.SAXParseException: concert.xml:1:41: syntax error
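The second traceback points at the XML declaration itself (line 1), and there
encoding-"ISO-8859-1" has a hyphen where the declaration syntax requires an
equals sign. A minimal sketch, assuming Python 3 and only the standard library,
parses the declaration as typed and then a corrected prolog (declaration first,
encoding="ISO-8859-1", then the DOCTYPE). The helper name parses is
hypothetical; it switches off external entity loading (feature_external_ges)
so that the parser does not try to fetch xml-concerts.dtd, which is not needed
for a well-formedness check since expat is a non-validating parser anyway:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# The declaration as typed above, with "-" where "=" belongs.
TYPO = b'<?xml version="1.0" encoding-"ISO-8859-1"?>\n<events/>\n'

# Corrected prolog: the XML declaration comes first, then the DOCTYPE.
FIXED = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    b'<!DOCTYPE events SYSTEM "xml-concerts.dtd">\n'
    b'<events/>\n'
)


def parses(document):
    """Hypothetical helper: True if expat accepts `document` (bytes);
    otherwise print where it stopped and return False."""
    parser = xml.sax.make_parser()
    # Do not fetch external entities (including the external DTD subset).
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(ContentHandler())
    try:
        parser.parse(io.BytesIO(document))
    except xml.sax.SAXParseException as exc:
        print("line %d, column %d: %s" % (exc.getLineNumber(),
                                          exc.getColumnNumber(),
                                          exc.getMessage()))
        return False
    return True


if __name__ == "__main__":
    print(parses(TYPO))   # reports an error on line 1, then False
    print(parses(FIXED))  # True
```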

If you're wondering what sort of XML I'm trying to parse, my first cut at a
DTD (I'm a complete novice at DTD writing as well) is at

    http://musi-cal.mojam.com/~skip/concerts.dtd

I suspect this is more a problem with my lack of XML expertise than with
anything specific I'm doing wrong with my content handler.  Suggestions or
pointers to appropriate XML tutorial material would be appreciated.
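For anyone who wants to experiment, here is a rough sketch of the kind of
handler described at the top: a subclass of xml.sax.ContentHandler with
startElement, endElement and characters methods. It is not the code in
parsexml.py; the class name EventCollector and the choice to store each
<event> as a dict mapping leaf element names to lists of text are assumptions
made for illustration. Because SAX may deliver one run of text through several
characters() calls, the sketch accumulates chunks and joins them in
endElement():

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventCollector(ContentHandler):
    """Hypothetical handler: one dict per <event>, mapping each leaf
    element name (performer, keyword, city, ...) to a list of its texts."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._current = None  # dict for the <event> being read, if any
        self._text = []       # character chunks for the current element

    def startElement(self, name, attrs):
        if name == "event":
            self._current = {}
        self._text = []

    def characters(self, content):
        # May be called several times for one text node, so accumulate.
        self._text.append(content)

    def endElement(self, name):
        if name == "event":
            self.events.append(self._current)
            self._current = None
        elif self._current is not None:
            text = "".join(self._text).strip()
            if text:  # containers like <performers> hold only whitespace
                self._current.setdefault(name, []).append(text)
        self._text = []


# A trimmed copy of the sample document above.
SAMPLE = b"""<events>
  <event>
    <performers>
      <performer>James Taylor</performer>
      <performer>Carly Simon</performer>
    </performers>
    <city>Albany</city>
  </event>
</events>"""

if __name__ == "__main__":
    handler = EventCollector()
    xml.sax.parseString(SAMPLE, handler)
    print(handler.events)
    # [{'performer': ['James Taylor', 'Carly Simon'], 'city': ['Albany']}]
```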

Thanks,

-- 
Skip Montanaro (skip at pobox.com)
(847)971-7098




More information about the Python-list mailing list