Help parsing XML
Skip Montanaro
skip at pobox.com
Mon Jul 2 03:53:20 EDT 2001
Warning: I am a complete XML novice trying to parse some XML I dreamed up
with xml.sax.parse.
I wrote a simple subclass of xml.sax.ContentHandler that defines
startElement, endElement and characters methods. If I feed it something
simple like so:
<events>
  <event>
    <performers>
      <performer>James Taylor</performer>
      <performer>Carly Simon</performer>
    </performers>
    <keywords>
      <keyword>pop</keyword>
      <keyword>rock</keyword>
      <keyword>vocals</keyword>
    </keywords>
    <start-date>20010701T20:00</start-date>
    <end-date>20010701T23:00</end-date>
    <admission-price>$30</admission-price>
    <venue-name>Pepsi Arena</venue-name>
    <city>Albany</city>
    <state>CA</state>
    <country>US</country>
    <submitter-name>Skip Montanaro</submitter-name>
    <submitter-email>skip at mojam.com</submitter-email>
  </event>
</events>
it works fine. However, note the absence of the <!DOCTYPE ...> and
<?xml ...> declarations. If I add them at the top of the XML file like so:
<!DOCTYPE events SYSTEM "xml-concerts.dtd">
<?xml version="1.0" encoding-"ISO-8859-1"?>
<events>
<event>
...
Python squawks:
Traceback (most recent call last):
  File "parsexml.py", line 72, in ?
    xml.sax.parse("concert.xml", h)
  File "/usr/local/lib/python2.1/xml/sax/__init__.py", line 33, in parse
    parser.parse(source)
  File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 43, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.1/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 92, in feed
    self._err_handler.fatalError(exc)
  File "/usr/local/lib/python2.1/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: concert.xml:2:0: xml processing instruction not at start of external entity
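The failure above does not seem to depend on the content handler and can be reproduced without the file on disk. The following is only a minimal sketch, written for current Python 3 rather than 2.1. The inline document is a trimmed, hypothetical stand-in for concert.xml, and try_parse is a helper name made up for illustration.

```python
import xml.sax

# Trimmed stand-in for concert.xml: DOCTYPE on line 1, XML declaration on line 2.
DOC = b"""<!DOCTYPE events SYSTEM "xml-concerts.dtd">
<?xml version="1.0"?>
<events>
</events>
"""


def try_parse(data):
    """Parse data with a do-nothing handler; return the parse error, or None."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc
    return None


err = try_parse(DOC)
if err is not None:
    # Report where the parser gave up and what it complained about.
    print("line", err.getLineNumber(), "column", err.getColumnNumber())
    print(err.getMessage())
```

The exact wording of the message may differ between Python and expat versions, so the sketch prints whatever the parser reports instead of assuming it.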
If I zap the <!DOCTYPE ...> declaration, I get this:
Traceback (most recent call last):
  File "parsexml.py", line 72, in ?
    xml.sax.parse("concert.xml", h)
  File "/usr/local/lib/python2.1/xml/sax/__init__.py", line 33, in parse
    parser.parse(source)
  File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 43, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.1/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 92, in feed
    self._err_handler.fatalError(exc)
  File "/usr/local/lib/python2.1/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: concert.xml:1:41: syntax error
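For reference, a handler of the kind described at the top (startElement, endElement and characters methods on a ContentHandler subclass) could look roughly like the following. This is only a minimal sketch under assumptions, not the actual parsexml.py: the class name EventHandler, the events list and the shortened inline sample are all hypothetical, and the code targets current Python 3 rather than 2.1.

```python
import xml.sax


class EventHandler(xml.sax.ContentHandler):
    """Collect performers and city for each <event> (hypothetical example)."""

    def __init__(self):
        super().__init__()
        self.events = []   # one dict per <event> element
        self._text = []    # character data seen since the last tag

    def startElement(self, name, attrs):
        self._text = []
        if name == "event":
            self.events.append({"performers": [], "city": None})

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        self._text = []
        if name == "performer":
            self.events[-1]["performers"].append(text)
        elif name == "city":
            self.events[-1]["city"] = text


# Shortened version of the sample document, with no DOCTYPE or XML declaration.
SAMPLE = b"""<events>
  <event>
    <performers>
      <performer>James Taylor</performer>
      <performer>Carly Simon</performer>
    </performers>
    <city>Albany</city>
  </event>
</events>
"""

handler = EventHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.events)
```

Joining the chunks in endElement, rather than using each characters call directly, avoids losing text when the parser splits a text node.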
If you're wondering what sort of XML I'm trying to parse, my first cut at a
DTD (I'm a complete novice at DTD writing as well) is at
http://musi-cal.mojam.com/~skip/concerts.dtd
I suspect this is more a problem with my lack of XML expertise than with
anything specific I'm doing wrong with my content handler. Suggestions or
pointers to appropriate XML tutorial material would be appreciated.
Thanks,
--
Skip Montanaro (skip at pobox.com)
(847)971-7098