Expat XML Parser

Andrew Dalke dalke at dalkescientific.com
Thu Nov 29 12:04:27 EST 2001


Richard Boardman:
>I am trying to use the Expat XML parser to extract some data for
>processing - sounds simple enough - although I am having great difficulty
>finding documentation for Expat and am consequently very stuck.

You should be able to use any of the documentation on XML for
Python.  See http://pyxml.sourceforge.net/topics/ .  The interface
to Expat and to the other parsers is generic enough
that you don't really have to worry about which one you're using.
You may also want to ask questions of the XML-SIG -- see
  http://www.python.org/sigs/xml-sig/
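Here is a small sketch of that genericity. It uses the standard library's `xml.sax` package and assumes Python 3. The `TagCollector` class and the sample document are made up for illustration. `make_parser()` returns the default driver, which in CPython's standard library is the Expat-based reader, yet the handler code never mentions Expat.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

class TagCollector(ContentHandler):
    """Hypothetical handler that records the name of every start tag it sees."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

parser = xml.sax.make_parser()   # the default SAX driver (normally Expat-based)
collector = TagCollector()
parser.setContentHandler(collector)
# parse() accepts a filename or a file-like object; a bytes buffer works here
parser.parse(io.BytesIO(b"<root><data set='1'>a b</data><data/></root>"))

print(type(parser).__module__)   # which module supplied the driver
print(collector.tags)
```

If a different SAX driver were installed and chosen by `make_parser()`, the handler would stay exactly the same. That is the point of the generic interface.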

>What I'd like to do is have something that works thus:
>
>readInXML
>foreach element in XML
>    if element = "abcdefg" {
>        getCharacterData
>        doStuff with characterData
>    }

There are two common styles of XML processing, SAX and DOM.
What you have here is more DOM-like, and could be done roughly
as follows. Excuse me, since I don't do much DOM programming:
some of this may be wrong, and it probably should be done
with XSL anyway.

from xml.dom import minidom

# Read the whole document into an in-memory tree;
# 'file' is a filename or an open file object
doc = minidom.parse(file)

for data_set in doc.getElementsByTagName("data"):
  if data_set.getAttribute("set") == "1":
    # assumes the first child of <data> is a text node
    print(data_set.firstChild.data)
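Here is a self-contained sketch of the DOM approach using `xml.dom.minidom` from the standard library. It assumes Python 3. The sample document and the `text_of` helper are illustrative only. The sketch does not assume the first child is a text node. Instead, it joins all the text nodes directly under each matching element.

```python
from xml.dom import minidom

SAMPLE = """<root>
  <data set="1">1.5 2.5 3.5</data>
  <data set="2">ignored</data>
  <data set="1">4.0</data>
</root>"""

def text_of(element):
    """Hypothetical helper: concatenate the text nodes directly under element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)

doc = minidom.parseString(SAMPLE)
results = []
for data_set in doc.getElementsByTagName("data"):
    # getAttribute() returns "" when the attribute is missing
    if data_set.getAttribute("set") == "1":
        results.append(text_of(data_set).split())
doc.unlink()  # break the DOM's internal reference cycles when finished

for words in results:
    print(words)
```

The whole tree sits in memory, which is convenient for small files. The SAX style avoids that cost.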


The SAX style, which I've used a lot, does everything
through callbacks and looks like this:

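from xml.sax import make_parser

parser = make_parser()           # normally returns an Expat-based parser
handler = MyCallbackHandler()    # the ContentHandler subclass below
parser.setContentHandler(handler)
parser.parse(file)               # 'file' is a filename or open file object

and MyCallbackHandler is something like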

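from xml.sax.handler import ContentHandler

class MyCallbackHandler(ContentHandler):
  def startDocument(self):
    self.store_characters = 0   # true while inside a <data set="1"> element
    self.text = None            # character data collected so far
    self.data_set = []          # one list of words per matching element
  def startElement(self, tag, attrs):
    # attrs.get() avoids a KeyError on <data> elements without "set"
    if tag == "data" and attrs.get("set") == "1":
      self.store_characters = 1
      self.text = ""
  def characters(self, text):
    # The parser may split one element's text over several calls,
    # so accumulate the pieces instead of keeping only the last one
    if self.store_characters:
      self.text = self.text + text
  def endElement(self, tag):
    # Only the closing </data> ends collection, not a nested child's end tag
    if tag == "data" and self.store_characters:
      print("I have", self.text)
      self.data_set.append(self.text.split())
      self.text = None
      self.store_characters = 0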

The SAX style is what you were using in the code included
in your post.  What you were missing was an object in which
to store the data.
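If you are calling Expat directly rather than going through xml.sax, here is a sketch of the same idea. It uses the standard library's `xml.parsers.expat` module and assumes Python 3. The `DataCollector` class and the sample input are hypothetical. The callbacks are bound methods, so whatever they find is stored as attributes of the object rather than in globals.

```python
import xml.parsers.expat

class DataCollector:
    """Hypothetical collector: keeps the words inside <data set="1"> elements."""
    def __init__(self):
        self.text = None        # None means "not inside a matching element"
        self.data_set = []

    def start_element(self, name, attrs):
        # By default Expat passes attributes as a dict of name -> value
        if name == "data" and attrs.get("set") == "1":
            self.text = ""

    def char_data(self, data):
        if self.text is not None:
            self.text += data   # text may arrive in several pieces

    def end_element(self, name):
        if name == "data" and self.text is not None:
            self.data_set.append(self.text.split())
            self.text = None

collector = DataCollector()
p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = collector.start_element
p.EndElementHandler = collector.end_element
p.CharacterDataHandler = collector.char_data
p.Parse('<root><data set="1">10 20</data><data>x</data></root>', True)

print(collector.data_set)
```

After parsing, everything collected is available on `collector`, ready to be processed further.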

                    Andrew
