Expat XML Parser

Martin von Loewis loewis at informatik.hu-berlin.de
Thu Nov 29 17:32:07 CET 2001


"Richard Boardman" <rpb at soton.ac.uk> writes:

> The major problem is that I can't seem to return any of these values at
> all - they will all print on the screen, but I can't actually *do* anything
> with these values. I don't think it's anything to do with Expat; more my
> lack of experience with this language. I can't find any documentation
> explaining how Expat works.

Expat works in an event-driven manner: For each chunk of the XML
document, it invokes a function passing the data it has read. Those
functions don't return anything (their return value is ignored); they
must do all processing before they return.

That processing could be to print the contents out, or it could be to
set some global variables to some values, for later inspection.

> What I'd like to do is have something that works thus:
> 
> readInXML

That is the source of the confusion. In event-driven XML processing,
there is no separate "read-in" step. The document is processed while it
is being read; once reading is complete, the processing must be
complete as well.

What you want is for reading to return a data structure that you can
inspect afterwards.

For that, I recommend using the DOM. To read the document, do

import xml.dom.minidom
# filename_or_file: a file name or an open file object
document = xml.dom.minidom.parse(filename_or_file)

> foreach element in XML
>     if element = "abcdefg" {

This is written as

for element in document.getElementsByTagName("abcdefg"):

>         getCharacterData

This is more tricky: the content of element may be other elements; or
it may be multiple text nodes (e.g. resulting from CDATA sections):

          # needs: from xml.dom import Node
          chardata = ""
          for child in element.childNodes:
              if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
                  chardata += child.data

If you know there aren't any CDATA sections, comments, processing
instructions etc. inside the text, you could also invoke
element.normalize() first. It merges adjacent text nodes, so the text
ends up in at most one child node.
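
A small sketch of what normalize() does, assuming the element holds
nothing but text. The sample document is made up, and the second text
node is appended by hand only to simulate fragmented text:

```python
import xml.dom.minidom

# Hypothetical sample document.
doc = xml.dom.minidom.parseString("<root><abcdefg>hello</abcdefg></root>")
element = doc.getElementsByTagName("abcdefg")[0]

# Simulate fragmented text by appending a second text node by hand.
element.appendChild(doc.createTextNode(" world"))
before = len(element.childNodes)

# Merge adjacent text nodes into one.
element.normalize()
after = len(element.childNodes)

# With a single text child, the data can be read directly. For an
# empty element firstChild would be None, so check for that in real code.
chardata = element.firstChild.data
print(before, after, chardata)
```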

>         doStuff with characterData

          doStuff(chardata)
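
Putting the pieces together, the whole thing could look roughly like
this. It is only a sketch: the sample document, get_character_data and
do_stuff are placeholders I made up, and parseString stands in for
parse so that the example needs no file.

```python
import xml.dom.minidom
from xml.dom import Node

# Hypothetical sample document with a CDATA section inside the text.
SAMPLE = ("<root><abcdefg>plain <![CDATA[<raw>]]> text</abcdefg>"
          "<abcdefg>second</abcdefg></root>")

def get_character_data(element):
    """Concatenate the text and CDATA children of element."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)

def do_stuff(chardata):
    # Placeholder for whatever processing you need.
    return chardata.upper()

document = xml.dom.minidom.parseString(SAMPLE)
results = []
for element in document.getElementsByTagName("abcdefg"):
    results.append(do_stuff(get_character_data(element)))

print(results)
```

Unlike the Expat handlers, these functions can return values, because
the complete document is in memory before any of them is called.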

HTH,
Martin


