xml.sax module documentation

Mon Nov 13 09:26:34 EST 2000

"S. Hendry" <shendry at usa.capgemini.com> writes:

> Actually I should be more specific.  I may be confused what a parser is.

A parser is an algorithm that splits a text into pieces (called
tokens), and combines these pieces according to a grammar. It
eventually decides whether the input text is correct according to the
grammar (i.e. whether it accepts the text); in the process, it tells
the application which parts of the text it has seen.

Somebody already explained how a SAX parser works; the things it
reports are start and end tags, and the characters in between.
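
To make those events concrete, here is a minimal sketch, assuming a
current Python 3 and the standard xml.sax module. The EventLogger
class and the small PO document are made up for illustration; they
are not from the original question.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records the events the SAX parser reports, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs holds the XML attributes of the start tag
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may deliver text in several chunks; skip pure whitespace
        if content.strip():
            self.events.append(("chars", content))


# Hypothetical sample document, loosely modelled on the PO example
SAMPLE = b'<PO><Approver>John Smith</Approver><Qty unit="kg">10</Qty></PO>'

handler = EventLogger()
xml.sax.parseString(SAMPLE, handler)
for event in handler.events:
    print(event)
```

Note that the application only ever sees a flat stream of events; it
has to keep track itself of which element it is currently inside.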

> if PO.Approver = "John Smith":
>    special_discount = 10
> ...
> 
> Am I far off?

Somewhat off, yes. An XML processor can't work that way. For example,
there may be multiple Approver elements in a PO element; the API you
propose couldn't tell apart the various instances. Likewise, it is not
clear how such an API would take attributes into account, e.g.

  <Qty unit="kg">10</Qty>

Since you'd use Python attributes already for the subelements, it'd be
hard to merge the XML attributes into that.

> If not, then how do I use the xml modules to do what I intend to do?

I suggest building a DOM tree with xml.dom.minidom.parse (or
parseString, if the document is already in a string).
When I parse your example into a variable d, I can do

>>> s = """your document"""
>>> from xml.dom.minidom import parseString
>>> d = parseString(s)
>>> d
<xml.dom.minidom.Document instance at 2afa9c>
>>> d.getElementsByTagName("Approver")
[<DOM Element: Approver at 2832780>]
>>> d.getElementsByTagName("Approver")[0].firstChild
<DOM Text node "John Smith">
>>> d.getElementsByTagName("Approver")[0].firstChild.data
u'John Smith'
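
The two cases raised earlier (several Approver elements, and an XML
attribute such as unit) can be handled along the same lines. This is
only a sketch, assuming a current Python 3; the document text and the
second approver name are invented for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical PO document with two Approver elements and an attribute
doc = parseString(
    "<PO>"
    "<Approver>John Smith</Approver>"
    "<Approver>Jane Doe</Approver>"
    '<Qty unit="kg">10</Qty>'
    "</PO>"
)

# getElementsByTagName returns every match, so multiple instances are
# simply multiple entries in the list
approvers = [node.firstChild.data for node in doc.getElementsByTagName("Approver")]

# XML attributes are read with getAttribute, separately from child nodes
qty = doc.getElementsByTagName("Qty")[0]
unit = qty.getAttribute("unit")
amount = int(qty.firstChild.data)

# The check from the original question, written against the DOM
special_discount = 10 if "John Smith" in approvers else 0

print(approvers, amount, unit, special_discount)
```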

Regards,
Martin