[Tutor] XML parsing
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Tue, 6 Feb 2001 22:04:30 -0800 (PST)
On Wed, 7 Feb 2001, Suzanne Little wrote:
> Which module should I be using to do this? Are there any examples of
> this sort of scanning-of-xml-documents-for-information available for
> me to look at?
As a side project, I'm beginning to study the expat parser; it's pretty
neat. Here's a small example that uses Expat:
###
from xml.parsers import expat

class MyXMLParser2:
    def __init__(self):
        # Create the underlying Expat parser and register our callbacks.
        self.parser = expat.ParserCreate()
        self.parser.StartElementHandler = self.StartElementHandler
        self.parser.EndElementHandler = self.EndElementHandler
        self.parser.CharacterDataHandler = self.CharacterDataHandler

    def feed(self, data):
        # Hand a chunk of XML text to Expat; the callbacks fire as it parses.
        self.parser.Parse(data)

    def StartElementHandler(self, name, attributes):
        print("Starting: ", name, attributes)

    def EndElementHandler(self, name):
        print("Ending: ", name)

    def CharacterDataHandler(self, data):
        print("Character data:", data)

def test():
    p = MyXMLParser2()
    p.feed("""
<iq id='A0' type='get'><query
xmlns='jabber:iq:auth'><paragraph><username>bbaggins<boldface>Bilbo</boldface>
Baggins</username></paragraph></query></iq>
""")

if __name__ == '__main__':
    test()
###
The idea is that as the parser reads through the document, it "calls
back" the functions we registered whenever it sees something that
interests us. For example, as soon as the parser sees:
<iq id='A0' type='get'>
it recognizes the start of a new element, and that's when the
StartElementHandler callback executes. Similar things happen when it sees
an end tag or character data. Try playing around with the program above;
it should make things clearer.
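To actually scan a document for information, rather than just print
events, the callbacks can keep a little state between calls. Here is a
rough sketch of that idea, not the only way to do it. The class name
UsernameCollector and its attributes (depth, pieces, usernames) are made
up for this example, and I'm assuming that the text inside <username>
elements is what we're after:

```python
from xml.parsers import expat


class UsernameCollector:
    """Hypothetical helper: gathers the text inside <username> elements."""

    def __init__(self):
        self.parser = expat.ParserCreate()
        self.parser.StartElementHandler = self.start
        self.parser.EndElementHandler = self.end
        self.parser.CharacterDataHandler = self.chars
        self.depth = 0        # how many <username> elements we are inside
        self.pieces = []      # text chunks seen for the current <username>
        self.usernames = []   # finished results

    def start(self, name, attributes):
        if name == "username":
            self.depth += 1

    def end(self, name):
        if name == "username":
            self.depth -= 1
            if self.depth == 0:
                # The element is closed: join the chunks into one string.
                self.usernames.append("".join(self.pieces))
                self.pieces = []

    def chars(self, data):
        # Expat may deliver one run of text in several calls, so we
        # accumulate chunks instead of assuming a single callback.
        if self.depth > 0:
            self.pieces.append(data)

    def feed(self, text):
        # The second argument tells Expat this is the final chunk of input.
        self.parser.Parse(text, True)
        return self.usernames


collector = UsernameCollector()
result = collector.feed(
    "<iq id='A0' type='get'><query xmlns='jabber:iq:auth'>"
    "<username>bbaggins</username></query></iq>"
)
print(result)   # prints ['bbaggins']
```

Note that this sketch also picks up text from any elements nested inside
<username>, since it only tracks whether we are somewhere within one.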
There's some documentation about Expat here:
http://python.org/doc/current/lib/module-xml.parsers.expat.html
but it is, admittedly, a little terse. If I find anything more
accessible, I'll post to the list again. Good luck!