[XML-SIG] Showing off the power of Python for XML processing

Sean Mc Grath digitome@iol.ie
Tue, 17 Mar 1998 12:29:40 GMT


First up - I am delighted to see this list come into being!
I look forward to plenty of traffic on this list.

As anyone who has read the article in Dr. Dobb's in Feb. will know,
I made a stab at inventing a native Python data structure for
representing the tree structure of an XML document. I would
like to see some discussion as to how best to expose this
tree structure for Python applications. Obviously, DOM will
be one interface but should we limit ourselves to it?

Whether we like it or not, developers selecting scripting
languages for XML processing are going to perform line count
comparisons. I think it would be great to be able to show
how Python code can be a) succinct, b) understandable and
c) maintainable for XML processing.

Some arbitrary notions:-

1) Iterators

In the Python article for Dr. Dobb's I provided a __getitem__
at the XML tree level to allow:-

        for ANode in ATree:
                Do something

Good or bad? Would it be better Pythonese to create a list
of nodes and iterate over that?
        MyNodeList = ATree.GetDescendants()
        for n in MyNodeList:
                Do something
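
To make the two options concrete, here is a minimal sketch in present-day
Python 3. The Node class and its constructor are hypothetical stand-ins, not
the classes from the article, and it uses __iter__ with a generator rather
than the __getitem__ approach, since that is how iteration is usually
exposed today. It assumes that iterating a node visits the node itself and
then its descendants in document order, while GetDescendants() returns only
the descendants as a list.

```python
# Hypothetical sketch: Node is an illustrative class, not the article's code.
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def GetDescendants(self):
        """Return every node below this one, in document order, as a list."""
        result = []
        for child in self.children:
            result.append(child)
            result.extend(child.GetDescendants())
        return result

    def __iter__(self):
        """Lazily yield this node, then each descendant in document order."""
        yield self
        for child in self.children:
            yield from child


doc = Node("BOOK", [Node("CHAP", [Node("SECT", [Node("PARA")])]),
                    Node("INDEX")])

# Option 1: iterate the tree directly.
names_by_iteration = [n.name for n in doc]

# Option 2: build the list first, then iterate it.
names_by_list = [n.name for n in doc.GetDescendants()]
```

The direct form is shorter at the call site. The list form costs an extra
line but hands back something that can be sliced, counted and reversed.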

2) Slice operations

I think this is one of the areas where Python can really
shine for XML processing. I write a lot of XML processing
apps and a lot of the processing is driven by context:

"if my parent is a SECT and my grandparent is a CHAP":

if GetAncestors()[1:3] == ("SECT","CHAP"):
        do something

A healthy collection of primitives for creating such lists,
combined with Python's list processing and slicing functionality,
is *mouth watering*.
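
A minimal sketch of the context test, in present-day Python 3. The Node
class is hypothetical, and it assumes that GetAncestors() returns a list of
element names starting with the node itself at index 0, so that [1:3] picks
out the parent and grandparent. The slice is compared against a list
because a Python list never compares equal to a tuple.

```python
# Hypothetical sketch: Node and the ordering of GetAncestors() are assumptions.
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def GetAncestors(self):
        """Return element names from this node up to the root, nearest first.

        Index 0 is the node itself, index 1 its parent, index 2 its
        grandparent, and so on.
        """
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


book = Node("BOOK")
chap = Node("CHAP", book)
sect = Node("SECT", chap)
para = Node("PARA", sect)

# "if my parent is a SECT and my grandparent is a CHAP"
in_context = para.GetAncestors()[1:3] == ["SECT", "CHAP"]

# The same test fails one level up, where the parent is a CHAP.
sect_in_context = sect.GetAncestors()[1:3] == ["SECT", "CHAP"]
```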

3) Collection Processing

Rarely do any of my XML processing apps stand alone. By
that I mean that they tend to process a collection of XML docs.

# Process all chap*.xml docs. Print data content of
# FOO elements

import glob

for f in glob.glob ("chap*.xml"):
        for t in LoadXML(f):
                if t.AtElement ("FOO"):
                        print t.GetDataDescendants()

How best to do it?
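
One possible answer, sketched in present-day Python 3 with the standard
library's xml.etree.ElementTree standing in for LoadXML and the tree class
above. The function name foo_text is hypothetical, and it assumes that "data
content" means the character data of the element and everything below it.

```python
import glob
import xml.etree.ElementTree as ET


def foo_text(pattern, tag="FOO"):
    """Return the text content of every `tag` element in files matching pattern.

    foo_text is an illustrative helper, not part of any existing library.
    """
    found = []
    # sorted() makes the processing order predictable across platforms.
    for filename in sorted(glob.glob(pattern)):
        tree = ET.parse(filename)
        for element in tree.iter(tag):
            # itertext() yields the character data of the element and of
            # all its descendants, in document order.
            found.append("".join(element.itertext()))
    return found
```

Calling foo_text("chap*.xml") and printing each item of the result would
then cover the loop above.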

4) TreeApply

One trick that I have found very useful is an apply() style
helper function for trees.

I have an XMLTreeApply helper function that walks an XML
tree applying the supplied function to all the nodes in the tree.
It proves particularly useful for throwaway lambda functions:

XMLTreeApply (lambda x: if x.AtElement("FOO"): print x.GetDataDescendants())
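
A minimal sketch of such a helper in present-day Python 3. The Node class,
its data attribute and the (func, node) argument order are assumptions made
for illustration. A lambda body has to be a single expression, so the "if"
is written here with "and", and the matching data is collected into a list
rather than printed.

```python
# Hypothetical sketch: Node and the signature of XMLTreeApply are assumptions.
class Node:
    def __init__(self, name, data="", children=None):
        self.name = name
        self.data = data
        self.children = children or []

    def AtElement(self, name):
        """Return True if this node is an element with the given name."""
        return self.name == name


def XMLTreeApply(func, node):
    """Call func on node, then on each of its descendants, in document order."""
    func(node)
    for child in node.children:
        XMLTreeApply(func, child)


doc = Node("CHAP", children=[
    Node("FOO", "first"),
    Node("SECT", children=[Node("FOO", "second"), Node("BAR", "ignored")]),
])

collected = []
# "x.AtElement(...) and ..." only evaluates the right-hand side on a match.
XMLTreeApply(lambda x: x.AtElement("FOO") and collected.append(x.data), doc)
```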

5) Exposing XML from non-XML data sources

It is only a matter of time before relational databases and so on natively
provide functionality to expose their data as XML. In the meantime,
wouldn't it be useful if dbm, glob, pstats and even calendar exposed
XML?
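
As one illustration of the idea, here is a sketch in present-day Python 3
that wraps the calendar module's output as XML using the standard library's
xml.etree.ElementTree. The function name month_as_xml and the MONTH, WEEK
and DAY element names are invented for the example, not a proposed
vocabulary.

```python
import calendar
import xml.etree.ElementTree as ET


def month_as_xml(year, month):
    """Build a MONTH element holding one WEEK per row and one DAY per date.

    The element and attribute names are hypothetical.
    """
    root = ET.Element("MONTH", year=str(year), number=str(month))
    for week in calendar.monthcalendar(year, month):
        week_el = ET.SubElement(root, "WEEK")
        for day in week:
            # monthcalendar() pads each week with 0 for days that fall
            # outside the month, so those entries are skipped.
            if day:
                ET.SubElement(week_el, "DAY").text = str(day)
    return root


march = month_as_xml(1998, 3)
xml_text = ET.tostring(march, encoding="unicode")
```

Once the data is in tree form, the same iteration and context tests used
for real XML documents apply to it unchanged.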


Comments???

Sean