[XML-SIG] Showing off the power of Python for XML processing
Tue, 17 Mar 1998 14:08:42 +0100
On Tue, Mar 17, 1998 at 12:29:40PM +0000, Sean Mc Grath wrote:
> As anyone who has read the article in Dobbs in Feb. will know,
I did, and your book (Parseme.1st) too.
> I made a stab at inventing a native Python data structure for
> representing the tree structure of an XML document. I would
> like to see some discussion as to how best to expose this
> tree structure for Python applications. Obviously, DOM will
> be one interface but should we limit ourselves to it?
In my DOM package, I represented the tree structure a bit differently than
you did (I use lists where you use pointers to previous and next sibling).
I guess that once an API has been chosen, the difference is minor.
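
To make the comparison concrete, here is a minimal sketch of the two representations behind one small API. The class and function names (ListNode, LinkedNode, child_tags) are made up for illustration; they are not taken from either package.

```python
class ListNode:
    """Hypothetical node that keeps its children in a Python list."""

    def __init__(self, tag):
        self.tag = tag
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def next_sibling(self):
        # The sibling is looked up through the parent's list.
        if self.parent is None:
            return None
        siblings = self.parent.children
        i = siblings.index(self)
        return siblings[i + 1] if i + 1 < len(siblings) else None


class LinkedNode:
    """Hypothetical node that keeps explicit previous/next sibling pointers."""

    def __init__(self, tag):
        self.tag = tag
        self.parent = None
        self.first_child = None
        self.last_child = None
        self.prev = None
        self.next = None

    def append(self, child):
        child.parent = self
        child.prev = self.last_child
        if self.last_child is None:
            self.first_child = child
        else:
            self.last_child.next = child
        self.last_child = child
        return child

    def next_sibling(self):
        return self.next


def child_tags(node):
    """Return the children's tags, whichever representation is used."""
    if isinstance(node, ListNode):
        return [c.tag for c in node.children]
    tags, c = [], node.first_child
    while c is not None:
        tags.append(c.tag)
        c = c.next
    return tags
```

Callers that only use append(), next_sibling() and child_tags() cannot tell the two apart, which is the sense in which the difference becomes minor once the API is fixed.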
> Whether we like it or not, developers selecting scripting
> languages for XML processing are going to perform line count
> comparisons. I think it would be great to be able to show
> how Python code can be a)succinct, b) understandable and
> c) maintainable for XML processing.
> Some arbitrary notions:-
> 1) Iterators
> In the Python article for Dobbs I provided a __getitem__
> at the XML tree level to allow:-
> for ANode in ATree:
> Do something
> Good or bad? Would it be better Pythoneze to create a list
> of nodes and iterate over that?
> MyNodeList = ATree.GetDescendants()
> for n in MyNodeList:
> Do something
The order in which you traverse a tree is significant, and there is no reason
(I guess) to promote one order over another. I would rather use an explicit iterator:
for node in tree.top_down_iterator():
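
As a rough sketch of why the traversal order deserves an explicit name, here are two iterators over the same tree. This uses the standard library's xml.dom.minidom and generator functions rather than my package; top_down and bottom_up are illustrative names only.

```python
from xml.dom.minidom import parseString


def top_down(node):
    """Yield the node first, then its descendants (pre-order)."""
    yield node
    for child in node.childNodes:
        yield from top_down(child)


def bottom_up(node):
    """Yield the descendants first, then the node itself (post-order)."""
    for child in node.childNodes:
        yield from bottom_up(child)
    yield node


doc = parseString("<CHAP><SECT><PARA/></SECT><SECT/></CHAP>")
root = doc.documentElement

pre = [n.tagName for n in top_down(root)]
post = [n.tagName for n in bottom_up(root)]
print(pre)   # ['CHAP', 'SECT', 'PARA', 'SECT']
print(post)  # ['PARA', 'SECT', 'SECT', 'CHAP']
```

Same tree, two different sequences, so a bare "for node in tree" has to pick one of them silently.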
> 2) Slice operations
> I think this is one of the areas where Python can really
> shine for XML processing. I write a lot of XML processing
> apps and a lot of the processing is driven by context:
> "if my parent is a SECT and my grandparent is a CHAP:
> if GetAncestors()[1:3] = ("SECT","CHAP"):
> do something
> A healthy collection of primitives for creating such lists
> combined with Python's list processing and slicing functionality
> is *mouth watering*.
Do you mean:
(the difference is only syntactical, but means
> 3) Collection Processing
> Rarely do any of my XML processing apps stand alone. By
> that I mean that they tend to process a collection of XML docs.
> // Process all chap*.xml docs. Print data content of
> // foo elements
> for f in glob.glob ("chap*.xml"):
> for t in LoadXML(f):
> if t.AtElement ("FOO"):
> print GetDataDescendants()
> How best to do it?
Here's how you would do it in my current framework:
for f in glob.glob ("chap*.xml"):
    p = XmlParser()
    document = p.parse('', f)
    # or: p.parse('', f); document = p.document
    t = MyTranformer()
    # or: document = t.tranform(document)
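
For comparison, the same "process every chap*.xml and print the content of FOO elements" task can be sketched with the standard library alone. This swaps in xml.dom.minidom for the parser above; text_of and foo_contents are hypothetical helper names, and the sketch ignores CDATA sections.

```python
import glob
import os
from xml.dom.minidom import parse


def text_of(node):
    """Concatenate the data of all text nodes below node (CDATA is ignored)."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(text_of(child))
    return "".join(parts)


def foo_contents(pattern, tag="FOO"):
    """Return (file name, text) pairs for every `tag` element in matching files."""
    results = []
    for path in sorted(glob.glob(pattern)):
        document = parse(path)
        for element in document.getElementsByTagName(tag):
            results.append((os.path.basename(path), text_of(element)))
        document.unlink()  # break reference cycles once the file is done
    return results


if __name__ == "__main__":
    for name, text in foo_contents("chap*.xml"):
        print(name, text)
```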
> 4) TreeApply
> One trick that I have found very useful is an apply() style
> helper function for trees.
> I have an XMLTreeApply helper function that walks an XML
> tree applying the supplied function to all the nodes in the tree.
> It proves particularly useful for throwaway lambda functions
> XMLTreeApply (lambda x:if x.AtElement("FOO"): print GetDataDescendants())
Yes, that's cool, but it exposes one of the drawbacks of Python wrt Scheme:
you can't use statements in lambda expressions.
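
Since a lambda body must be a single expression, the usual workaround is to pass a named function instead. A minimal sketch, assuming a tree_apply helper written here on top of xml.dom.minidom (it is not the XMLTreeApply from the article, whose signature I don't know):

```python
from xml.dom.minidom import parseString


def tree_apply(node, func):
    """Call func on node and then on every descendant, top down."""
    func(node)
    for child in node.childNodes:
        tree_apply(child, func)


found = []


def collect_foo(node):
    # A def may contain statements such as `if`; a lambda may not.
    if node.nodeType == node.ELEMENT_NODE and node.tagName == "FOO":
        found.append(node.firstChild.data)


doc = parseString("<DOC><FOO>a</FOO><BAR><FOO>b</FOO></BAR></DOC>")
tree_apply(doc.documentElement, collect_foo)
print(found)  # ['a', 'b']
```

The cost is one extra name and a few extra lines, which matters if people really are going to compare line counts.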
Another option is to use a query function:
for node in document.query_descendants('this.GI == "FOO"'):
for node in document.query_descendants(lambda x: x.GI == 'FOO'):
(not implemented currently)
for node in document.getElementsByTagName('FOO'):
^-- this one is in the DOM core specs, and I guess the W3C is working on
an extension of this mechanism.
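
To illustrate the predicate form next to the DOM one, here is a small sketch on top of xml.dom.minidom. The query_descendants below is written for this example and is only a guess at the behaviour; it tests tagName where my package would use GI.

```python
from xml.dom.minidom import parseString


def query_descendants(node, predicate):
    """Yield, in document order, each element below node that satisfies predicate."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            if predicate(child):
                yield child
            yield from query_descendants(child, predicate)


doc = parseString("<DOC><FOO id='1'/><BAR><FOO id='2'/></BAR></DOC>")

by_predicate = [n.getAttribute("id")
                for n in query_descendants(doc, lambda x: x.tagName == "FOO")]
by_tag_name = [n.getAttribute("id")
               for n in doc.getElementsByTagName("FOO")]
print(by_predicate, by_tag_name)  # ['1', '2'] ['1', '2']
```

The predicate version costs a lambda but can test anything on the node, whereas getElementsByTagName only matches on the tag name.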
> 5) Exposing XML from non-XML data sources
> It is only a matter of time before relational databases and so on natively
> provide functionality to expose their data as XML. In the mean-time
> wouldn't it be useful if dbm, glob, pstats and even calendar exposed
One related point: I'm still working on the transformation engine included
in my DOM package. I have two options: use XSL (i.e. compile XSL stylesheets
into Transformer classes), but I personally dislike XSL, or invent a new
What do you think? (I can guess the answer: XSL is a standard, blah, blah,
but ECMAScript is the standard scripting language of XSL, and there is no
current implementation of ECMAScript in Python that I'm aware of. So even
Stéfane Fermigier, MdC à l'Université Paris 7. Tel: 01.44.27.61.01 (Bureau).
Mathematician, hacker, bassist. http://www.math.jussieu.fr/~fermigie/
"Life is good for only two things, discovering mathematics and teaching
mathematics." Siméon Poisson.