Using TreeBuilder in an ElementTree-like way

Greg Aumann Greg_Aumann at sil.org
Wed Jun 28 01:45:19 EDT 2006


I am trying to write some Python code for a library that reads an
XML-like language from a file into elementtree data structures. I then
want to be able to read and/or modify the structure and write it out
either as XML or in the original format. I really want the API for the
XML-like language to be the same as the elementtree API, to reduce
confusion and make it easier to learn.

While reading the elementtree documentation I found the
ElementTree.TreeBuilder class, which the documentation says can be used
to create parsers for XML-like languages, so I wrote the code below. The
code works, but I am not sure that this is really the intended way to
use the ElementTree.TreeBuilder class.

Essentially I was trying to implement the following advice from Fredrik
Lundh (Wed, Sep 8 2004 12:54 am):
 > by the way, it's trivial to build trees from arbitrary SAX-style sources.
 > just create an instance of the ElementTree.TreeBuilder class, and call
 > the "start", "end", and "data" methods as appropriate.
 >
 >     builder = ElementTree.TreeBuilder()
 >     builder.start("tag", {})
 >     builder.data("text")
 >     builder.end("tag")
 >     elem = builder.close()

but in another post he wrote (Wed, May 21 2003 2:56 am):
 > usage:
 >
 >     from elementtree import ElementTree, HTMLTreeBuilder
 >
 >     # file is either a filename or an open stream
 >     tree = ElementTree.parse(file, parser=HTMLTreeBuilder.TreeBuilder())
 >     root = tree.getroot()
 >
 > or
 >
 >     from elementtree import HTMLTreeBuilder
 >
 >     parser = HTMLTreeBuilder.TreeBuilder()
 >     parser.feed(data)
 >     root = parser.close()

This second one makes me think I should have implemented a parser class
using TreeBuilder. Also, when I used "return builder.close()" in the code
below, it returned an _ElementInterface (an element) rather than an
ElementTree.
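A minimal sketch of that difference, using the standard-library
xml.etree.ElementTree (the descendant of the elementtree package that
ships with Python, swapped in here so the example runs): close() hands
back the root element, and wrapping it in ElementTree.ElementTree gives
the same kind of document-level object that parse() returns.

```python
import io
import xml.etree.ElementTree as ET

# Drive a TreeBuilder by hand, as in the first quoted post.
builder = ET.TreeBuilder()
builder.start("settings", {})
builder.start("name", {})
builder.data("demo")
builder.end("name")
builder.end("settings")

root = builder.close()       # the root Element, not a tree
tree = ET.ElementTree(root)  # wrap it to get what ET.parse() would return

print(isinstance(root, ET.Element))  # True
print(tree.getroot() is root)        # True

# Tree-level methods such as write() are now available.
out = io.BytesIO()
tree.write(out)
print(out.getvalue().decode())       # <settings><name>demo</name></settings>
```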

So my question is really about how I should structure the code so that
using this XML-like format is as similar as possible to using XML itself
with elementtree.
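One possible structure, matching the "ElementTree.parse(file,
parser=...)" usage in the second quote, is a class that exposes feed()
and close(), drives a TreeBuilder internally, and returns the root
element from close(). This is only a sketch under assumptions: it uses
the standard-library xml.etree.ElementTree, and the ShoeboxParser class
and its simplified line format ("\+mkr value" opens a block, "\-mkr"
closes it, "\mkr value" is a leaf, other lines ignored) are hypothetical,
not part of nltk_lite. As with XML, the input needs a single top-level
block.

```python
import codecs
import io
import xml.etree.ElementTree as ET


class ShoeboxParser:
    r"""Hypothetical feed()/close() parser for a simplified Shoebox-like format.

    Assumed line format (not the full Shoebox specification):
        \+mkr value   opens a block element <mkr> whose text is "value"
        \-mkr         closes the block element <mkr>
        \mkr value    a leaf element <mkr>value</mkr>
    Lines not starting with a backslash (e.g. continuation lines) are skipped.
    """

    def __init__(self, encoding="utf-8"):
        self._builder = ET.TreeBuilder()
        # Incremental decoder, so a multibyte character split across two
        # feed() calls is still decoded correctly.
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._buffer = ""

    def feed(self, data):
        # ElementTree.parse() passes whatever source.read() returns: bytes or str.
        if isinstance(data, bytes):
            data = self._decoder.decode(data)
        self._buffer += data
        # Handle complete lines only; keep a trailing partial line for later.
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            self._handle_line(line)

    def close(self):
        self._buffer += self._decoder.decode(b"", final=True)
        if self._buffer:
            self._handle_line(self._buffer)
            self._buffer = ""
        # Like XMLParser.close(), return the root element.
        return self._builder.close()

    def _handle_line(self, line):
        line = line.rstrip("\r")
        if not line.startswith("\\"):
            return
        mkr, _, value = line[1:].partition(" ")
        if mkr.startswith("+"):
            self._builder.start(mkr[1:], {})
            if value:
                self._builder.data(value)
        elif mkr.startswith("-"):
            self._builder.end(mkr[1:])
        else:
            self._builder.start(mkr, {})
            if value:
                self._builder.data(value)
            self._builder.end(mkr)


sample = "\\+DatabaseType MDF\n\\ver 5.0\n\\-DatabaseType\n"
tree = ET.parse(io.StringIO(sample), parser=ShoeboxParser())
root = tree.getroot()
print(root.tag, root.text, root.find("ver").text)  # DatabaseType MDF 5.0
```

Because ElementTree.parse() only calls feed() and close() on a custom
parser, callers get back an ordinary ElementTree, exactly as they would
for an XML file.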

from elementtree import ElementTree
from nltk_lite.corpora.shoebox import ShoeboxFile

class Settings(ShoeboxFile):
    def __init__(self):
        super(Settings, self).__init__()

    def parse(self, encoding=None):
        builder = ElementTree.TreeBuilder()
        for mkr, value in self.fields(encoding, unwrap=False):
            block = mkr[0]
            if block == "+":
                # a '+mkr' marker opens a block element and leaves it open
                builder.start(mkr[1:], {})
                builder.data(value)
            elif block == "-":
                # a '-mkr' marker closes the block opened by '+mkr'
                builder.end(mkr[1:])
            else:
                # any other marker is a leaf element holding its value
                builder.start(mkr, {})
                builder.data(value)
                builder.end(mkr)
        # builder.close() returns the root element; wrap it so callers get
        # the same kind of object that ElementTree.parse() returns
        return ElementTree.ElementTree(builder.close())
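Writing the tree out as XML is just tree.write(); the other direction,
back to the original format, can be sketched as the inverse of parse()
above. This assumes the same conventions that parse() relies on ('+'
opens a block, '-' closes it, anything else is a leaf) and that markers
are written with a leading backslash as in Shoebox files; the function
name to_shoebox_lines is hypothetical, and the sketch uses the
standard-library xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def to_shoebox_lines(elem, lines=None):
    r"""Hypothetical inverse of Settings.parse(): turn an element tree back
    into Shoebox-style lines (assumed format: \+mkr / \-mkr blocks and
    \mkr value leaves)."""
    if lines is None:
        lines = []
    text = (elem.text or "").strip()
    if len(elem):
        # An element with children was a block: \+tag value ... \-tag
        lines.append(f"\\+{elem.tag} {text}".rstrip())
        for child in elem:
            to_shoebox_lines(child, lines)
        lines.append(f"\\-{elem.tag}")
    else:
        # A childless element was a single field: \tag value
        lines.append(f"\\{elem.tag} {text}".rstrip())
    return lines


root = ET.fromstring("<DatabaseType>MDF<ver>5.0</ver></DatabaseType>")
print("\n".join(to_shoebox_lines(root)))
```

One design caveat: a '+' block that happened to have no children would
come back as a leaf, so an exact round trip would need to record the
block/leaf distinction somewhere, for example in an attribute.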


