using TreeBuilder in an ElementTree like way
Fredrik Lundh
fredrik at pythonware.com
Sat Jul 1 05:28:33 EDT 2006
Greg Aumann wrote:
> In reading the elementtree documentation I found the
> ElementTree.TreeBuilder class which it says can be used to create
> parsers for XML-like languages.
a TreeBuilder is a thing that turns a sequence of start(), data(), and
end() method calls into an Element tree structure.
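A minimal sketch of that, using the `TreeBuilder` class as it exists today in the standard library's `xml.etree.ElementTree` module (the release discussed in this thread shipped as the separate `elementtree` package). The tag names and text are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Drive a TreeBuilder by hand: start() opens an element, data() adds
# text at the current position, end() closes the innermost open element.
builder = ET.TreeBuilder()
builder.start("doc", {"lang": "en"})
builder.start("title", {})
builder.data("Hello")
builder.end("title")
builder.data("tail text")  # text after </title> becomes title.tail
builder.end("doc")

root = builder.close()  # returns the root Element
print(ET.tostring(root, encoding="unicode"))
# prints: <doc lang="en"><title>Hello</title>tail text</doc>
```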
a Parser is a thing that turns a sequence of feed() method calls into a
stream of start(), data(), and end() method calls on a target object.
the standard parsers all automatically use a TreeBuilder instance as
their default target.
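To make the parser/target split concrete, here is a sketch against today's stdlib `xml.etree.ElementTree.XMLParser`, which takes a `target` argument. `RecordingTarget` is a hypothetical class that just logs the calls the parser makes; with no target, the parser falls back to a TreeBuilder and `close()` returns an Element:

```python
import xml.etree.ElementTree as ET

class RecordingTarget:
    """Hypothetical target that records the calls a parser makes on it."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag, dict(attrib)))

    def data(self, text):
        self.events.append(("data", text))

    def end(self, tag):
        self.events.append(("end", tag))

    def close(self):
        # Whatever the target's close() returns, parser.close() returns.
        return self.events

# Explicit target: feed() calls are turned into start/data/end calls.
parser = ET.XMLParser(target=RecordingTarget())
parser.feed("<a x='1'>hi")
parser.feed("</a>")
events = parser.close()

# Default target (a TreeBuilder): close() hands back an Element.
default_parser = ET.XMLParser()
default_parser.feed("<a x='1'>hi</a>")
elem = default_parser.close()
print(elem.tag, elem.get("x"), elem.text)  # prints: a 1 hi
```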
unfortunately, the current ET release also uses classes named XXXTreeBuilder
for the actual parsers, which is a bit confusing. (the reason for
this is historical; the separate TreeBuilder class was factored out of
a couple of format-specific XXXTreeBuilder parsers, but the naming
wasn't fully sorted out.)
> Essentially I was trying to implement the following advice from Fredrik
> Lundh (Wed, Sep 8 2004 12:54 am):
> > by the way, it's trivial to build trees from arbitrary SAX-style sources.
> > just create an instance of the ElementTree.TreeBuilder class, and call
> > the "start", "end", and "data" methods as appropriate.
> >
> > builder = ElementTree.TreeBuilder()
> > builder.start("tag", {})
> > builder.data("text")
> > builder.end("tag")
> > elem = builder.close()
that's the intended use of the TreeBuilder class.
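As one example of a "SAX-style source", the sketch below forwards events from the stdlib `xml.sax` parser into a TreeBuilder. `TreeBuilderHandler` is a hypothetical name, and the SAX parser is only a convenient stand-in; any event source that can call start/data/end would do:

```python
import xml.sax
import xml.etree.ElementTree as ET

class TreeBuilderHandler(xml.sax.ContentHandler):
    """Hypothetical adapter: forwards SAX events to an ElementTree TreeBuilder."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def startElement(self, name, attrs):
        # attrs is a SAX Attributes object; TreeBuilder expects a plain dict.
        self.builder.start(name, dict(attrs.items()))

    def characters(self, content):
        # SAX may split text into several chunks; TreeBuilder joins them.
        self.builder.data(content)

    def endElement(self, name):
        self.builder.end(name)

    def root(self):
        return self.builder.close()

handler = TreeBuilderHandler()
xml.sax.parseString(b"<tag a='1'>text<sub/>more</tag>", handler)
elem = handler.root()
print(ET.tostring(elem, encoding="unicode"))
```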
> but in another post he wrote (Wed, May 21 2003 2:56 am):
> > usage:
> >
> > from elementtree import ElementTree, HTMLTreeBuilder
> >
> > # file is either a filename or an open stream
> > tree = ElementTree.parse(file, parser=HTMLTreeBuilder.TreeBuilder())
> > root = tree.getroot()
> >
> > or
> >
> > from elementtree import HTMLTreeBuilder
> >
> > parser = HTMLTreeBuilder.TreeBuilder()
> > parser.feed(data)
> > root = parser.close()
and this is the confusing naming: here, the HTMLTreeBuilder.TreeBuilder
class is actually a parser (one that uses a TreeBuilder instance
internally).
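The same shape can be sketched with today's stdlib, without the old `HTMLTreeBuilder` module: a parser class built on `html.parser.HTMLParser` that drives a TreeBuilder internally. `SimpleHTMLTreeBuilder` is a hypothetical name, and the sketch assumes well-formed, properly nested markup (real HTML with void elements such as `<br>` or omitted end tags needs more work). Because it has `feed()` and `close()`, it can be passed to `ElementTree.parse()`:

```python
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class SimpleHTMLTreeBuilder(HTMLParser):
    """Hypothetical parser class: HTMLParser events -> internal TreeBuilder.

    Assumes well-formed, properly nested markup.
    """

    def __init__(self):
        super().__init__()
        self._builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        self._builder.start(tag, {k: (v or "") for k, v in attrs})

    def handle_endtag(self, tag):
        self._builder.end(tag)

    def handle_data(self, data):
        self._builder.data(data)

    def close(self):
        # Flush HTMLParser's buffered input, then return the root Element.
        super().close()
        return self._builder.close()

# ElementTree.parse() feeds the source to the parser, then calls close().
source = io.StringIO("<html><body><p class='x'>Hi</p></body></html>")
tree = ET.parse(source, parser=SimpleHTMLTreeBuilder())
root = tree.getroot()
```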
> This second one makes me think I should have implemented a parser class
> using TreeBuilder.
that's entirely up to you: the only real advantage of having a parser
class is that you can pass it to any other module that uses the Python
consumer interface:
http://effbot.org/zone/consumer.htm
but if that's not relevant for your application, feel free to use a
TreeBuilder directly.
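For a made-up XML-like format, a parser class with the consumer interface (feed() and close()) might look like the sketch below. The `key: value` line format and the `KeyValueParser` name are hypothetical; the class only tokenizes, and the TreeBuilder does the tree construction. Since data can arrive in arbitrary chunks, it keeps any incomplete trailing line until the next feed():

```python
import xml.etree.ElementTree as ET

class KeyValueParser:
    """Hypothetical parser for 'key: value' lines, exposing feed()/close().

    Each line becomes a child element of a <record> root element.
    """

    def __init__(self):
        self._builder = ET.TreeBuilder()
        self._builder.start("record", {})
        self._pending = ""  # partial line carried over between feed() calls

    def feed(self, data):
        lines = (self._pending + data).split("\n")
        self._pending = lines.pop()  # the last piece may be incomplete
        for line in lines:
            self._handle_line(line)

    def _handle_line(self, line):
        if not line.strip():
            return
        key, _, value = line.partition(":")
        self._builder.start(key.strip(), {})
        self._builder.data(value.strip())
        self._builder.end(key.strip())

    def close(self):
        self._handle_line(self._pending)
        self._pending = ""
        self._builder.end("record")
        return self._builder.close()

# Chunk boundaries do not have to line up with line boundaries.
parser = KeyValueParser()
for chunk in ["name: Gr", "eg\nlang", ": python\n"]:
    parser.feed(chunk)
record = parser.close()
print(ET.tostring(record, encoding="unicode"))
# prints: <record><name>Greg</name><lang>python</lang></record>
```

Because it follows the feed()/close() protocol, the same class can also be handed to `ElementTree.parse()` as its `parser` argument when reading from a text-mode file object.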
> Also when I used return builder.close() in the code below it didn't return
> an ElementTree structure but an _ElementInterface.
an Element, in other words (i.e. the thing returned by the Element
factory in this specific implementation). that's the documented
behaviour; if you want an ElementTree wrapper, you have to wrap it yourself.
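Wrapping it yourself is a one-liner. A sketch using the modern stdlib module, where the class behind the Element factory is simply `Element` rather than `_ElementInterface`:

```python
import io
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
builder.start("tag", {})
builder.data("text")
builder.end("tag")
elem = builder.close()  # an Element, not an ElementTree

tree = ET.ElementTree(elem)  # wrap it yourself
assert tree.getroot() is elem

# The wrapper adds document-level helpers such as write().
buf = io.BytesIO()
tree.write(buf, encoding="utf-8", xml_declaration=True)
print(buf.getvalue().decode("utf-8"))
```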
</F>
More information about the Python-list mailing list