[XML-SIG] lxml iterparse and comments
Stefan Behnel
stefan_ml at behnel.de
Mon Mar 24 08:33:53 CET 2008
Hi,
Stuart McGraw wrote:
> I am probably missing something elementary (I am new
> to both xml and lxml), but I am having problems figuring
> out how to get comments when using lxml's iterparse().
> When I parse xml with parse() and iterate through the
> result, I get the comments. But when I try to do the
> same thing (approximately I think) with iterparse,
> I don't see any comments.
While the comments end up in the tree that iterparse generates, they do not
show up in the events. Now that you mention it, I actually think that should
change. There should be events "comment" and "pi" that yield them if requested.
> I was using the standard Python ElementTree but my
> understanding is that it doesn't save comments at all.
ElementTree strips comments in the parser, that's right.
> The real file is ~50MB and has about 1M nodes under the
> root so I have to use iterparse and I also have to process
> comments, so I would really appreciate a clue about how
> to do it. Thanks.
Have you tried the parser target interface? It's a SAX-like interface that
uses callbacks.
http://codespeak.net/lxml/parsing.html#the-target-parser-interface
http://effbot.org/elementtree/elementtree-xmlparser.htm#the-target-interface
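
A minimal sketch of what that could look like, to be adapted rather than taken as the definitive way to do it: the parser calls start(), end(), data() and comment() on a target object as it reads the input, so no tree has to be kept in memory. The names CommentCollector and collect_comments and the sample input are made up for illustration, and the fallback to the standard library's XMLParser is an assumption for setups where lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the standard library parser accepts the same kind of target.
    import xml.etree.ElementTree as etree


class CommentCollector:
    """Hypothetical parser target: records comments and counts elements."""

    def __init__(self):
        self.comments = []
        self.element_count = 0

    def start(self, tag, attrib):
        # Called for every opening tag.
        self.element_count += 1

    def end(self, tag):
        # Called for every closing tag; nothing to do in this sketch.
        pass

    def data(self, text):
        # Called for character data; ignored here.
        pass

    def comment(self, text):
        # Called with the text of each comment.
        self.comments.append(text)

    def close(self):
        # Whatever is returned here becomes the result of parser.close().
        return self.comments, self.element_count


def collect_comments(chunks):
    """Feed an iterable of byte chunks to a parser that uses the target."""
    parser = etree.XMLParser(target=CommentCollector())
    for chunk in chunks:
        parser.feed(chunk)
    return parser.close()


# Made-up sample input, split into two chunks to mimic incremental reading.
sample = [
    b"<root><!-- first --><entry>a</entry>",
    b"<!-- second --><entry>b</entry></root>",
]
comments, count = collect_comments(sample)
```

For a large file, the chunks would come from reading the file in blocks (for example with f.read(65536) in a loop) instead of from a list in memory.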
Stefan