[XML-SIG] lxml iterparse and comments

Stefan Behnel stefan_ml at behnel.de
Mon Mar 24 08:33:53 CET 2008


Hi,

Stuart McGraw wrote:
> I am probably missing something elementary (I am new
> to both xml and lxml), but I am having problems figuring
> out how to get comments when using lxml's iterparse().
> When I parse xml with parse() and iterate through the
> result, I get the comments.  But when I try to do the
> same thing (approximately I think) with iterparse,
> I don't see any comments.

While the comments end up in the tree that iterparse generates, they do not
show up in the events. Now that you mention it, I actually think that should
change. There should be "comment" and "pi" (processing instruction) events
that yield them if requested.
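Because the comments are attached to the tree iterparse builds, one workaround is to read them back out of the tree at each record's "end" event. The sketch below assumes the records are <entry> elements directly under the root; the tag name and the sample data are hypothetical. Comments inside a record are its children whose tag is etree.Comment, and comments between records are the record's preceding siblings. After each record is handled, it is cleared and earlier siblings are deleted so memory stays bounded on a large file.

```python
from io import BytesIO
from lxml import etree

# Hypothetical sample input; the real file would be opened from disk instead.
SAMPLE = b"""<root>
  <!-- about record 1 -->
  <entry id="1"><!-- inner note --><v>a</v></entry>
  <!-- about record 2 -->
  <entry id="2"><v>b</v></entry>
</root>"""


def entries_with_comments(source):
    """Yield (entry id, list of comment texts) for each <entry> element.

    Only element "end" events are requested, but comment nodes are still
    present in the tree, so they can be read when each record is complete.
    """
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag != "entry":  # "entry" is a hypothetical record tag
            continue
        comments = []
        # Comments just before this record are its preceding siblings;
        # stop at the first sibling that is not a comment.
        for sib in elem.itersiblings(preceding=True):
            if sib.tag is not etree.Comment:
                break
            comments.insert(0, sib.text)
        # Comments inside the record are children tagged etree.Comment.
        comments.extend(c.text for c in elem if c.tag is etree.Comment)
        yield elem.get("id"), comments
        # Free memory: empty this record and delete everything before it
        # (earlier records and the comments already reported).
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]


for entry_id, notes in entries_with_comments(BytesIO(SAMPLE)):
    print(entry_id, notes)
```

A comment that follows the last record (just before the closing root tag) is not reported by this sketch, since no later record's "end" event looks back at it.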


> I was using the standard Python ElementTree but my 
> understanding is that it doesn't save comments at all.

ElementTree strips comments in the parser, that's right.
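A quick check of this with the standard library's xml.etree.ElementTree: with the default parser the comment never reaches the tree. As a side note beyond this thread, Python 3.8 and later let TreeBuilder keep comments via insert_comments=True; that option did not exist when this was written.

```python
import xml.etree.ElementTree as ET

XML = "<root><!-- note --><item/></root>"

# Default parser: the comment is dropped, only <item> remains.
default_root = ET.fromstring(XML)
default_tags = [child.tag for child in default_root]

# Python 3.8+: ask the tree builder to insert comment nodes into the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept_root = ET.fromstring(XML, parser=parser)
kept_comments = [c.text for c in kept_root if c.tag is ET.Comment]

print(default_tags)   # ['item']
print(kept_comments)  # [' note ']
```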


> The real file is ~50MB and has about 1M nodes under the 
> root so I have to use iterparse and I also have to process 
> comments, so I would really appreciate a clue about how 
> to do it.  Thanks.

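Have you tried the parser target interface? It's a SAX-like interface: instead
of building a tree, the parser calls methods on a target object you pass in
(start(), end(), data(), close() and, for comments, comment()).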

http://codespeak.net/lxml/parsing.html#the-target-parser-interface
http://effbot.org/elementtree/elementtree-xmlparser.htm#the-target-interface
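A minimal sketch of the target approach described in those links, assuming the records are <entry> elements (a hypothetical tag name) and that the input is fed to the parser in chunks. lxml calls the target's start() and comment() methods while parsing, no tree is built, and parser.close() returns whatever the target's close() returns. Methods the target does not define (end() and data() here) are simply not called.

```python
from lxml import etree


class CommentCollector:
    """Parser target that records comment texts and counts <entry> elements."""

    def __init__(self):
        self.comments = []
        self.entries = 0

    def start(self, tag, attrib):
        # Called for every start tag; "entry" is a hypothetical record tag.
        if tag == "entry":
            self.entries += 1

    def comment(self, text):
        # Called for every comment the parser sees.
        self.comments.append(text)

    def close(self):
        # The return value becomes the result of parser.close().
        return self.comments, self.entries


def scan(chunks):
    """Feed byte chunks to an lxml parser driving a CommentCollector."""
    parser = etree.XMLParser(target=CommentCollector())
    for chunk in chunks:
        parser.feed(chunk)
    return parser.close()


print(scan([b"<root><!-- a -->", b"<entry/><entry/><!-- b --></root>"]))
```

For the real ~50MB file, the chunks could come from something like iter(lambda: f.read(65536), b"") on a file opened in binary mode; the chunk size is arbitrary.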

Stefan

