[XML-SIG] lxml iterparse and comments

Stuart McGraw smcg4191 at frii.com
Mon Mar 24 04:56:59 CET 2008


I am probably missing something elementary (I am new
to both XML and lxml), but I am having trouble figuring 
out how to get comments when using lxml's iterparse().  
When I parse XML with parse() and iterate through the 
result, I get the comments.  But when I try to do 
(approximately, I think) the same thing with iterparse(), 
I don't see any comments.  See the example code below.  
(lxml-2.02, Python-2.5.1)

(I was using the standard Python ElementTree but my 
understanding is that it doesn't save comments at all.  
If that's wrong I would go back to using it).

The real file is ~50MB and has about 1M nodes under the 
root, so I have to use iterparse().  I also have to process 
comments, so I would really appreciate a clue about how 
to do it.  Thanks.

Example code:
import lxml.etree as ET
from cStringIO import StringIO

# XML data...
xmltxt = \
'''<?xml version="1.0" encoding="UTF-8"?>
<!-- Rev 1.06 -->
<!DOCTYPE Test [
<!ELEMENT Test (entry*)>
<!--                                                                   -->
<!ELEMENT entry ANY>
	<!-- Description of <entry> element. -->
]>
<!-- File created: 2008-02-27 -->
<Test>
<!--  Chronosynclastic Infindibulum Listing -->
<entry>text 1</entry>
<!-- Deleted:  A1500477 -->
<entry>text 2</entry>
</Test>
'''

print 'Parse:\n------'
et = ET.parse( StringIO (xmltxt))
for elem in et.iter():
    print elem

print '\nIterparse:\n----------'
xx = ET.iterparse( StringIO (xmltxt), ("start","end"))
for event, elem in iter(xx):
    print event, elem
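
For comparison, here is a minimal sketch of one way comments can be
seen while streaming, assuming the installed lxml accepts "comment"
as an iterparse() event name.  Current lxml releases do; I have not
confirmed it for 2.0.2.  The sketch uses Python 3 syntax, and the
names XML and collect() are made up for illustration.  If lxml is not
installed it falls back to the standard library's ElementTree, whose
iterparse() accepts the same event name on Python 3.8 or later.

```python
from io import BytesIO

try:
    from lxml import etree as ET          # the library discussed in this post
except ImportError:
    import xml.etree.ElementTree as ET    # stdlib fallback (Python 3.8+)

# Small stand-in document; the real file would be opened from disk instead.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Test>
<!-- Chronosynclastic Infindibulum Listing -->
<entry>text 1</entry>
<!-- Deleted: A1500477 -->
<entry>text 2</entry>
</Test>
"""

def collect(source):
    """Stream through source and return (kind, text) pairs in document order."""
    seen = []
    # Ask for "comment" events in addition to element "end" events.
    for event, elem in ET.iterparse(source, events=("end", "comment")):
        if event == "comment":
            seen.append(("comment", elem.text.strip()))
        elif elem.tag == "entry":
            seen.append(("entry", elem.text))
            elem.clear()  # drop the finished entry's content to limit memory use
    return seen

result = collect(BytesIO(XML))
for kind, text in result:
    print(kind, text)
```

The elem.clear() call is there only because of the file size; it can
be dropped for small inputs.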
