[XML-SIG] Re: cElementTree.iterparse missing text in some startevents

Jimmy Retzlaff jimmy at retzlaff.com
Tue Jan 25 09:58:59 CET 2005


Fredrik Lundh wrote:
> 
> Jimmy Retzlaff wrote:
> 
> > I'm using cElementTree.iterparse to iterate over an XML file. I think
> > iterparse is a wonderful idea - I've found it to be much more convenient
> > than SAX for iterative processing. I have come across a problem
> > though...
> >
> > For the majority of my elements, both the start and end events contain
> > the text of the element (i.e., element.text). For a handful of the
> > elements, the text is only in the end event (i.e., element.text is None
> > in the start event but it is not None in the end event). The text is
> > found without any problem when using cElementTree.parse on the file
> > instead.
> 
> > Am I misunderstanding something or is this perhaps a bug?
> 
> it needs more documentation ;-)
> 
> here's what the comment in the CHANGES document says:
> 
>     The elem object is the current element; for "start" events,
>     the element itself has been created (including attributes), but its
>     contents may not be complete; for "end" events, all child elements
>     has been processed as well.  You can use "start" tags to count
>     elements, check attributes, and check if certain tags are present
>     in a tree.  For all other purposes, use "end" handlers instead.
> 
> in that text, "may not" really means "may or may not".  that is, the
> contents may be complete, but that's nothing you can or should rely on.
> 
> the reason for this is that events don't fire in perfect lockstep with the
> build process; in the current version, the parser may be up to 16k further
> ahead.

...

> clearer?

Yes, thanks! Just a thought... would it be better to artificially hide
the parts of an element that can't be counted on in a start event (such
as element.text), or are the tradeoffs in doing so too ugly? With small
elements like mine and a buffer as large as 16KB, things will almost
always be available in the start event. That'll lead
learn-by-trial-and-error folks (i.e., those of us who don't read :) to
miss the distinction altogether. I was lucky enough to have a unit test
that noticed I had 10 or so empty values out of many thousands, but
otherwise I wouldn't have known about the problem (especially if empty
values were occasionally expected).
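
For anyone else who trips over this, here is a minimal sketch of the
pattern as I understand it from the explanation above: look at the tag
and attributes on "start" if you like, but only read element.text on
"end". It assumes the standard library's xml.etree.ElementTree.iterparse,
which takes the same events argument, in place of cElementTree; the
collect_text helper and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET  # assumption: stand-in for cElementTree


def collect_text(source, tag):
    """Return the text of every <tag> element, read only on "end" events."""
    values = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Only the tag and attributes are dependable here; elem.text
            # and child elements may or may not have been filled in yet.
            continue
        if elem.tag == tag:
            # On "end" the element is complete, so elem.text is final.
            values.append(elem.text)
    return values


# Hypothetical sample document; the empty <item/> has no text at all.
sample = io.BytesIO(b"<root><item>a</item><item>b</item><item/></root>")
result = collect_text(sample, "item")
```

A genuinely empty element still gives None on "end", so a test that
counts empty values should be run against values read this way.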

Thanks for all the wonderful libraries.

Jimmy

