[XML-SIG] Re: cElementTree.iterparse missing text in some startevents
jimmy at retzlaff.com
Tue Jan 25 09:58:59 CET 2005
Fredrik Lundh wrote:
> Jimmy Retzlaff wrote:
> > I'm using cElementTree.iterparse to iterate over an XML file. I think
> > iterparse is a wonderful idea; I've found it to be much more convenient
> > than SAX for iterative processing. I have come across a problem
> > though...
> > For the majority of my elements, both the start and end events
> > contain the text of the element (i.e., element.text). For a handful of the
> > elements, the text is only in the end event (i.e., element.text is None
> > in the start event but it is not None in the end event). The text is
> > found without any problem when using cElementTree.parse on the file
> > instead.
> > Am I misunderstanding something or is this perhaps a bug?
> it needs more documentation ;-)
> here's what the comment in the CHANGES document says:
> The elem object is the current element; for "start" events,
> the element itself has been created (including attributes), but
> contents may not be complete; for "end" events, all child elements
> have been processed as well. You can use "start" tags to count
> elements, check attributes, and check if certain tags are present
> in a tree. For all other purposes, use "end" handlers instead.
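The division of labour described above can be sketched as follows. This is a minimal sketch, assuming Python 3, where `cElementTree` no longer exists and `xml.etree.ElementTree` uses its C accelerator automatically; the `<item>` tag, the `flag` attribute and the `summarize` helper are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

def summarize(source):
    """Count <item> elements and check an attribute on "start" events,
    but read element text only on "end" events."""
    count = 0
    flagged = 0
    texts = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "item":
            continue
        if event == "start":
            # Tag and attributes are in place when "start" fires.
            count += 1
            if elem.get("flag") == "yes":
                flagged += 1
        else:
            # By "end", the text and all children have been processed.
            texts.append(elem.text)
            elem.clear()  # optional: release memory on large files

    return count, flagged, texts

doc = b'<root><item flag="yes">a</item><item>b</item></root>'
print(summarize(io.BytesIO(doc)))
```

Reading `elem.text` in the "start" branch instead would often appear to work, which is exactly the trap discussed in this thread.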
> in that text, "may not" really means "may or may not". that is, the
> contents may be complete, but that's nothing you can or should rely
> on.
> the reason for this is that events don't fire in perfect lockstep with
> the tree-building process; in the current version, the parser may be up
> to 16k ahead of the events you see.
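The element handed out with a "start" event is the same object the parser keeps filling in, so what it contains depends on how far the parser has read by the time you look at it. A minimal sketch of this, assuming Python 3's `xml.etree.ElementTree` and using `XMLPullParser` instead of `iterparse` so the chunk boundaries can be chosen by hand; the `start_text` helper and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def start_text(chunks):
    """Return element.text of each <item>, as observed at its "start" event."""
    parser = ET.XMLPullParser(events=("start",))
    result = []
    for chunk in chunks:
        parser.feed(chunk)
        # Events are handed out only after the parser has consumed the chunk.
        for _event, elem in parser.read_events():
            if elem.tag == "item":
                result.append(elem.text)
    parser.close()
    return result

doc = "<root><item>hello</item></root>"

# Whole document in one chunk: the parser has already passed </item>
# before the "start" event is read, so the text happens to be there.
print(start_text([doc]))

# Chunk boundary inside the text ("<root><item>hel" | "lo</item></root>"):
# the "start" event is read before the text has been attached.
print(start_text([doc[:15], doc[15:]]))
```

With small elements and a large read buffer, the first case is what you almost always get, which hides the second.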
Yes, thanks! Just a thought... would it be better to artificially hide
the element contents (text, children) that can't be counted on in a start
event, or are the tradeoffs in doing so too ugly? With small elements like
mine and a buffer as large as 16KB, things will almost always be available
in the start event. That will lead learn-by-trial-and-error folks (i.e.,
those of us who don't read :) to miss the distinction altogether. I was
lucky enough to have a unit test that noticed about 10 empty values out of
many thousands; otherwise I wouldn't have known about the problem
(especially if empty values were occasionally expected).
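Something close to this suggestion can be approximated in user code today. A minimal sketch, assuming Python 3's `xml.etree.ElementTree`; the wrapper name `iterparse_start_safe` is hypothetical and not part of any library. For "start" events it exposes only the tag and a copy of the attributes, the parts that are guaranteed to be present, so code cannot accidentally read `.text` too early.

```python
import io
import xml.etree.ElementTree as ET

def iterparse_start_safe(source):
    """Hypothetical wrapper around ET.iterparse.

    "start" events yield (tag, attributes) instead of the live element,
    so incomplete text or children can't be read by mistake;
    "end" events yield the complete element as usual.
    """
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            yield event, (elem.tag, dict(elem.attrib))
        else:
            yield event, elem

doc = b'<root><item id="1">x</item></root>'
for event, payload in iterparse_start_safe(io.BytesIO(doc)):
    print(event, payload)
```

The tradeoff is that start handlers lose the element object itself; anything beyond the tag and attributes has to wait for the matching "end" event.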
Thanks for all the wonderful libraries.