[ expat-Bugs-575168 ] Missing events for end-element

noreply@sourceforge.net noreply@sourceforge.net
Tue Jul 2 11:48:15 2002


Bugs item #575168, was opened at 2002-06-28 15:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=575168&group_id=10127

Category: None
Group: None
>Status: Closed
Resolution: Fixed
Priority: 7
Submitted By: Fred L. Drake, Jr. (fdrake)
Assigned to: Karl Waclawek (kwaclaw)
Summary: Missing events for end-element

Initial Comment:
When the end-element handler is set but not the
start-element handler, the end-element events for end
tags are not reported; events are reported for empty
elements.

I've attached a test case that demonstrates this problem.

----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-02 14:46

Message:
Logged In: YES 
user_id=290026

Additional patch checked in.

Karl

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-07-01 11:30

Message:
Logged In: YES 
user_id=3066

Karl, the additional patch looks good to me; go ahead and
check it in, and I'll finish up the hard-tab removal task.

Re-close the bug report when you're ready.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-06-29 23:46

Message:
Logged In: YES 
user_id=290026

I attached a patch (based on your patch - xmlparse.c 1.46),
that always sets up the tag structure, as suggested.
Additionally, I made sure that storeAtts doesn't get
called twice - I moved it under if (startElementHandler) ...

That way we should have correct behaviour, even if the
endElementHandler gets set after the startElementHandler
was called. Since tag->name.str should now always be
non-NULL, I removed the corresponding check.

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-06-29 12:26

Message:
Logged In: YES 
user_id=290026

I agree with your analysis, also with the need
for checking other handler pairs.

One quick idea, without looking at the source:

If we remove this check:
...
if (startElementHandler || endElementHandler) {
...
then the tag structure gets always set up  and we
would not need the extra check for endElementHandler.
At the cost of extra processing time, but how
often does one *not* set any of the element handlers?

Karl

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-06-29 11:17

Message:
Logged In: YES 
user_id=3066

Ok, I've thought about that a bit, and the tests pass with
that change, but I think that means we're missing a test.

I suspect the current implementation largely assumes that
once we start parsing, the set of handlers will not grow;
while this may be reasonable application behavior, this is a
bad assumption to make in library code.

Consider this: some processor watches for a processing
instruction, and *adds* handlers based on that.  Suppose
that it adds start/end element handlers where it had none
before, and the document looks like this:

<doc>
    <?mytool start?>
    <inner/>
</doc>

If we make the change you suggest, we could be passing bad
data (NULL for the tagName) to the endElementHandler for the
doc element, though all will be fine for the inner element.
 With the extra check in there, it means we don't call the
endElementHandler for the doc element, which, while a bug,
means we're preserving existing behavior instead of passing
NULL (not documented as a valid value for the tagName).

I think we should leave the extra check in for now, but
should consider always setting up the TAG structure, even
though it means extra work for the rare application that
doesn't use either the start or end element handler.  The
data is still potentially needed for any application that
add an endElementHandler after processing starts.

There are probably similar cases for paired handlers
elsewhere in the code as well.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-06-28 19:51

Message:
Logged In: YES 
user_id=290026

Should it not now be possible to change


...
  if (endElementHandler && tag->name.str) {
...

to

...
  if (endElementHandler) {
...

since the code should guarantee that the correct
tag is accessed, which means that tag->name.str != NULL?

Karl

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-06-28 19:06

Message:
Logged In: YES 
user_id=3066

I've gone ahead and checked in the patches as lib/xmlparse.c
1.46 and tests/runtests.c 1.23.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-06-28 17:19

Message:
Logged In: YES 
user_id=3066

I've attached a minimal patch to make Expat generate the
required events.  The attached test passes when this patch
has been applied as well.  The tests of the Python bindings
that originally uncovered this bug also pass with this
change to xmlparse.c.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=575168&group_id=10127