[ expat-Patches-551599 ] Patch for # 544679, 548690, 549014, 551852, 553318

noreply@sourceforge.net noreply@sourceforge.net
Thu May 30 11:34:05 2002


Patches item #551599, was opened at 2002-05-02 16:38
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=310127&aid=551599&group_id=10127

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Karl Waclawek (kwaclaw)
Assigned to: Nobody/Anonymous (nobody)
Summary: Patch for # 544679, 548690, 549014, 551852, 553318

Initial Comment:
This is an extensive patch, and needs not only 
testing, but also a review by my fellow developers!

It addresses these issues:

# 483514 (default handler reports too much from DTD):
    only the following data are reported now: 
    - ignored DTD sections
    - unhandled external parameter entity references
    - top level whitespace
    This may not be what is needed, but more would
    have been a lot of work

# 544679, 548690 (DTD handling of external entities): 
   - the storeEntityValue function has been modified
     to call the external entity reference handler, 
     since it did not handle external PE references
     at all
    - a new STRING_POOL has been added that gets
      passed from a parent to a child parser
     (member of DTD structure), so that entity values
      can be built across parsers 
    - new functions entityValueInitProcessor and 
      entityValueProcessor have been added
    - the old usage of dtd.complete was completely
      changed - I never understood how it worked,
      and it didn't work properly anyway;
      therefore there is a danger now that some logic
      will not work anymore - please review and check
    - the usage of hadExternalDoctype was modified too
    - there have been changes in xmlrole.c too - the
      chain of state handlers was extended from 
      entity9 to entity10 - in analogy to how general 
      entities - please review the diff file 
    - as a result, this patch now processes all
      of James Clark's test cases in the
      /valid/not-sa and /valid/ext-sa directories
      properly

* I have also made some fixes to the recently
  introduced XML_ParserReset function (incl. 
  changing it's return type), and one fix that
  prevents a null pointer error

The patch is based on these revisions:
- expat.h rev. 1.17
- xmlparse.c rev. 1.31
- xmlrole.c rev. 1.5
- xmlrole.h rev. 1.3

I have included the full patch files, 
an annotated version of xmlparse.c - good
to understand my changes, and the diff files.

Karl


----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-30 14:33

Message:
Logged In: YES 
user_id=290026

This patch has been checked in.
Closed, since no complaints have been received.

Karl



----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-23 15:46

Message:
Logged In: YES 
user_id=290026

OK, I deleted a few, but not all.
I think it is important to be able to
trace through the changes I made.
(at least for me, this is where I would go
 to check what I did and why I did it)

Also, the annotated versions explain a lot
of the patch, and these explanations are
not present in the final files.

Karl

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-05-23 15:35

Message:
Logged In: YES 
user_id=3066

Karl, can you delete the attachments that are no longer
current?  There are just too many to filter through!  Thanks!

Does your CVS client not allow you to create a single patch
file with diffs for all changed files?  The typical
command-line client let's me say "cvs diff" at the top of
the tree and I get a single patch output on standard output.

----------------------------------------------------------------------

Comment By: Rolf Ade (pointsman)
Date: 2002-05-19 17:10

Message:
Logged In: YES 
user_id=13222

I've done a lot of testing, with expat CVS
HEAD and this patch applied (and had some
pleasant mail conversation with Karl,
about the nifty details and the findings.)

Let me first confirm, that I was able to
reproduce the following bugs related to
this patch:

548690 and 544679 (They describe the same
behavior.)

549014 (I've already submitted comments to
this bug)

544679 

There's also a known bug (for which I
wasn't able to find the bug nr, sorry),
about XML_EntityDeclHandler() is not
reporting declarations of external PE
references. This I also have to confirm
(examples are
xmltest/valid/not-sa/005.xml, 011.xml,
012.xml, 026.xml out of the below
mentioned test suite).


Can't say something about the bugs 553318
and 549014 because I haven't tried to
confirm the described bugs.


Summary of my findings:

The patch seems to fix at least 548690,
544679, 549014, 544679 and the problem
with not reported declaration of external
PE references. I doesn't found any new
problem introduced by the patch. See below
for a description of what I've done.

That said, I still have an objection. With
this patch applied, there is no way to
detect, if a parameter entity is not
declared, even if you read all external
parameter entities, because now expat skip
silently all not resolvable entities, if a
document has an external subset or
external parameter entities. Bug 552297
would be a way, to handle this.

What I've done:

I've mainly used the OASIS XML test suite,
second edtion. You'll find it at
http://www.oasis-open.org/committees/xml-conformance/suite-v1se/xmlconf-20010315.htm

This testsuite includes several
testpackages, contributed by different
parties. Most of them split the tests in
not-wellformed, valid, and in-valid
tests. With a few simple shell scripts and
xmlwf (with the option -p), I've checked
all not-wellformed tests (all should
produce an error) and all valid test
(since they are a valid, the must be
well-formed and therefor should all pass
without an error.) Only in single cases
I've checked, if the error reason, given
by xmlwf is the one, the OASIS test suite
expects; for most of the not-wellformed
tests, I could only claim, that xmlwf does
report an error, but could not say, if
it's the 'right' error. I've run my test
scripts with expat-1.92, with expat CVS
HEAD, and with expat CVS HEAD with Karl's
patch added.

Expat 1.95.2 and expat CVS HEAD had the
same results:

xmltest/valid/not-sa/003.xml
xmltest/valid/not-sa/004.xml
xmltest/valid/not-sa/031.xml
ibm/not-wf/misc/432gewf.xml
sun/not-wf/uri01.xml

where wrong. 003.xml, 004.xml and 031.xml
are examples for bug 544679.

The problem with 432gwf.xml is, that this
file includes the entity declaration

 <!ENTITY gewithElemnetDecl "<!ELEMENT bogus ANY>">

The replace text does not conform to
production 79 (see XML rec 4.3.2). The
entity isn't actually used, in the
document (if it would be used, expat would
report an error. It seems to me, expat
does not check the replace text angainst
production 79 at declaration time. This is
a real problem, but more a minor one.

The problem with uri01.xml is, that this
document uses a fragment identifier in the
SYSTEM of an external entity
declaration. This is in deed not allowed
(see XML rec 4.2.2), but the rec say:
"[A]n XML processor may signal an error if
a fragment identifier is given as part of
a system identifier". So, first, this is
"may" and not "must" and, second, since
you have to provide your external entity
resolver by yourself, with the expat
library, this could be handled at
application level, if the programmer want
this. Therefor I would rank this as a very
minor problem, if ever.

Expat with Karl's patch had the results:

xmltest/not-wf/not-sa/005.xml
ibm/not-wf/misc/432gewf.xml
sun/not-wf/uri01.xml

003.xml, 004.xml and 031.xml from above
are fixed. 005.xml is a questionable
test. It uses in an external parameter
entity a not declared parameter
entity. The test suite expects the parser,
to claim about this. But since this is a
well-formedness test and the document has
references to external parameter entities,
the "Well-Formedness Constraint: Entity
Declared" of production 68 does not
apply. Therefor, Karl argues that this
test tests the "Validity Constraint:
Entity Declared" of production 68 and
69. He has already written a mail about
this to the OASIS XML compliance group
about this. (I personally still feel a bit
uncomfortable about this. Since Karl has
the spec on his side (if you read it 
literal,

The oasis testpackage of the testsuite
is not organized along this not-wf, valid
and not-valid scheme. Instead, there are a
couple of "fail" and "pass" tests.

All "pass" tests passes xmlwf without
error. The "fail" tests

p06fail1.xml
p08fail1.xml
p08fail2.xml
p16fail3.xml

doesn't trigger an error. But all four
tests are testing problems, that are only
detected by validating parsers (the test
files notices that explicitly).

Additional, I've compared the output
produced by xmlwf, if you use -d option
for all valid tests, for which an expected
output was given. Due to a few differences
between the canonical output, expected by
the OASIS test suite, and the canonnical
XML output, produced by xmlwf (which is, I
suspect, 'James Clark canonical XML', see
http://jclark.com/xml/canonxml.html), I
had to check some cases by hand.

Expat-1.95.2, expat HEAD and expat with
Karl's patch produced byte-identical
output. While comparing the expat output
with what the OASIS test suites expect, I
found three interesting cases. 


xmltest/valid/sa/098.xml

This test includes a processing
instruction with a DOS end-of-line
sequence (0d 0a). In the expat output this
end-of-line marker is 'normalized' to the
XML standard (only 0a). The expected
output still has the 0d 0a from the
document. 

ibm/valid/P02/ibm02v01.xml

This is a crazy case. I argue, that this
test is not well-formed XML, because it
includes an illegal UTF-8 sequenz. xmlwf
passes the document, but the xmlwf result
differs from the expected result. I'm not
surprised, that expat does not detect the
ill-formed UTF-8 (see bug 477667). (Well,
I'm a bit surprised, that the test is
included als valid test..)

sun/valid/sa02.xml

The problem is a difference in attribute
value normalization (interesting enought,
expat seems to follow E70 of
http://www.w3.org/XML/xml-19980210-errata
for the handling of #xD and #xA.)



To check the mentioned problem with
the reporting of declaration of external
parameter entity references, I've used the
expat binding from
http://sdf.lonestar.org/~loewerj/tdom.cgi

xmltest/valid/not-sa/005.xml
xmltest/valid/not-sa/011.xml
xmltest/valid/not-sa/012.xml
xmltest/valid/not-sa/026.xml 

are examples which showed, that expat
1.95.2 and expat CVS sometimes doesn't
report entity declarations via
XML_EntityDeclHandler. With Karls patch
applied, the declarations get reported.

I've an application, that is able to
trigger the bug 549014.  Using the memory
debugger mpatrol (
http://www.cbmamiga.demon.co.uk/mpatrol/ )
I could show, that the seg fault happens
at the place, the analysis to 549014
explains. After applying Karl's patch, the
seg fault is vanished.

Don't hesitate to ask, If you need more
info about one of the discussed points.


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-12 20:23

Message:
Logged In: YES 
user_id=290026

There were only a few modifications necessary to make this
patch perform well under testing:

- It did not compile when XML_DTD was not defined - fixed
- forgot to adjust the appendAttribute() function for
  the modifications in DTD handling logic - fixed

This is the final version of the patch - check the new
file xmlparse_final.c (and the diff created from it).

So, to get the full final patch, download expat.h, 
xmlrole.h, xmlrole.c and xmlparse_final.c from the
set of attached files.

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-07 11:53

Message:
Logged In: YES 
user_id=290026

This is the second improvement and probably the final
version of the patch. If there should be more changes,
a new patch will be entered, since it might get confusing
otherwise.

The following changes have been made:

- some improvements on external PE reference handling
- a better fix (hopefully) for bug # 549014
- the changes to DefaultHandler calls for DTDs
  have been undone - I realized that more extensive
  changes (also to xmlrole.c and xmlrole.h) would be
  necessary, so the old behaviour is back: everything
  for DTDs is reported, even if handlers are set,
  except for PIs, Comments and XML text declarations.

This patch now applies to the following bugs:
- # 553318
- # 551852
- # 549014
- # 548690
- # 544679

Uploaded as files xmlparse_2.c and xmlparse_2.c.diff.

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-04 13:13

Message:
Logged In: YES 
user_id=290026

I improved the patch to cover one more problem:
I found that bug # 551852 (BOM problem with small buffers),
that was reported for general external entities, also
applied to external parameter entities.

This problem only applied to this patch, since the current
release of Expat simply ignores external PEs (which causes
bug # 548690).

Included is also a fix for bug # 549014 (dtdCopy problem).

The attached file are xmlparse_1.c, xmlparse_1.c.annotated
and xmlparse_1.diff (against rev. 1.31).

Karl


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-03 10:09

Message:
Logged In: YES 
user_id=290026

Forgot to add: the fix for bug # 551852 is included too.

Karl

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=310127&aid=551599&group_id=10127