[XML-SIG] Parser not preserving DTD?
Matthew Shomphe
Matthews@heyanita.com
Wed, 4 Sep 2002 16:42:29 -0700
I've done a few tests to see where the mangled DTDs are coming from.
I can't report much success beyond the following:
1. The problem is not with pyexpat or Expat. I was able to run some
tests and the full DTD is passed to pyexpat.
I added the following code to test_pyexpat.py:
    def StartDoctypeDeclHandler(self, *args):
        doctypeName, systemId, publicId, has_internal_subset = args
        print 'DTD declared:', args

The full DTD was printed to stdout.
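The same check can be reproduced outside the test suite. The following
is only a minimal sketch, assuming a current Python with the standard
xml.parsers.expat module; collect_doctype and SAMPLE are names made up
for illustration and are not part of test_pyexpat.py.

```python
import xml.parsers.expat


def collect_doctype(xml_bytes):
    """Return the DOCTYPE details that Expat reports for a document."""
    seen = []

    def start_doctype(name, system_id, public_id, has_internal_subset):
        # pyexpat passes the system ID before the public ID.
        seen.append((name, system_id, public_id, bool(has_internal_subset)))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return seen


# Illustrative document; "note.dtd" is never opened by this parser setup.
SAMPLE = (
    b'<!DOCTYPE note SYSTEM "note.dtd" [<!ELEMENT note (#PCDATA)>]>'
    b'<note>hi</note>'
)

print(collect_doctype(SAMPLE))
```

If the handler fires with the name and system ID intact, the
information is being lost somewhere above pyexpat.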
2. The SAX implementation does not natively support <!DOCTYPE>
declarations. From their website
(http://www.saxproject.org/?selected=faq):
----
Does SAX support comments/CDATA sections/DOCTYPE declarations, etc.?
Not in the core API. These kinds of things are pure lexical details,
and are not relevant to most kinds of XML processing, so it doesn't make
sense to put them in the core and force all implementors to support
them.
However, SAX2 is designed to be extensible, and the LexicalHandler
interface is supported by most SAX parsers. SAX2 parsers are not
required to support this handler, but they are required to report an
error if you try to use handlers they don't support.
----
<!NOTATION> declarations and unparsed entities, on the other hand, are
supported (through the DTDHandler interface).
3. The above-mentioned LexicalHandler does seem to support DTDs, but I
have no idea how to implement this.
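For what it's worth, registering such a handler might look roughly like
the sketch below. It assumes the standard library's xml.sax with its
Expat driver, which may behave differently from the reader that is
truncating the DTD here. DoctypeRecorder, record_doctype and SAMPLE_DOC
are hypothetical names used only for illustration.

```python
import io
import xml.sax
from xml.sax.handler import feature_external_ges, property_lexical_handler


class DoctypeRecorder:
    """Hypothetical lexical handler that records DOCTYPE events."""

    def __init__(self):
        self.events = []

    def startDTD(self, name, public_id, system_id):
        # SAX reports the public ID before the system ID.
        self.events.append(("startDTD", name, public_id, system_id))

    def endDTD(self):
        self.events.append(("endDTD",))

    # The Expat driver also looks up these callbacks, so they must exist.
    def comment(self, content):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


def record_doctype(xml_bytes):
    """Parse xml_bytes and return the lexical DOCTYPE events seen."""
    recorder = DoctypeRecorder()
    parser = xml.sax.make_parser()
    # Do not try to fetch the external DTD named in the DOCTYPE.
    parser.setFeature(feature_external_ges, False)
    parser.setProperty(property_lexical_handler, recorder)
    parser.parse(io.BytesIO(xml_bytes))
    return recorder.events


# Illustrative document; "note.dtd" does not need to exist.
SAMPLE_DOC = (
    b'<!DOCTYPE note SYSTEM "note.dtd" [<!ELEMENT note (#PCDATA)>]>'
    b'<note>hi</note>'
)

print(record_doctype(SAMPLE_DOC))
```

Note that startDTD only carries the name and the two identifiers. The
declarations inside the internal subset are not reported through it, so
a DOM builder relying on this handler alone would still lose them.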
In short, somewhere along the processing route data are being lost.
I'm not well-versed in the APIs for this set of applications, so I'm a
bit dazed trying to track down the methods and attributes needed to get
the DTD passed all the way through. It seems to be an issue with SAX2:
it has an extension for this, but the extension just hasn't been
implemented yet.
Is there any other type of reader out there that does not truncate DTDs
and returns a full DOM?
Thanks,
Matt