[XML-SIG] Bug with XML file having a doctype declaration
Gottfried.Ganssauge@HAUFE.DE
Wed, 26 Mar 2003 10:30:27 +0100
Consider the following test program (you will probably recognize it from a
recently fixed bug ...)
--------------><---------------------><--------------------------
#! /usr/bin/env python
import xml.dom
from xml.dom.ext.reader import Sax2

reader = Sax2.Reader()
doc = reader.fromString("""<?xml version="1.0" ?>
<!DOCTYPE kasten PUBLIC "-//Jochen Voss//DTD Zettel 1.0//EN" "zettel.dtd">
<kasten>
</kasten>
""")
for c in doc.childNodes:
    if c.nodeType == xml.dom.Node.DOCUMENT_TYPE_NODE:
        print "public ID: " + c.publicId
        print "system ID: " + c.systemId
--------------><---------------------><--------------------------
When I run this with PyXML-0.8.2, I get a stack trace:
Traceback (most recent call last):
  File "./saxtest.py", line 21, in ?
    doc=reader.fromString("""<?xml version="1.0" ?>
  File "/usr/lib/python2.2/site-packages/_xmlplus/dom/ext/reader/__init__.py", line 61, in fromString
    return self.fromStream(stream, ownerDoc)
  File "/usr/lib/python2.2/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 373, in fromStream
    self.parser.parse(s)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 379, in external_entity_ref
    self._source.getSystemId() or
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/saxutils.py", line 515, in prepare_input_source
    f = urllib2.urlopen(source.getSystemId())
  File "/usr/lib/python2.2/urllib2.py", line 138, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.2/urllib2.py", line 320, in open
    type_ = req.get_type()
  File "/usr/lib/python2.2/urllib2.py", line 224, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: zettel.dtd
From this I concluded that the parser wanted to parse the external subset of
my doctype declaration. As I don't need any DTD details in my application,
I'd like to be able to simply keep the doctype declaration node without
parsing it any further.
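To illustrate where the ValueError at the bottom of the trace comes from: the system ID "zettel.dtd" is a relative reference without a URL scheme, and urlopen rejects such a string before it attempts any I/O. The following is only a minimal sketch, assuming Python 3, where urllib.request is the successor of the urllib2 module seen in the traceback. The helper names has_url_scheme and try_open are made up for illustration and are not part of PyXML or the standard library.

```python
import urllib.request
from urllib.parse import urlsplit


def has_url_scheme(system_id):
    """Return True if system_id starts with an explicit URL scheme (hypothetical helper)."""
    return bool(urlsplit(system_id).scheme)


def try_open(system_id):
    """Return the ValueError message urlopen gives for a scheme-less ID, else None.

    IDs that do carry a scheme are deliberately not opened, so the sketch
    never touches the network or the file system.
    """
    if has_url_scheme(system_id):
        return None
    try:
        urllib.request.urlopen(system_id)
    except ValueError as exc:
        return str(exc)
    return None


# The bare file name has no scheme, so urlopen refuses it outright.
print(has_url_scheme("zettel.dtd"))
print(has_url_scheme("http://example.org/zettel.dtd"))
print(try_open("zettel.dtd"))
```

So the error is not about the DTD's contents at all; the parser simply tried to fetch the external subset using the system ID as if it were an absolute URL.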
After digging around in the sources for expatreader I came up with the
following workaround:
--------------><---------------------><--------------------------
#! /usr/bin/env python
from xml.sax.expatreader import \
    ExpatParser, \
    expat

class pyExpatWrapper(ExpatParser):
    """
    Wrapper for the ExpatParser that prevents any attempt to parse
    the external subset of the DOCTYPE declaration.
    """
    def reset(self):
        ExpatParser.reset(self)
        # Tell expat never to load parameter entities or the external subset.
        self._parser.SetParamEntityParsing(
            expat.XML_PARAM_ENTITY_PARSING_NEVER)

import xml.dom
from xml.dom.ext.reader import Sax2

# Hand the wrapper to the reader through its parser argument; a plain
# Sax2.Reader() would create its own default parser and ignore the wrapper.
reader = Sax2.Reader(parser=pyExpatWrapper())
doc = reader.fromString("""<?xml version="1.0" ?>
<!DOCTYPE kasten PUBLIC "-//Jochen Voss//DTD Zettel 1.0//EN" "zettel.dtd">
<kasten>
</kasten>
""")
for c in doc.childNodes:
    if c.nodeType == xml.dom.Node.DOCUMENT_TYPE_NODE:
        print "public ID: " + c.publicId
        print "system ID: " + c.systemId
--------------><---------------------><--------------------------
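The effect of the SetParamEntityParsing call can also be checked without PyXML. This is only a minimal sketch, assuming Python 3 and the standard library's xml.parsers.expat. The function read_doctype and its result dictionary are my own names for illustration. It parses the same document in two parameter-entity modes and records whether expat asks for the external subset.

```python
from xml.parsers import expat

DOC = b"""<?xml version="1.0" ?>
<!DOCTYPE kasten PUBLIC "-//Jochen Voss//DTD Zettel 1.0//EN" "zettel.dtd">
<kasten>
</kasten>
"""


def read_doctype(data, param_entity_mode):
    """Parse data with raw expat and report the doctype plus any external
    entity requests (hypothetical helper, not a library function)."""
    seen = {"doctype": None, "external_refs": []}
    parser = expat.ParserCreate()
    parser.SetParamEntityParsing(param_entity_mode)

    def start_doctype(name, system_id, public_id, has_internal_subset):
        # Called for the <!DOCTYPE ...> declaration itself.
        seen["doctype"] = (name, public_id, system_id)

    def external_entity_ref(context, base, system_id, public_id):
        # Called when expat wants an external entity loaded; we only record
        # the request and return 1 so that parsing continues.
        seen["external_refs"].append(system_id)
        return 1

    parser.StartDoctypeDeclHandler = start_doctype
    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(data, True)
    return seen


never = read_doctype(DOC, expat.XML_PARAM_ENTITY_PARSING_NEVER)
unless = read_doctype(DOC, expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)

print("NEVER:            ", never)
print("UNLESS_STANDALONE:", unless)
```

In both modes the public and system IDs of the doctype declaration are still reported. Only the request for the external subset should go away with XML_PARAM_ENTITY_PARSING_NEVER, which is what the workaround relies on.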
That way everything works, at least with PyXML-0.8.2. With PyXML-0.7.1 I get
public ID:
system ID:
Is that the way it's meant to be done? Or is there an easier, less
parser-dependent way to achieve my goal?
Cheers,
Gottfried