[Expat-discuss] CDATA Handler in Expat
Frank Moore
francis.moore at rawflow.com
Fri Apr 7 16:37:44 CEST 2006
Hi,
With reference to the email I sent earlier: to learn how to set the
CDATA handler myself, I copied the recipe
'Using the SAX2 LexicalHandler Interface' from
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/84516
into a .py file and added some output ('***************************')
to mark the places where I expect the code to pass through.
When I run the .py against an XML file that contains a CDATA section,
it does not output what I would expect.
The .xml file has (amongst other things) the following content:
<snip>
...
<project>
  <script type="text/javascript">
  <![CDATA[
  <!--
  // Stuff ....
  //-->
  ]]>
  </script>
</project>
...
</snip>
The output is:
<project>
<script type="text/javascript">

&lt;!--
// Stuff ....
//--&gt;

</script>
</project>
The comment delimiters '<!--' and '-->' are encoded, but where the CDATA
start and end markers should be, there are just empty lines.
I would expect to see some '***************************************' as
per the code.
I would also expect the <![CDATA[ and the ]]> to be output, as the code
says it does.
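The encoded delimiters are consistent with character data being escaped on output. As a small illustration (an assumption about the cause, not something confirmed for PyXML 0.8.4), the standard library's xml.sax.saxutils.escape produces the same encoding for the two comment lines:

```python
from xml.sax.saxutils import escape

# escape() replaces '&', '<' and '>' with their entity references,
# which is what turns the comment delimiters into encoded text.
encoded_open = escape("<!--")
encoded_close = escape("//-->")

print(encoded_open)
print(encoded_close)
```

If the CDATA branch of characters() were reached, the content would be written unescaped instead.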
I'm using Python 2.4.1, PyXML 0.8.4, and Expat XML Parser 1.95.5.
The recipe does, however, mention at the end:
"My tests using Python 2.1, PyXML 0.7 (from CVS) and PIRXX 1.2 indicate
that PIRXX (i.e. Xerces/C) reports all events, xmlproc leaves out the
start/end entity ones, and pyexpat misses those too, in addition to the
start/end DTD events."
When it says "xmlproc leaves out the start/end entity ones, and pyexpat
misses those too", does this include the startCDATA and endCDATA events?
If it does, how am I supposed to pick out the contents of my CDATA
section with the version of Expat that I'm currently using?
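One possible way to pick out the CDATA contents without going through a SAX driver is to talk to Expat directly. The following is only a sketch: it assumes the standard library's xml.parsers.expat binding (not the PyXML drivers), and the function name extract_cdata and the sample string are made up for illustration.

```python
from xml.parsers import expat


def extract_cdata(xml_text):
    """Return the text of each CDATA section found in xml_text."""
    sections = []                  # completed CDATA sections
    chunks = []                    # pieces of the section being read
    state = {"in_cdata": False}

    def start_cdata():
        state["in_cdata"] = True
        del chunks[:]

    def end_cdata():
        state["in_cdata"] = False
        sections.append("".join(chunks))

    def char_data(data):
        # Expat may deliver one section in several pieces, so collect them.
        if state["in_cdata"]:
            chunks.append(data)

    parser = expat.ParserCreate()
    parser.StartCdataSectionHandler = start_cdata
    parser.EndCdataSectionHandler = end_cdata
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_text, True)
    return sections


# Hypothetical sample modelled on the fragment quoted above.
SAMPLE = ('<project><script type="text/javascript">'
          '<![CDATA[\n<!--\n// Stuff ....\n//-->\n]]>'
          '</script></project>')

cdata_sections = extract_cdata(SAMPLE)
print(cdata_sections)
```

Character data outside the CDATA markers is ignored here, so only the section bodies are returned.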
Many thanks,
Frank.
###########################################################################################
# echoxml.py
import sys
from xml.sax import sax2exts, saxutils, handler
from xml.sax import SAXNotSupportedException, SAXNotRecognizedException

class EchoGenerator(saxutils.XMLGenerator):

    def __init__(self, out=None, encoding="iso-8859-1"):
        saxutils.XMLGenerator.__init__(self, out, encoding)
        self._in_entity = 0
        self._in_cdata = 0

    def characters(self, content):
        if self._in_entity:
            return
        elif self._in_cdata:
            self._out.write('*******************************************')
            self._out.write(content)
        else:
            saxutils.XMLGenerator.characters(self, content)

    # -- LexicalHandler interface

    def comment(self, content):
        self._out.write('<!--%s-->' % content)

    def startDTD(self, name, public_id, system_id):
        self._out.write('<!DOCTYPE %s' % name)
        if public_id:
            self._out.write(' PUBLIC %s %s' % (
                saxutils.quoteattr(public_id),
                saxutils.quoteattr(system_id)))
        elif system_id:
            self._out.write(' SYSTEM %s' % saxutils.quoteattr(system_id))

    def endDTD(self):
        self._out.write('>\n')

    def startEntity(self, name):
        self._out.write('&%s;' % name)
        self._in_entity = 1

    def endEntity(self, name):
        self._in_entity = 0

    def startCDATA(self):
        self._out.write('*******************************************')
        self._out.write('<![CDATA[')
        self._in_cdata = 1

    def endCDATA(self):
        self._out.write(']]>')
        self._out.write('*******************************************')
        self._in_cdata = 0

def test(xmlfile):
    parser = sax2exts.make_parser([
        'pirxx',
        'xml.sax.drivers2.drv_xmlproc',
        'xml.sax.drivers2.drv_pyexpat',
    ])
    print >>sys.stderr, "*** Using", parser
    try:
        parser.setFeature(handler.feature_namespaces, 1)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        pass
    try:
        parser.setFeature(handler.feature_validation, 0)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        pass
    saxhandler = EchoGenerator()
    parser.setContentHandler(saxhandler)
    parser.setProperty(handler.property_lexical_handler, saxhandler)
    parser.parse(xmlfile)

if __name__ == "__main__":
    test('build.xml')
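To check whether a given driver reports startCDATA and endCDATA at all, a stripped-down handler that only records those events may be easier to read than the full echo script. This sketch assumes the standard library's xml.sax with its default Expat reader in a current Python, not the PyXML driver list used above, so it says nothing about how PyXML 0.8.4 behaves. The class name CDATARecorder and the sample document are hypothetical.

```python
import io
from xml.sax import make_parser, handler


class CDATARecorder(handler.ContentHandler):
    """Hypothetical handler that records CDATA events and their text."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.events = []
        self.cdata_text = []
        self._in_cdata = False

    def characters(self, content):
        if self._in_cdata:
            self.cdata_text.append(content)

    # -- LexicalHandler interface, defined in full so the reader
    #    can bind each callback
    def comment(self, content):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        self.events.append("startCDATA")
        self._in_cdata = True

    def endCDATA(self):
        self.events.append("endCDATA")
        self._in_cdata = False


# Hypothetical sample modelled on the build.xml fragment.
SAMPLE = (b'<project><script type="text/javascript">'
          b'<![CDATA[<!-- // Stuff .... //-->]]>'
          b'</script></project>')

recorder = CDATARecorder()
parser = make_parser()
parser.setContentHandler(recorder)
parser.setProperty(handler.property_lexical_handler, recorder)
parser.parse(io.BytesIO(SAMPLE))

print(recorder.events)
print("".join(recorder.cdata_text))
```

If the events list stays empty with a particular driver, that driver is not forwarding the CDATA callbacks to the lexical handler.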