[Expat-discuss] CDATA Handler in Expat

Frank Moore francis.moore at rawflow.com
Fri Apr 7 16:37:44 CEST 2006


Hi,

With reference to the email I sent earlier: to learn how to set the CDATA
handler myself, I copied the recipe 'Using the SAX2 LexicalHandler Interface' from
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/84516
into a .py file, added some output ('***************************') to mark the
places where I expect the code to pass through, and ran it. I didn't get what I
thought I would.

I have a CDATA section in an XML file. When I run the .py against the file, it
does not output what I expect.

The .xml file has (amongst other things) the following content:

<snip>
...
<project>
   <script type="text/javascript">
   <![CDATA[
   <!--
    // Stuff ....

    //-->
    ]]>
   </script>
</project>   
...
</snip>

The output is:


<project>
   <script type="text/javascript">

   &lt;!--
    // Stuff ....

    //--&gt;

   </script>
</project>

The comment delimiters '<!--' and '-->' are escaped (as '&lt;!--' and '--&gt;'),
and where the CDATA start and end markers were there are just empty lines.
From the code, I would expect to see some '***************************************'
in the output, and I would also expect '<![CDATA[' and ']]>' to be written out,
since startCDATA() and endCDATA() write them. It looks as though startCDATA()
and endCDATA() are never called.
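The escaping itself is ordinary `XMLGenerator` behaviour: its `characters()` method escapes `&`, `<` and `>` before writing, which is what happens to any text that is not routed through the `_in_cdata` branch. A minimal illustration using Python 3's standard-library `xml.sax.saxutils.escape` (assumed here to match what PyXML's `saxutils` does for this input):

```python
from xml.sax.saxutils import escape

# The text that sat inside the CDATA section in build.xml (abridged).
SNIPPET = "<!--\n // Stuff ....\n//-->"

# XMLGenerator.characters() escapes '&', '<' and '>' before writing, so the
# comment delimiters come out as entity references.
escaped = escape(SNIPPET)
print(escaped)
```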

I'm using Python 2.4.1, PyXML 0.8.4, and Expat XML Parser 1.95.5.

The recipe, however, does mention at the end:

"My tests using Python 2.1, PyXML 0.7 (from CVS) and PIRXX 1.2 indicate
that PIRXX (i.e. Xerces/C) reports all events, xmlproc leaves out the
start/end entity ones, and pyexpat misses those too, in addition to the
start/end DTD events."
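One way to see which lexical events a given SAX driver actually delivers is to log them. The sketch below uses Python 3's standard-library `xml.sax` expat driver, not the PyXML 0.8.4 `drv_pyexpat` driver from the post, so it says nothing definitive about that older setup; `CdataLogger` and `lexical_events` are hypothetical names made up for illustration.

```python
import io
import xml.sax
from xml.sax import handler


class CdataLogger(handler.ContentHandler):
    """Log character data plus comment/CDATA events, in document order.

    The stdlib expat driver hooks up comment(), startCDATA(), endCDATA()
    and the DTD methods of whatever object is registered as the lexical
    handler, so all of them are defined here.
    """

    def __init__(self):
        super().__init__()
        self.events = []
        self._in_cdata = False

    # -- ContentHandler
    def characters(self, content):
        kind = "cdata" if self._in_cdata else "text"
        self.events.append((kind, content))

    # -- lexical handler interface
    def comment(self, content):
        self.events.append(("comment", content))

    def startCDATA(self):
        self._in_cdata = True
        self.events.append(("startCDATA", None))

    def endCDATA(self):
        self._in_cdata = False
        self.events.append(("endCDATA", None))

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def lexical_events(xml_text):
    """Parse xml_text with the default stdlib SAX driver; return the log."""
    parser = xml.sax.make_parser()
    logger = CdataLogger()
    parser.setContentHandler(logger)
    parser.setProperty(handler.property_lexical_handler, logger)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return logger.events


doc = "<script><![CDATA[<!-- x -->]]></script>"
for event in lexical_events(doc):
    print(event)
```

With this driver the log should show a startCDATA event, the raw text `<!-- x -->` tagged as CDATA (not as a comment), and an endCDATA event. Running the same kind of check with the driver that `sax2exts.make_parser` picks in the script would show whether that driver delivers these events.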

When it says "xmlproc leaves out the start/end entity ones, and pyexpat misses
those too", does that include the startCDATA and endCDATA events? If it does,
how am I supposed to pick out the contents of my CDATA section with the version
of Expat that I'm currently using?
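To pick out CDATA contents without going through a SAX driver at all, pyexpat parser objects expose `StartCdataSectionHandler`, `EndCdataSectionHandler` and `CharacterDataHandler` attributes. A minimal sketch, written for Python 3 and not checked against the pyexpat bundled with Python 2.4.1 (it uses `nonlocal`, which Python 2 lacks); `cdata_sections` is a hypothetical helper name:

```python
import xml.parsers.expat


def cdata_sections(xml_bytes):
    """Return the text of each CDATA section, using pyexpat directly."""
    sections = []
    current = None  # list of text chunks while inside a CDATA section

    def start_cdata():
        nonlocal current
        current = []

    def end_cdata():
        nonlocal current
        sections.append("".join(current))
        current = None

    def char_data(data):
        # Expat may deliver one section's text in several chunks;
        # text outside CDATA sections is ignored.
        if current is not None:
            current.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartCdataSectionHandler = start_cdata
    parser.EndCdataSectionHandler = end_cdata
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_bytes, True)
    return sections


doc = b"""<project>
   <script type="text/javascript">
   <![CDATA[
   <!--
    // Stuff ....
    //-->
    ]]>
   </script>
</project>"""

for text in cdata_sections(doc):
    print(repr(text))
```

The collected text keeps the surrounding whitespace and the unescaped `<!--` / `//-->`, since character data inside a CDATA section is passed through without markup interpretation.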

Many thanks,
Frank.

###########################################################################################
# echoxml.py

import sys
from xml.sax import sax2exts, saxutils, handler
from xml.sax import SAXNotSupportedException, SAXNotRecognizedException

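class EchoGenerator(saxutils.XMLGenerator):
    """Echo the parsed document back out, including lexical events."""

    def __init__(self, out=None, encoding="iso-8859-1"):
        saxutils.XMLGenerator.__init__(self, out, encoding)
        self._in_entity = 0   # true while inside an entity expansion
        self._in_cdata = 0    # true between startCDATA() and endCDATA()

    def characters(self, content):
        if self._in_entity:
            # The entity reference was already echoed by startEntity().
            return
        elif self._in_cdata:
            # Inside CDATA: write the text raw, without escaping.
            self._out.write('*******************************************')
            self._out.write(content)
        else:
            # Ordinary text: XMLGenerator escapes &, < and >.
            saxutils.XMLGenerator.characters(self, content)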

    # -- LexicalHandler interface

    def comment(self, content):
        self._out.write('<!--%s-->' % content)

    def startDTD(self, name, public_id, system_id):
        self._out.write('<!DOCTYPE %s' % name)
        if public_id:
            self._out.write(' PUBLIC %s %s' % (
                saxutils.quoteattr(public_id),
                saxutils.quoteattr(system_id)))
        elif system_id:
            self._out.write(' SYSTEM %s' % saxutils.quoteattr(system_id))

    def endDTD(self):
        self._out.write('>\n')

    def startEntity(self, name):
        self._out.write('&%s;' % name)
        self._in_entity = 1

    def endEntity(self, name):
        self._in_entity = 0

    def startCDATA(self):
        self._out.write('*******************************************')
        self._out.write('<![CDATA[')
        self._in_cdata = 1

    def endCDATA(self):
        self._out.write(']]>')
        self._out.write('*******************************************')
        self._in_cdata = 0


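def test(xmlfile):
    # Candidate SAX drivers, in order of preference.
    parser = sax2exts.make_parser([
        'pirxx',
        'xml.sax.drivers2.drv_xmlproc',
        'xml.sax.drivers2.drv_pyexpat',
    ])
    # Show which driver was picked; the recipe notes that the drivers
    # differ in which lexical events they report.
    print >>sys.stderr, "*** Using", parser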

    try:
        parser.setFeature(handler.feature_namespaces, 1)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        pass
    try:
        parser.setFeature(handler.feature_validation, 0)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        pass

    saxhandler = EchoGenerator()
    parser.setContentHandler(saxhandler)
    # Lexical events (comments, DTD, entities, CDATA) go to the same object.
    parser.setProperty(handler.property_lexical_handler, saxhandler)
    parser.parse(xmlfile)


if __name__ == "__main__":
    test('build.xml')

