[XML-SIG] Handling of character entity references
Mark E.
snowball3@softhome.net
Mon, 26 May 2003 09:46:14 -0400
<?xml version="1.0" ?><html>
<head>
<title></title>
</head>
<body>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">> So essentially what I'm asking is how do I get PyXML to preserve</span></font></div>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">> "&eacute;" as-is and output it in the same manner when I PrettyPrint() it?</span></font></div>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">> (Or, equivalently, convert it to its Unicode representation on input and</span></font></div>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">> back to an entity reference on output.)</span></font></div>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">> </span></font></div>
<div align="left"><br/>
</div>
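<div align="left"><font face="Arial"><span style="font-size:9pt">It can be done with the expat parser. The example below subclasses ExpatParser to call UseForeignDTD() and turns off external general entities, so that undefined entity references are passed to skippedEntity() instead of stopping the parse:</span></font></div>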
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">from xml.sax.expatreader import ExpatParser</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">from xml.sax.handler import ContentHandler, property_lexical_handler, \</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">                            feature_external_ges, feature_namespaces</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">class MyExpatParser(ExpatParser):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    def reset(self):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">        ExpatParser.reset(self)</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">        parser = self._parser</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">        parser.UseForeignDTD(True)</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">class MyContentHandler(ContentHandler):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    """Report the entity references that the parser skipped."""</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    def __init__(self):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">        ContentHandler.__init__(self)</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    def skippedEntity(self, name):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">        """Found an undefined entity. Preserve it here."""</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">        print "Skipping entity %s" % name</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">def create_parser():</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    parser = MyExpatParser()</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    ch = MyContentHandler()</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    parser.setContentHandler(ch)</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    # Comment out the next line to cause a traceback with the test code below.</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    parser.setFeature(feature_external_ges, False)</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    return parser</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">def fromString(data):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    import cStringIO</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    parser = create_parser()</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    f = cStringIO.StringIO(data)</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    parser.parse(f)</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    f.close()</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">if __name__ == "__main__":</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">    fromString("<html><body>some text &eacute; more text &foo;</body></html>")</span></font></div>
<div align="left"></div>
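<div align="left"><font face="Arial"><span style="font-size:9pt">The skippedEntity() method above only prints the name. As a rough sketch of what "preserve it here" might look like, the handler below collects text, turns known HTML entity names into their Unicode characters, and keeps unknown names as literal references; a helper then converts non-ASCII characters back to entity references on output. The names EntityPreservingHandler and to_entity_references are made up for illustration. The sketch assumes Python 3, where the entity tables live in html.entities (Python 2 has them in htmlentitydefs), and it calls the handler by hand rather than through expat, so it shows only the handler side.</span></font></div>

```python
from html.entities import codepoint2name, name2codepoint
from xml.sax.handler import ContentHandler


class EntityPreservingHandler(ContentHandler):
    """Hypothetical handler: collects text and keeps skipped entities."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.parts = []

    def characters(self, content):
        self.parts.append(content)

    def skippedEntity(self, name):
        # Known HTML entity names become the Unicode character;
        # unknown names (such as "foo") are kept as a literal reference.
        if name in name2codepoint:
            self.parts.append(chr(name2codepoint[name]))
        else:
            self.parts.append("&%s;" % name)

    def text(self):
        return "".join(self.parts)


def to_entity_references(text):
    """Replace non-ASCII characters with named or numeric references."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append("&%s;" % codepoint2name[code])
        else:
            out.append("&#%d;" % code)
    return "".join(out)


# Drive the handler by hand, in the order a SAX parser would call it.
handler = EntityPreservingHandler()
handler.characters("some text ")
handler.skippedEntity("eacute")
handler.characters(" more text ")
handler.skippedEntity("foo")
```

<div align="left"><font face="Arial"><span style="font-size:9pt">Calling to_entity_references(handler.text()) gives the input text back with the named references restored. The sketch does not escape a literal ampersand or angle bracket in character data, which a real serializer would also have to do.</span></font></div>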
</body>
</html>