[XML-SIG] Handling of character entity references

Mark E. snowball3@softhome.net
Mon, 26 May 2003 09:46:14 -0400


<?xml  version="1.0" ?><html>
<head>
<title></title>
</head>
<body>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">&gt; So essentially what I'm asking is how do I get PyXML to preserve</span></font></div>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">&gt; &quot;&amp;eacute;&quot; as-is and output it in the same manner when I PrettyPrint() it?</span></font></div>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">&gt; (Or, equivalently, convert it to its Unicode representation on input and</span></font></div>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">&gt; back to an entity reference on output.)</span></font></div>
<div align="left"><font face="Arial" color="#7f0000"><span style="font-size:9pt">&gt; </span></font></div>
<div align="left"><br/>
</div>
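<div align="left"><font face="Arial"><span style="font-size:9pt">It can be done with the expat parser. The idea is to tell expat to assume an external DTD, so that it reports undefined entities through skippedEntity() instead of treating them as errors. Below is an example:</span></font></div>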
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">from xml.sax.expatreader import ExpatParser</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">from xml.sax.handler import ContentHandler, property_lexical_handler, \</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; feature_external_ges, feature_namespaces</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">class MyExpatParser(ExpatParser):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; def reset(self):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160;&#160;&#160;&#160;&#160; ExpatParser.reset(self)</span></font></div>
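<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160;&#160;&#160;&#160;&#160; parser = self._parser</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160;&#160;&#160;&#160;&#160; # Make expat act as if the document had an external DTD.</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160;&#160;&#160;&#160;&#160; parser.UseForeignDTD(True)</span></font></div>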
<div align="left"><br/>
</div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">class MyContentHandler(ContentHandler):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; &quot;&quot;&quot;Report the undefined entities that the parser skips.&quot;&quot;&quot;</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; def __init__(self):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160;&#160;&#160;&#160;&#160; ContentHandler.__init__(self)</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; def skippedEntity(self, name):</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160;&#160;&#160;&#160;&#160; &quot;&quot;&quot;Found an undefined entity. Preserve it here.&quot;&quot;&quot;</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160;&#160;&#160;&#160;&#160; print(&quot;Skipping entity %s&quot; % name)</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">def create_parser():</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; parser = MyExpatParser()</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; ch = MyContentHandler()</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; parser.setContentHandler(ch)</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; # Required: commenting out the next line causes a traceback with the test code below.</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; parser.setFeature(feature_external_ges, False)</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; return parser</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">def fromString(data):</span></font></div>
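<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; try:</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160;&#160;&#160;&#160;&#160; from cStringIO import StringIO&#160; # Python 2</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; except ImportError:</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160;&#160;&#160;&#160;&#160; from io import StringIO&#160; # Python 3</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; parser = create_parser()</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; f = StringIO(data)</span></font></div>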
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; parser.parse(f)</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; f.close()</span></font></div>
<div align="left"><br/>
</div>
<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">if __name__ == &quot;__main__&quot;:</span></font></div>
<div align="left"><font face="Arial"><span style="font-size:9pt">&#160;&#160;&#160; fromString(&quot;&lt;html&gt;&lt;body&gt;some text &amp;eacute; more text &amp;foo;&lt;/body&gt;&lt;/html&gt;&quot;)</span></font></div>
<div align="left"></div>
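<div align="left"><br/>
</div>
<div align="left"><font face="Arial"><span style="font-size:9pt">The example above only prints the skipped names. To write the references back out, something along these lines might serve as a starting point. It is a minimal sketch, not a drop-in for PrettyPrint(): it assumes Python 3, it uses xml.parsers.expat directly rather than the SAX classes, and roundtrip() is a hypothetical helper name. It handles only elements, attributes, and text; comments and processing instructions are dropped.</span></font></div>
<pre><![CDATA[
```python
from xml.parsers import expat
from xml.sax.saxutils import escape, quoteattr


def roundtrip(data):
    """Re-serialize data, writing undefined entities back as references."""
    parts = []

    def start(name, attrs):
        # attrs is a dict of attribute name -> value
        attr_text = "".join(
            " %s=%s" % (key, quoteattr(value)) for key, value in attrs.items()
        )
        parts.append("<%s%s>" % (name, attr_text))

    def end(name):
        parts.append("</%s>" % name)

    def chars(text):
        # Re-escape markup characters that expat has already decoded.
        parts.append(escape(text))

    def skipped(name, is_parameter_entity):
        # Undefined general entity: emit it again as a reference.
        parts.append("&%s;" % name)

    parser = expat.ParserCreate()
    # Must be called before parsing starts; makes expat act as if the
    # document had an external DTD, so undefined entities are skipped.
    parser.UseForeignDTD(True)
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.SkippedEntityHandler = skipped
    parser.Parse(data, True)
    return "".join(parts)
```
]]></pre>
<div align="left"><font face="Arial"><span style="font-size:9pt">Calling roundtrip() on the test string from the example above is intended to give the input back with both references intact. Predefined entities such as &amp;amp; are expanded by expat and then re-escaped by escape().</span></font></div>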
</body>
</html>