[XML-SIG] SAX, escape problem
Gregor Mosheh
stigmata@blackangel.net
Tue, 2 Apr 2002 20:15:41 -0800 (PST)
It appears that something in SAX is failing to convert quote marks
and some other characters into their entities. How do I correct this?
### This test program...
import xml.sax, xml.sax.writer, xml.sax.handler
import cStringIO
import re

def encode(hash):
    buffer = cStringIO.StringIO()
    saxout = xml.sax.writer.PrettyPrinter(buffer)
    saxout.startDocument()
    saxout.startElement("objects", {})
    saxout.startElement("object", hash)
    saxout.endElement("object")
    saxout.endElement("objects")
    saxout.endDocument()
    return buffer.getvalue()

print encode( { 'keyword' : '< \"this is in quotes\" >' } )+"\n\n"
### ...generates this output, indicating that quote characters are
### not escaped, though <, >, and & characters are escaped. The same
### thing happens for single-quote (apostrophe) characters and for
### Unicode characters above code point 128.
<?xml version="1.0" encoding="iso-8859-1"?>
<objects><object keyword="&lt; "this is in quotes" &gt;"/></objects>
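### For comparison, a minimal sketch of what a well-formed attribute
### value looks like. This is an assumption-laden aside: it uses
### Python 3 and the standard library's xml.sax.saxutils.quoteattr,
### not xml.sax.writer, so it says nothing about what PrettyPrinter
### does internally.

```python
from xml.sax.saxutils import quoteattr

# The same attribute value used in the test program above.
value = '< "this is in quotes" >'

# quoteattr escapes &, <, > and picks a delimiter that does not clash
# with the quotes in the value; here it switches to single quotes.
quoted = quoteattr(value)
print(quoted)   # '&lt; "this is in quotes" &gt;'

# When the value contains both kinds of quote, the double quotes are
# replaced by the &quot; entity and the value is double-quoted.
both = quoteattr('it\'s "quoted"')
print(both)     # "it's &quot;quoted&quot;"
```

### Either form is acceptable to an XML parser; the unescaped double
### quote inside a double-quoted value, as in the output above, is not.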
### So, I tried inserting some code to do the
### quote-to-&quot; substitutions myself.
### And this test program...
import xml.sax, xml.sax.writer, xml.sax.handler
import cStringIO
import re

def encode(hash):
    buffer = cStringIO.StringIO()
    saxout = xml.sax.writer.PrettyPrinter(buffer)
    saxout.startDocument()
    saxout.startElement("objects", {})
    for thiskey in hash.keys():
        thisval = hash[thiskey]
        thisval = re.sub('"', '&quot;', thisval)
        thisval = re.sub("'", '&apos;', thisval)
        hash[thiskey] = thisval
    saxout.startElement("object", hash)
    saxout.endElement("object")
    saxout.endElement("objects")
    saxout.endDocument()
    return buffer.getvalue()

print encode( { 'keyword' : '< \"this is in quotes\" >' } )+"\n\n"
### ...generates the following output, indicating that the ampersand
### of my escaped quote-mark entity is itself re-escaped by something
### in SAX. The same effect occurs, of course, if I do similar subs for
### Unicode characters above code point 128 or for single-quote marks.
<?xml version="1.0" encoding="iso-8859-1"?>
<objects><object keyword="&lt; &amp;quot;this is in quotes&amp;quot; &gt;"/></objects>
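### A hedged sketch of the "escape exactly once" approach, for
### comparison with the two programs above. Assumptions: Python 3, and
### the standard library's xml.sax.saxutils.XMLGenerator standing in
### for xml.sax.writer.PrettyPrinter (it is a different writer, so
### this is not a fix for PrettyPrinter itself). The raw, unescaped
### value is handed to the writer, and the result is parsed back to
### check that the value survives the round trip.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator

def encode(attrs):
    """Serialize one <object> element with the given attributes."""
    buffer = io.StringIO()
    # XMLGenerator escapes attribute values itself, so pass them raw.
    saxout = XMLGenerator(buffer, encoding="utf-8")
    saxout.startDocument()
    saxout.startElement("objects", {})
    saxout.startElement("object", attrs)
    saxout.endElement("object")
    saxout.endElement("objects")
    saxout.endDocument()
    return buffer.getvalue()

raw = {"keyword": '< "this is in quotes" >'}
document = encode(raw)
print(document)

# Parse the document back and recover the original attribute value.
parsed = ET.fromstring(document.encode("utf-8"))
round_tripped = parsed.find("object").get("keyword")

# Escaping by hand first reproduces the double escaping shown above:
# the writer escapes the ampersand of the hand-written entity.
pre_escaped = encode({"keyword": '< &quot;this is in quotes&quot; >'})
print(pre_escaped)
```

### With this writer, the hand-escaped value comes out as &amp;quot;
### just as in the output above, while the raw value round-trips
### unchanged.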