[XML-SIG] XML 0.5.1 bug: 'amp' character reference not handled correctly by "HtmlBuilder/HtmlWriter"

Fred L. Drake, Jr. <fdrake@acm.org>
Fri, 13 Aug 1999 09:59:28 -0400 (EDT)


--Apu33M+PUU
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit


Dieter Maurer writes:
 > "HtmlBuilder" translates '&amp;' into an entity reference.
 > This does not follow the DOM spec. It specifies that
 > character references are expected to be expanded by the
 > HTML/XML processor.
 > 
 > "XmlWriter/HtmlWriter" does not output the 'amp' entity reference.
 > This, obviously, is a bug in "XmlWriter/HtmlWriter".

  It doesn't, but if a literal & is present as character data, it
writes out &amp;, so I think that's OK.
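
  As a rough illustration of that behavior (a sketch using the
standard library's xml.sax.saxutils.escape, not the writer.py code
itself; the sample string is made up), a literal ampersand in
character data is turned back into an entity reference on output:

```python
from xml.sax.saxutils import escape

# Character data as it would sit in the tree after the parser has
# already expanded '&amp;' to a literal '&' (illustrative sample text).
text = "Tom & Jerry"

# On output, the literal '&' has to be escaped again.
escaped = escape(text)
print(escaped)
```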

 > By the way, processing instructions are not output, too.

  Are you sure they're in your tree?  What I see is that they are
output, but using the XML-style syntax: <?foo bar?> instead of
<?foo bar>.
  I've checked in a fix that allows HtmlWriter to produce SGML-style
PIs.  This *doesn't* do anything to change the handling of PIs as
(target, value) tuples; this was a concept introduced in some of the
XML APIs (not even XML itself as I understand it).
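
  To make the difference between the two syntaxes concrete, here is a
minimal sketch.  The helper name format_pi and its sgml_style flag are
hypothetical and are not part of writer.py; the only difference between
the two forms is the closing delimiter:

```python
def format_pi(target, data, sgml_style=False):
    """Serialize a processing instruction (hypothetical helper)."""
    if sgml_style:
        # SGML/HTML form: the PI is closed by a bare '>'.
        return "<?%s %s>" % (target, data)
    # XML form: the PI is closed by '?>'.
    return "<?%s %s?>" % (target, data)


xml_pi = format_pi("foo", "bar")
sgml_pi = format_pi("foo", "bar", sgml_style=True)
print(xml_pi)
print(sgml_pi)
```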
  The patch to xml/dom/writer.py is attached; it also teaches the
*Lineariser classes to use cStringIO when available.
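
  The cStringIO change is just the usual import-with-fallback idiom.
A standalone sketch of it follows; the final fallback to io.StringIO is
my own addition so the snippet also runs on interpreters that have
neither module, and it is not in the patch:

```python
try:
    # C implementation; faster, but not always available.
    from cStringIO import StringIO
except ImportError:
    try:
        # Pure-Python module with the same interface.
        from StringIO import StringIO
    except ImportError:
        # Assumption: not in the patch; covers interpreters that
        # provide neither of the modules above.
        from io import StringIO

# The lineariser only needs write() and getvalue() on the buffer.
buffer = StringIO()
buffer.write("<?foo bar>")
result = buffer.getvalue()
print(result)
```

  Either class can stand in for the other here because the code only
writes to the buffer and reads the accumulated string back.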


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


--Apu33M+PUU
Content-Type: text/plain
Content-Description: xml/dom/writer.py patch
Content-Disposition: inline;
	filename="PATCH"
Content-Transfer-Encoding: 7bit

Index: writer.py
===================================================================
RCS file: /home/cvsroot/xml/dom/writer.py,v
retrieving revision 1.9
retrieving revision 1.10
diff -c -r1.9 -r1.10
*** writer.py	1999/04/28 02:42:19	1.9
--- writer.py	1999/08/13 13:50:18	1.10
***************
*** 124,131 ****
  class XmlLineariser(XmlWriter):
  
      def __init__(self):
!         import StringIO
!         self.buffer = StringIO.StringIO()
          XmlWriter.__init__(self, self.buffer)
  
      def linearise(self, node):
--- 124,134 ----
  class XmlLineariser(XmlWriter):
  
      def __init__(self):
!         try:
!             from cStringIO import StringIO
!         except ImportError:
!             from StringIO import StringIO
!         self.buffer = StringIO()
          XmlWriter.__init__(self, self.buffer)
  
      def linearise(self, node):
***************
*** 169,180 ****
          
          self._setNewLines(nl_dict)
  
  
  class HtmlLineariser(HtmlWriter):
  
      def __init__(self):
!         import StringIO
!         self.buffer = StringIO.StringIO()
          HtmlWriter.__init__(self, self.buffer)
  
      def linearise(self, node):
--- 172,192 ----
          
          self._setNewLines(nl_dict)
  
+     def doOtherNode(self, node):
+         if node.get_nodeType() == PROCESSING_INSTRUCTION_NODE:
+             self.stream.write("<?%s %s>" % (node.target, node.value))
+         else:
+             XmlWriter.doOtherNode(self, node)
  
+ 
  class HtmlLineariser(HtmlWriter):
  
      def __init__(self):
!         try:
!             from cStringIO import StringIO
!         except ImportError:
!             from StringIO import StringIO
!         self.buffer = StringIO()
          HtmlWriter.__init__(self, self.buffer)
  
      def linearise(self, node):

--Apu33M+PUU--