[XML-SIG] 0.6.4: another problem with building DOM using validating parser

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Sun, 4 Mar 2001 23:26:42 +0100


> from xml.dom.ext.reader.Sax2         import FromXmlFile
> 
> f = open ('test5.xml', 'w')
> f.write ("""<?xml version="1.0"?>
> <!DOCTYPE configuration  [
>   <!ENTITY testscrap   SYSTEM "testscrap">
>   <!ELEMENT configuration EMPTY>
> ]>
> 
> <configuration/>
> """)
> f.close()
> 
> doc = FromXmlFile ('test5.xml', None, 1)
> 
> print doc
[...]
> !     def unparsedEntityDecl (self, publicId, systemId, notationName):
> !         new_notation = self._ownerDoc.getFactory().createEntity(self._ownerDoc,  publicId, systemId, notationName)
>           self._ownerDoc.getDocumentType().getEntities().setNamedItem(new_notation)
>           return

I'm glad that others are as confused about the matter as I am. What
you have in your document is not an unparsed entity, but an external
one - the unparsed ones have an NDATA notation name. xmlproc detected
that properly (by setting ndata to ""), but drv_xmlproc expected None
as the ndata. So I changed it to invoke externalEntityDecl in that
case, which is not handled by Sax2.
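
To make the None-versus-"" mismatch concrete, here is a minimal sketch of
the dispatch decision. It is not the actual drv_xmlproc code:
RecordingHandler and dispatch_entity_decl are hypothetical names made up
for illustration, and the sketch is written for current Python rather
than the interpreter of the time. The externalEntityDecl signature is
assumed to follow the SAX2 DeclHandler convention.

```python
class RecordingHandler:
    """Hypothetical stand-in for a DOM builder; it only records callbacks."""

    def __init__(self):
        self.calls = []

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.calls.append(("unparsed", name, ndata))

    def externalEntityDecl(self, name, publicId, systemId):
        self.calls.append(("external", name))


def dispatch_entity_decl(handler, name, publicId, systemId, ndata):
    """Route an entity declaration by whether it carries an NDATA name."""
    # Truthiness covers both conventions for "no NDATA": None and "".
    # A test such as `ndata is None` would misroute ndata == "".
    if ndata:
        handler.unparsedEntityDecl(name, publicId, systemId, ndata)
    else:
        handler.externalEntityDecl(name, publicId, systemId)


handler = RecordingHandler()
dispatch_entity_decl(handler, "testscrap", None, "testscrap", "")
dispatch_entity_decl(handler, "testscrap2", None, "testscrap", None)
dispatch_entity_decl(handler, "logo", None, "logo.gif", "gif")
print(handler.calls)
```

Only the third declaration, the one with a notation name, reaches
unparsedEntityDecl; the other two are reported as external entities.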

As you found, *if* this was ever invoked, _ownerDoc will be None
(since the document element has not been seen yet). Instead of
ignoring the unparsed entity, it would be better to put it into
_orphanedNodes; I've changed it thus. In the process, I found that
some things are put into _orphanedNodes but never processed later -
I've fixed that too.
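
The deferral idea can be sketched as follows. This is a hypothetical
miniature, not the 4DOM reader: Builder, its attributes and
startDocumentElement are invented for illustration, and the code targets
current Python. It also shows why the patch below writes ('doctype',)
instead of ('doctype'): without the trailing comma the parentheses do
not create a tuple, so o_node[0] is the single character 'd' and the
comparison with 'doctype' never matches.

```python
class Builder:
    """Hypothetical miniature of a SAX-to-DOM builder."""

    def __init__(self):
        self.owner_doc = None  # set once the document element is seen
        self.orphans = []      # events that arrived before the document
        self.entities = {}

    def startDTD(self, name):
        # The trailing comma makes this a one-element tuple.
        self.orphans.append(("doctype",))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        if self.owner_doc is None:
            # Too early: remember the call and replay it later.
            self.orphans.append(
                ("unparsedentitydecl", name, publicId, systemId, ndata))
            return
        self.entities[name] = (publicId, systemId, ndata)

    def startDocumentElement(self):
        self.owner_doc = object()  # placeholder for a real Document
        pending, self.orphans = self.orphans, []
        for node in pending:
            kind, args = node[0], node[1:]
            if kind == "doctype":
                pass  # nothing to replay in this sketch
            elif kind == "unparsedentitydecl":
                self.unparsedEntityDecl(*args)
            else:
                raise ValueError("Unknown orphaned node: " + kind)


# Parentheses alone do not make a tuple.
assert ("doctype")[0] == "d"
assert ("doctype",)[0] == "doctype"

builder = Builder()
builder.startDTD("configuration")
builder.unparsedEntityDecl("logo", None, "logo.gif", "gif")
builder.startDocumentElement()
print(builder.entities)
```

Raising on an unknown kind, as the patch does, means that a queued event
nobody replays shows up as an error instead of silently disappearing.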

I still think that the unparsedEntityDecl callback is completely
broken. What are getFactory and getEntities? Also, if there is a
feature for creating entities, it is surely part of a 4DOM extension -
probably on the document type. However, that apparently is not capable
of distinguishing between external and unparsed entities; I'm not sure
whether it should be.

In any case, I've applied the following patch. I'd appreciate it if
somebody from Fourthought could take a look.

Regards,
Martin

Index: xml/dom/ext/reader/Sax2.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/dom/ext/reader/Sax2.py,v
retrieving revision 1.7
diff -u -r1.7 Sax2.py
--- xml/dom/ext/reader/Sax2.py	2001/02/20 01:00:03	1.7
+++ xml/dom/ext/reader/Sax2.py	2001/03/04 22:05:59
@@ -8,7 +8,7 @@
 Components for reading XML files from a SAX2 producer.
 WWW: http://4suite.com/4DOM         e-mail: support@4suite.com
 
-Copyright (c) 2000 Fourthought Inc, USA.   All Rights Reserved.
+Copyright (c) 2000, 2001 Fourthought Inc, USA.   All Rights Reserved.
 See  http://4suite.com/COPYRIGHT  for license and copyright information
 """
 
@@ -148,6 +148,10 @@
                     self._ownerDoc.appendChild(comment)
             elif o_node[0] == 'doctype':
                 before_doctype = 0
+            elif o_node[0] == 'unparsedentitydecl':
+                apply(self.unparsedEntityDecl, o_node[1:])
+            else:
+                raise "Unknown orphaned node:"+o_node[0]
         self._rootNode = self._ownerDoc
         self._nodeStack.append(self._rootNode)
         return
@@ -222,7 +226,7 @@
     def startDTD(self, doctype, publicID, systemID):
         if not self._rootNode:
             self._dt = implementation.createDocumentType(doctype, publicID, systemID)
-            self._orphanedNodes.append(('doctype'))
+            self._orphanedNodes.append(('doctype',))
         else:
             raise 'Illegal DocType declaration'
         return
@@ -255,9 +259,12 @@
         self._ownerDoc.getDocumentType().getNotations().setNamedItem(new_notation)
         return
 
-    def unparsedEntityDecl (self, publicId, systemId, notationName):
-        new_notation = self._ownerDoc.getFactory().createEntity(self._ownerDoc,  publicId, systemId, notationName)
-        self._ownerDoc.getDocumentType().getEntities().setNamedItem(new_notation)
+    def unparsedEntityDecl (self, name, publicId, systemId, ndata):
+        if self._ownerDoc:
+            new_notation = self._ownerDoc.getFactory().createEntity(self._ownerDoc,  publicId, systemId, name)
+            self._ownerDoc.getDocumentType().getEntities().setNamedItem(new_notation)
+        else:
+            self._orphanedNodes.append(('unparsedentitydecl', name, publicId, systemId, ndata))
         return
 
     #Overridden ErrorHandler methods