[XML-SIG] 0.6.4: problems with sax exceptions

scott snyder <snyder@fnal.gov>
Fri, 02 Mar 2001 20:01:02 CST


hi -

I've been having some problems with sax exceptions in 0.6.4,
while trying to build DOM trees from XML.

Consider this program.  It writes a malformed XML file and then reads
it back.  The resulting exception is caught and printed.

---------------------------------------------------------------------
from xml.dom.ext.reader.Sax2         import FromXmlFile
from xml.sax                         import saxlib

f = open ('test3.xml', 'w')
f.write ("""<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "NONEXISTENT.dtd">
<""")
f.close()

try:
    doc = FromXmlFile ('test3.xml')
except saxlib.SAXException, e:
    print e
---------------------------------------------------------------------

However, when I run this, printing the exception itself fails:


[sss@karma xmltest]$ python read3.py

Traceback (innermost last):
  File "read3.py", line 13, in ?
    print e
  File "xml/sax/_exceptions.py", line 83, in __str__
    sysid = self.getSystemId()
  File "xml/sax/_exceptions.py", line 79, in getSystemId
    return self._locator.getSystemId()
  File "xml/sax/drivers2/drv_xmlproc.py", line 161, in getSystemId
    return self._parser.get_current_sysid() # FIXME?
AttributeError: 'None' object has no attribute 'get_current_sysid'


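It looks like the objects that the locator goes through to get this
information are torn down during the stack unwind: by the time the
exception is printed, the driver's _parser is already None, as the
traceback shows.

Here's an attempt at a fix.  It caches the location information in the
exception when it is constructed: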


*** xml/sax/_exceptions.py-orig	Fri Mar  2 19:43:46 2001
--- xml/sax/_exceptions.py	Fri Mar  2 19:43:59 2001
***************
*** 61,74 ****
          SAXException.__init__(self, msg, exception)
          self._locator = locator
  
      def getColumnNumber(self):
          """The column number of the end of the text where the exception
          occurred."""
!         return self._locator.getColumnNumber()
  
      def getLineNumber(self):
          "The line number of the end of the text where the exception occurred."
!         return self._locator.getLineNumber()
  
      def getPublicId(self):
          "Get the public identifier of the entity where the exception occurred."
--- 61,82 ----
          SAXException.__init__(self, msg, exception)
          self._locator = locator
  
+         # We need to cache this stuff at construction time.
+         # If this exception is thrown, the objects through which we must
+         # traverse to get this information may be deleted by the time
+         # it gets caught.
+         self._systemId = self._locator.getSystemId()
+         self._colnum = self._locator.getColumnNumber()
+         self._linenum = self._locator.getLineNumber()
+ 
      def getColumnNumber(self):
          """The column number of the end of the text where the exception
          occurred."""
!         return self._colnum
  
      def getLineNumber(self):
          "The line number of the end of the text where the exception occurred."
!         return self._linenum
  
      def getPublicId(self):
          "Get the public identifier of the entity where the exception occurred."
***************
*** 76,82 ****
  
      def getSystemId(self):
          "Get the system identifier of the entity where the exception occurred."
!         return self._locator.getSystemId()
  
      def __str__(self):
          "Create a string representation of the exception."
--- 84,90 ----
  
      def getSystemId(self):
          "Get the system identifier of the entity where the exception occurred."
!         return self._systemId
  
      def __str__(self):
          "Create a string representation of the exception."


With this change, the program prints this:

$ python read3.py
test3.xml:3:1: Premature document end, no root element
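
The idea behind that patch can be shown in isolation.  What follows is
only a minimal sketch in present-day Python, not the PyXML code: the
names FakeParser, FakeLocator, LazyParseError, CachedParseError, parse
and describe are all made up for illustration, and the teardown is
simulated by an explicit close() call.

```python
class FakeParser:
    """Hypothetical stand-in for the parser that knows the current position."""
    def __init__(self, sysid, line, column):
        self.sysid, self.line, self.column = sysid, line, column


class FakeLocator:
    """Hypothetical stand-in for a locator that delegates to its parser."""
    def __init__(self, parser):
        self._parser = parser

    def getSystemId(self):
        return self._parser.sysid

    def getLineNumber(self):
        return self._parser.line

    def getColumnNumber(self):
        return self._parser.column

    def close(self):
        # Simulates the driver dropping its parser during cleanup.
        self._parser = None


class LazyParseError(Exception):
    """Asks the locator only when the message is formatted (unpatched idea)."""
    def __init__(self, msg, locator):
        super().__init__(msg)
        self._msg = msg
        self._locator = locator

    def __str__(self):
        loc = self._locator
        return "%s:%d:%d: %s" % (loc.getSystemId(), loc.getLineNumber(),
                                 loc.getColumnNumber(), self._msg)


class CachedParseError(LazyParseError):
    """Snapshots the location at construction time (patched idea)."""
    def __init__(self, msg, locator):
        super().__init__(msg, locator)
        self._systemId = locator.getSystemId()
        self._linenum = locator.getLineNumber()
        self._colnum = locator.getColumnNumber()

    def __str__(self):
        return "%s:%d:%d: %s" % (self._systemId, self._linenum,
                                 self._colnum, self._msg)


def parse(error_class):
    """Raise error_class, then tear the parser down as the stack unwinds."""
    locator = FakeLocator(FakeParser("test3.xml", 3, 1))
    try:
        raise error_class("Premature document end, no root element", locator)
    finally:
        locator.close()


def describe(error_class):
    """Return the formatted message, or the error hit while formatting it."""
    try:
        parse(error_class)
    except LazyParseError as e:
        try:
            return str(e)
        except AttributeError as inner:
            return "AttributeError: %s" % inner


print(describe(CachedParseError))
print(describe(LazyParseError))
```

The cached variant still formats its message after close() has run,
while the lazy one fails with an AttributeError much like the one in
the traceback.  The trade-off is that the location is read once, when
the exception is built, whether or not anyone ever prints it.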


However, if I switch to a validating XML parser, the file name is lost
from the exception message.  (This assumes that the patches from my
last note, which make the validating parser actually work, are applied.)


---------------------------------------------------------------------
from xml.dom.ext.reader.Sax2         import FromXmlFile
from xml.sax                         import saxlib

f = open ('test3.xml', 'w')
f.write ("""<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "test4.dtd">
<""")
f.close()

f = open ('test4.dtd', 'w')
f.write ("<!ELEMENT configuration EMPTY>\n")
f.close ()

try:
    doc = FromXmlFile ('test3.xml', None, 1)
except saxlib.SAXException, e:
    print e
---------------------------------------------------------------------


$ python read4.py
Unknown:3:1: Premature document end, no root element


The following patch seems to fix the problem.  It copies the system id
from the validating parser to the underlying parser before reading.


*** xml/parsers/xmlproc/xmlval.py-orig2	Fri Mar  2 19:55:03 2001
--- xml/parsers/xmlproc/xmlval.py	Fri Mar  2 19:55:33 2001
***************
*** 26,31 ****
--- 26,32 ----
          self.app=Application()
          self.dtd=CompleteDTD(self.parser)
          self.val=ValidatingApp(self.dtd,self.parser)
+         self.current_sysID = "Unknown"
          self.reset()
  
      def parse_resource(self,sysid):
***************
*** 99,104 ****
--- 100,106 ----
          self.parser.parseEnd()
  
      def read_from(self,file,bufsize=16384):
+         self.parser.current_sysID = self.current_sysID
          self.parser.read_from(file,bufsize)
  
      def flush(self):


Now, when I run the program, I get

$ python read4.py
test3.xml:3:1: Premature document end, no root element
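
The second patch follows a simple pattern: a wrapper hands its system
id down to the parser it wraps before asking that parser to read.  A
rough sketch in present-day Python, with made-up classes (InnerParser,
ValidatingWrapper) and a made-up demo() helper rather than the xmlproc
ones, and assuming that the caller only ever sets current_sysID on the
wrapper:

```python
import io


class InnerParser:
    """Hypothetical stand-in for the parser that does the real work."""
    def __init__(self):
        self.current_sysID = "Unknown"

    def read_from(self, fileobj):
        data = fileobj.read()
        return "%s: read %d characters" % (self.current_sysID, len(data))


class ValidatingWrapper:
    """Hypothetical stand-in for a front end that owns an inner parser."""
    def __init__(self, forward_sysid):
        self.parser = InnerParser()
        self.current_sysID = "Unknown"
        self._forward_sysid = forward_sysid

    def read_from(self, fileobj):
        if self._forward_sysid:
            # The step the patch adds: pass the identifier down first.
            self.parser.current_sysID = self.current_sysID
        return self.parser.read_from(fileobj)


def demo(forward_sysid):
    """Set the system id on the wrapper only, then read one character."""
    wrapper = ValidatingWrapper(forward_sysid)
    wrapper.current_sysID = "test3.xml"
    return wrapper.read_from(io.StringIO("<"))


print(demo(False))
print(demo(True))
```

Without the forwarding step the inner parser reports "Unknown", and
with it the inner parser reports "test3.xml".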