[XML-SIG] Thread safe XML parser

Tom Kirkpatrick tom at settopsolutions.com
Thu Sep 28 10:47:14 CEST 2006


I'm having issues using pyExpat from within a thread... I'm getting  
the following error:

     python: Modules/gcmodule.c:379: move_unreachable: Assertion `gc->gc.gc_refs > 0' failed.

The code is like so:

     def _handle_success( self ):
         """ called once the fetcher succeeds """
         self.log.debug( "XMLFetcher succeeded fetching %s", self.uri )
         callback = MainThreadCallback( self.signals[ "success" ].emit )
         callback()

     def _handle_error( self ):
         """ called if the fetch attempt fails """
         self.log.debug( "XMLFetcher reached retry limit" )
         callback = MainThreadCallback( self.signals[ "failure" ].emit )
         callback()

     def _do_fetch( self ):
         """ does the work of fetching and processing the xml file from the source url """
         reader = PyExpat.Reader()
         for i in range( 0, self.retry_limit ):
             self.try_count += 1
             self.log.debug( "Attempting fetch %s: %s of %s",
                             self.uri, self.try_count, self.retry_limit )
             try:
                 self.xml = reader.fromUri( self.uri ).documentElement
                 self._handle_success()
                 return
             except ExpatError, e:
                 self.log.error( "Could not parse XML file" )
             except HTTPError, e:
                 self.log.warning( "HTTP-Error whilst attempting to fetch %s: %s" % (self.uri, e.code) )
             except URLError, e:
                 self.log.warning( "URL-Error whilst attempting to fetch %s: %s" % (self.uri, e.reason) )
             time.sleep( self.retry_interval )
         self._handle_error()

     def fetch( self ):
         """ spawns a new thread to fetch the xml file asynchronously """
         thread = Thread( self._do_fetch )
         thread.start()
         return None

------------------------

The offending line is:
     self.xml = reader.fromUri( self.uri ).documentElement

Comment that out and it runs ok (although I get no xml back!!). I  
have also tried a slightly different method - fetching the file with  
urlopen and then using reader.fromStream to do the parsing, but I  
still get the same error:

...
             try:
                 config_file = urllib.urlopen( self.uri )
                 self.xml = reader.fromStream( config_file ).documentElement
                 self._handle_success()
                 return
...

If I move the xml parsing stuff out of the thread it runs fine,
although that's the bit that takes the time and that's the bit that
needs threading the most. I have searched the net trying to find
information about python xml parsing and thread safety but am not
having much luck...
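
For reference, here is a rough sketch of the "parse outside the thread" arrangement described above. It is not a diagnosis of the assertion failure. The worker thread only downloads, and the parsing happens in the main thread. It is written for current Python and uses the standard-library xml.dom.minidom rather than PyExpat.Reader; fetch_async, parse_in_main_thread and fake_opener are hypothetical names, and fake_opener stands in for the real download so the sketch runs offline.

```python
import queue
import threading
from xml.dom import minidom


def fetch_async(uri, opener, results):
    """Start a thread that puts (uri, payload-or-exception) on `results`.

    The thread only downloads; it never touches the XML parser.
    """
    def work():
        try:
            results.put((uri, opener(uri)))
        except Exception as exc:
            # Hand the error over so the main thread can deal with it.
            results.put((uri, exc))

    thread = threading.Thread(target=work)
    thread.start()
    return thread


def parse_in_main_thread(results):
    """Take one result off the queue and parse it in the calling thread."""
    uri, payload = results.get()
    if isinstance(payload, Exception):
        raise payload
    return minidom.parseString(payload).documentElement


def fake_opener(uri):
    # Hypothetical stand-in for the real fetch (assumption, not the
    # original code); returns the raw bytes of a tiny document.
    return b"<config><item name='a'/></config>"


results = queue.Queue()
worker = fetch_async("http://example.invalid/config.xml", fake_opener, results)
root = parse_in_main_thread(results)
worker.join()
print(root.tagName)
```

This keeps the slow network wait off the main thread, but the parse itself still blocks it, so it only helps if the download is the larger share of the time.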

Does anyone know of an xml parsing module with xpath support that is
thread safe? Or can anyone suggest a way round this problem, or even
give some pointers as to what is actually causing it?


many thanks
Tom
