I’m trying to extend PyErrorLog. Since I’m using XMLParser(recover=True), I want to change all of the reported levels to WARNING and log XML syntax errors via:

    etree.use_global_python_log(XMLErrorLog(logger=logging.getLogger(__name__).getChild('XMLParser')))

When I try to add level_map as an instance variable in my class’s __init__() method, I get an error message saying that it’s not a writable attribute. When I add level_map as a class variable, it doesn’t complain, but it doesn’t appear to be used in the mapping. With this mapping, everything is still tagged as CRITICAL or ERROR:

    class XMLErrorLog(etree.PyErrorLog):
        level_map = {
            etree.ErrorLevels.WARNING: logging.WARNING,
            etree.ErrorLevels.ERROR: logging.WARNING,
            etree.ErrorLevels.FATAL: logging.WARNING
        }

I then tried to modify the level of the _LogEntry passed to receive() before calling the log() method, but that does not appear to be possible either. I finally managed to get something to work by using:

    class LogEntry(object):
        level = 1

And in my XMLErrorLog class:

    def receive(self, log_entry):
        logrepr = "[ %s:%d:%d:%s:%s:%s: %s ]" % (
            '', log_entry.line, log_entry.column, "Warning",
            log_entry.domain_name, log_entry.type_name, log_entry.message)
        self.log(LogEntry(), logrepr)

But it seemed from the documentation that providing level_map in my class should have been enough. Am I missing something, or is the documentation incorrect?

-- Steve Majewski
Hi,
PyErrorLog's level_map attribute is indeed read-only:

    # https://github.com/lxml/lxml/blob/582b598fd7aa49fecd64fea2ad88e969832f2beb/s...
    cdef class PyErrorLog(_BaseErrorLog):
        # ...
        cdef readonly dict level_map

But you can update the level_map dict in a subclass, as it is a mutable object.
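A minimal sketch of that idea, not lxml code: `ReadOnlyMapLog` and `AllWarningsLog` are hypothetical pure-Python stand-ins, the string keys stand in for `etree.ErrorLevels` values, and the default mapping shown is an assumption. It only illustrates why rebinding a read-only attribute fails while mutating the dict it refers to works.

```python
import logging


class ReadOnlyMapLog:
    """Hypothetical stand-in for PyErrorLog; not the lxml implementation."""

    def __init__(self):
        # Assumed default mapping, keyed by stand-in level names.
        self._level_map = {
            "WARNING": logging.WARNING,
            "ERROR": logging.ERROR,
            "FATAL": logging.CRITICAL,
        }

    @property
    def level_map(self):
        # No setter is defined, so the name cannot be rebound on an instance.
        return self._level_map

    def map_level(self, level):
        # The base class always consults its own dict.
        return self._level_map[level]


class AllWarningsLog(ReadOnlyMapLog):
    def __init__(self):
        super().__init__()
        # Mutating the existing dict is allowed even though rebinding is not.
        self.level_map.update({"ERROR": logging.WARNING, "FATAL": logging.WARNING})


log = AllWarningsLog()

try:
    log.level_map = {}  # rebinding the read-only attribute
    rebind_failed = False
except AttributeError:
    rebind_failed = True
```

Here the assignment raises AttributeError, while the update made in `__init__` changes what `map_level("FATAL")` returns.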
Holger
Thanks: I didn’t think of trying update. What I have now is working great:

    class XMLErrorLog(etree.PyErrorLog):
        new_map = {
            etree.ErrorLevels.WARNING: logging.WARNING,
            etree.ErrorLevels.ERROR: logging.WARNING,
            etree.ErrorLevels.FATAL: logging.WARNING,
        }

        def __init__(self, *args, **kwargs):
            etree.PyErrorLog.__init__(self, *args, **kwargs)
            self.level_map.update(self.new_map)

        def receive(self, log_entry):
            logrepr = "%s:%d:%d:%s%s.%s:[%s]" % (
                log_entry.filename, log_entry.line, log_entry.column, "",
                log_entry.domain_name, log_entry.type_name, log_entry.message)
            self.log(log_entry, logrepr)

    etree.use_global_python_log(XMLErrorLog(logger=logging.getLogger(__name__).getChild('XMLParser')))

This produces log output like:

    WARNING <string>:2:511:PARSER.ERR_NAME_REQUIRED:[xmlParseEntityRef: no name]
    DEBUG Writing to file /usr/local/projects/Archivespace/OAI/tmp/oai:jmu%2F%2Frepositories%2F4%2Fresources%2F569.oai_ead.xml
    WARNING Recoverable XMLParser error on: oai:jmu//repositories/4/resources/569

The last line is produced by checking that the parser’s error_log is not empty. That check is deferred until after the parse so that I can extract the identifier from the header:

    if client.XMLParser.error_log:
        logging.getLogger(__name__).getChild('XMLParser').warning(
            'Recoverable XMLParser error on: %s', header.identifier())

[ I’m trying to harvest an OAI feed where some of the metadata payloads are bad XML, mostly unescaped ampersands. I don’t want one bad file to halt harvesting, but I still want to log and track errors so I can notify feed maintainers upstream. ]

-- Steve Majewski
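To see how the format string in receive() yields the first WARNING line of the log output above, here is a small standard-library sketch. The `entry` object is a hypothetical stand-in for the entry lxml passes to receive(), filled with the values from that log line. `format_entry`, `ListHandler`, and the logger name `harvest_sketch` are likewise illustrative names, not part of lxml.

```python
import logging
from types import SimpleNamespace

# Hypothetical stand-in for an lxml log entry, using values from the log line above.
entry = SimpleNamespace(
    filename="<string>",
    line=2,
    column=511,
    domain_name="PARSER",
    type_name="ERR_NAME_REQUIRED",
    message="xmlParseEntityRef: no name",
)


def format_entry(log_entry):
    # Same format string as in receive(); the empty "" fills the fourth slot.
    return "%s:%d:%d:%s%s.%s:[%s]" % (
        log_entry.filename, log_entry.line, log_entry.column, "",
        log_entry.domain_name, log_entry.type_name, log_entry.message)


class ListHandler(logging.Handler):
    """Collects 'LEVEL message' strings so the result can be inspected."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append("%s %s" % (record.levelname, record.getMessage()))


# A child logger, obtained the same way as in the message above.
logger = logging.getLogger("harvest_sketch").getChild("XMLParser")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.warning("%s", format_entry(entry))
```

Collecting the lines in a handler like this is one possible way to gather per-record warnings for a later report to feed maintainers.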
participants (2)
- Holger Joukl
- Majewski, Steven Dennis (sdm7g)