Martijn Faassen wrote:
Another question: How does error logging work in combination with threads? I noticed that the code in lxml that turned off the talkativeness of libxml2 actually only worked for the main thread, and that new threads that use lxml do become talkative again.
According to the libxml2 docs, that's intentional. Each thread has to configure that for itself. Currently, there isn't that much in lxml anyway that takes care of threads. Everything that's module level will interfere. A way to get around this would be to set an error log in each sensible function. Hmm, I actually think that would be the right way. I'll code this up and see how it turns out.
libxml2 gives you this:
    int            domain   : What part of the library raised this error
    int            code     : The error code, e.g. an xmlParserError
    char *         message  : human-readable informative error message
    xmlErrorLevel  level    : how consequent is the error
    char *         file     : the filename
    int            line     : the line number if available
    char *         str1     : extra string information
    char *         str2     : extra string information
    char *         str3     : extra string information
    int            int1     : extra number information
    int            int2     : column number of the error or 0 if N/A
    void *         ctxt     : the parser context if available
    void *         node     : the node in the tree
The problem is: the more information you put into the log, the slower the application becomes. Providing the element that triggered the error, for example, would rather be out of scope. Note that you have to convert this information to Python representations in order to store it in the log.
I'm not too concerned that slowing down exceptions somewhat is going to impact things that badly - these exceptions typically do not occur very often. Since it's lxml's mission to make libxml2 usable by mortal Python programmers with a nice API, I consider it part of our mission to make the error API as nice as possible too, providing as much information as we can, in an easy to understand way.
That's all future music though. I think this is already a great step forward, I'm just pointing out where I'd like to go.
I also thought a bit more about this. It would be better to store more information and then allow filtering based on domain and error codes. RNG classes should only return RNG errors, for example (although earlier failures may have contributed to the current error...). Maybe use a dedicated log entry class rather than plain strings?
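Roughly what I have in mind, as a plain-Python sketch only. The class names `LogEntry` and `ErrorLog`, the `filter` method and the numeric domain and code constants are all placeholders for illustration, not libxml2's real values or an existing lxml interface.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder domain and error-code numbers, NOT libxml2's real values.
DOMAIN_PARSER = 1
DOMAIN_RELAXNG = 2
ERR_TAG_MISMATCH = 100
ERR_ELEMENT_NOT_ALLOWED = 200


@dataclass(frozen=True)
class LogEntry:
    """One error, already converted to Python values."""
    domain: int
    code: int
    level: int
    message: str
    filename: Optional[str] = None
    line: int = 0
    column: int = 0


class ErrorLog:
    """A list-like log that can be narrowed down by domain and error code."""

    def __init__(self, entries=()):
        self._entries = list(entries)

    def append(self, entry):
        self._entries.append(entry)

    def filter(self, domain=None, codes=None):
        """Return a new log with only the entries matching the given criteria."""
        selected = [
            entry for entry in self._entries
            if (domain is None or entry.domain == domain)
            and (codes is None or entry.code in codes)
        ]
        return ErrorLog(selected)

    def __iter__(self):
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)


log = ErrorLog()
log.append(LogEntry(DOMAIN_PARSER, ERR_TAG_MISMATCH, 2, "tag mismatch", "doc.xml", 3, 7))
log.append(LogEntry(DOMAIN_RELAXNG, ERR_ELEMENT_NOT_ALLOWED, 2, "element not allowed", "doc.xml", 5))

# An RNG class would only hand out the entries from its own domain.
rng_errors = log.filter(domain=DOMAIN_RELAXNG)
```

The full log stays available, so the earlier parser failure that may have contributed to the RNG error can still be looked up.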
We also have the case for RelaxNG/Schema reporting where no exception is raised if the XML is not valid according to the schema.
I added error_log properties to the RelaxNG and XMLSchema classes. That should solve that problem.
Another way that might be more consistent is to add new methods that either silently validate or, in case of validation errors, raise an exception.
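For comparison, here is a toy sketch of the two styles side by side: a silent check that fills an error log, and a variant that raises. `ToySchema`, `ValidationError` and `assert_valid` are invented for this example and only imitate the shape of a schema class. A "document" is simply a list of tag names here.

```python
class ValidationError(Exception):
    """Raised by the exception-style method; carries the collected log."""

    def __init__(self, error_log):
        super().__init__(error_log[0] if error_log else "document is invalid")
        self.error_log = list(error_log)


class ToySchema:
    """Hypothetical stand-in for a RelaxNG/XMLSchema class."""

    def __init__(self, allowed_tags):
        self._allowed = set(allowed_tags)
        self.error_log = []

    def validate(self, tags):
        """Silent style: return True/False and keep details in error_log."""
        self.error_log = [
            f"element '{tag}' is not allowed"
            for tag in tags
            if tag not in self._allowed
        ]
        return not self.error_log

    def assert_valid(self, tags):
        """Exception style: do nothing if valid, raise with the log otherwise."""
        if not self.validate(tags):
            raise ValidationError(self.error_log)
```

Both methods share the same log, so the exception variant adds convenience rather than extra information.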
Hmmm, I don't know. If that's only for retrieving more precise error information... Maybe a method like "assert" could be meaningful here.

Stefan