Proposed additional keyword argument in logging calls
On numerous occasions, requests have been made for the ability to easily add user-defined data to logging events. For example, a multi-threaded server application may want to output specific information to a particular server thread (e.g. the identity of the client, specific protocol options for the client connection, etc.) This is currently possible, but you have to subclass the Logger class and override its makeRecord method to put custom attributes in the LogRecord. These can then be output using a customised format string containing e.g. "%(foo)s %(bar)d". The approach is usable but requires more work than necessary. I'd like to propose a simpler way of achieving the same result, which requires use of an additional optional keyword argument in logging calls. The signature of the (internal) Logger._log method would change from def _log(self, level, msg, args, exc_info=None) to def _log(self, level, msg, args, exc_info=None, extra_info=None) The extra_info argument will be passed to Logger.makeRecord, whose signature will change from def makeRecord(self, name, level, fn, lno, msg, args, exc_info): to def makeRecord(self, name, level, fn, lno, msg, args, exc_info, extra_info) makeRecord will, after doing what it does now, use the extra_info argument as follows: If type(extra_info) != types.DictType, it will be ignored. Otherwise, any entries in extra_info whose keys are not already in the LogRecord's __dict__ will be added to the LogRecord's __dict__. Can anyone see any problems with this approach? If not, I propose to post the approach on python-list and then if there are no strong objections, check it in to the trunk. (Since it could break existing code, I'm assuming (please correct me if I'm wrong) that it shouldn't go into the release24-maint branch.) Of course, if anyone can suggest a better way of doing it, I'm all ears :-) Regards, Vinay Sajip
On 11/22/05, Vinay Sajip
makeRecord will, after doing what it does now, use the extra_info argument as follows:
If type(extra_info) != types.DictType, it will be ignored.
Otherwise, any entries in extra_info whose keys are not already in the LogRecord's __dict__ will be added to the LogRecord's __dict__.
Can anyone see any problems with this approach?
I'd suggest that you raise an error if extra_info doesn't act like a dictionary - probably, just try to add its entries and let any error pass back to the caller. You definitely want to allow dict subclasses, and anything that acts like a dictionary. And you want to catch errors like log(..., extra_info = "whatever") with a format of "... %(extra_info)s..." (ie, assuming that extra_info is a single value - it's what I expected you to propose when I started reading). The rest looks good (I don't have a need for it myself, but it looks like a nice, clean solution to the problem you describe). Paul.
On 11/22/05, Vinay Sajip
On numerous occasions, requests have been made for the ability to easily add user-defined data to logging events. For example, a multi-threaded server application may want to output specific information to a particular server thread (e.g. the identity of the client, specific protocol options for the client connection, etc.)
This is currently possible, but you have to subclass the Logger class and override its makeRecord method to put custom attributes in the LogRecord. These can then be output using a customised format string containing e.g. "%(foo)s %(bar)d". The approach is usable but requires more work than necessary.
I'd like to propose a simpler way of achieving the same result, which requires use of an additional optional keyword argument in logging calls. The signature of the (internal) Logger._log method would change from
def _log(self, level, msg, args, exc_info=None)
to
def _log(self, level, msg, args, exc_info=None, extra_info=None)
The extra_info argument will be passed to Logger.makeRecord, whose signature will change from
def makeRecord(self, name, level, fn, lno, msg, args, exc_info):
to
def makeRecord(self, name, level, fn, lno, msg, args, exc_info, extra_info)
makeRecord will, after doing what it does now, use the extra_info argument as follows:
If type(extra_info) != types.DictType, it will be ignored.
Otherwise, any entries in extra_info whose keys are not already in the LogRecord's __dict__ will be added to the LogRecord's __dict__.
Can anyone see any problems with this approach? If not, I propose to post the approach on python-list and then if there are no strong objections, check it in to the trunk. (Since it could break existing code, I'm assuming (please correct me if I'm wrong) that it shouldn't go into the release24-maint branch.)
This looks like a good clean solution to me. I agree with Paul Moore's suggestion that if extra_info is not None you should just go ahead and use it as a dict and let the errors propagate. What's the rationale for not letting it override existing fields? (There may be a good one, I just don't see it without turning on my thinking cap, which would cost extra. :-) Perhaps it makes sense to call it 'extra' instead of 'extra_info'? As a new feature it should definitely not go into 2.4; but I don't see how it could break existing code. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum
This looks like a good clean solution to me. I agree with Paul Moore's suggestion that if extra_info is not None you should just go ahead and use it as a dict and let the errors propagate.
OK.
What's the rationale for not letting it override existing fields? (There may be a good one, I just don't see it without turning on my thinking cap, which would cost extra.
The existing fields which could be overwritten are ones which have been computed by the logging package itself: name Name of the logger levelno Numeric logging level for the message (DEBUG, INFO, WARNING, ERROR, CRITICAL) levelname Text logging level for the message ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL") msg The message passed in the logging call args The additional args passed in the logging call exc_info Exception information (from sys.exc_info()) exc_text Exception text (cached for use by multiple handlers) pathname Full pathname of the source file where the logging call was issued (if available) filename Filename portion of pathname module Module (name portion of filename) lineno Source line number where the logging call was issued (if available) created Time when the LogRecord was created (time.time() return value) msecs Millisecond portion of the creation time relativeCreated Time in milliseconds when the LogRecord was created, relative to the time the logging module was loaded (typically at application startup time) thread Thread ID (if available) process Process ID (if available) message The result of record.getMessage(), computed just as the record is emitted I couldn't think of a good reason why it should be possible to overwrite these values with values from a user-supplied dictionary, other than to spoof log entries in some way. The intention is to stop a user accidentally overwriting one of the above attributes. But thinking about "Errors should never pass silently", I propose that an exception (KeyError seems most appropriate, though here it would be because a key was present rather than absent) be thrown if one of the above attribute names is supplied as a key in the user-supplied dict.
Perhaps it makes sense to call it 'extra' instead of 'extra_info'?
Fine - 'extra' it will be.
As a new feature it should definitely not go into 2.4; but I don't see how it could break existing code.
OK - thanks for the feedback. Regards, Vinay Sajip
Vinay> I couldn't think of a good reason why it should be possible to Vinay> overwrite these values with values from a user-supplied Vinay> dictionary, other than to spoof log entries in some way. If the user doesn't need those values and can provide cheap substitutes, perhaps their computation can be avoided. I did that recently by inlining only the parts of logging.LogRecord.__init__ in a subclass and avoided calling logging.LogRecord.__init__ altogether. It generated lots of instance variables we never use and just slowed things down. Skip
If the user doesn't need those values and can provide cheap substitutes, perhaps their computation can be avoided. I did that recently by inlining only the parts of logging.LogRecord.__init__ in a subclass and avoided calling logging.LogRecord.__init__ altogether. It generated lots of instance variables we never use and just slowed things down.
Well, that approach is fine if you need maximal performance. If those values set in LogRecord.__init__ can be computed more cheaply than they are at present, please give an idea as to where speedups can be applied, or even supply a patch ;-) If you don't need many of the values set by LogRecord.__init__, then your approach is fine (until someone decides they want something which is documented and tries to access it via a specific format string - which could be done, in the general case, via configuration changes). The use case I'm supporting with this is one where some/most/all of the default attributes are wanted, and also additional user-defined ones in some special cases. With the proposed scheme the extra value is passed to LogRecord constuction code so it could be used by your custom LogRecord initialiser with the same benefit as in the general case (I think). Regards, Vinay Sajip
I'm working on <https://sourceforge.net/tracker/index.php? func=detail&aid=1239890&group_id=5470&atid=305470> My patch uses os.path.normpath to simplify the path sent to makedirs as first step. This eliminates the need to check for os.currdir or other special case, because normpath already handle all those ugly spacial cases (and hopefully tested). And of course eliminate possible pointless system calls. For example (on Mac OS X):
os.path.normpath('a/./////b/c') 'a/b/c' os.path.normpath('a/b/c/.') 'a/b/c' os.path.normpath('./a/b') 'a/b' os.path.normpath('a/b/////') 'a/b'
However, I found this alarming note in the docs: normpath(path) ... "It should be understood that this may change the meaning of the path if it contains symbolic links!" The function docstring does not contain this note: """Normalize path, eliminating double slashes, etc.""" And finally, there is no test for this symbolic link problem. Anyone has a clue about this? Best Regards, Nir Soffer
Nir Soffer wrote:
However, I found this alarming note in the docs:
normpath(path) ... "It should be understood that this may change the meaning of the path if it contains symbolic links!"
The function docstring does not contain this note:
"""Normalize path, eliminating double slashes, etc."""
And finally, there is no test for this symbolic link problem.
Anyone has a clue about this?
if BAR is a symbolic link, FOO/BAR/../DIR isn't necessarily the same thing as FOO/DIR. a simple example: $ ln -s /etc etc $ ls -ld etc lrwxrwxrwx 1 fredrik fredrik 4 Dec 2 23:22 etc -> /etc $ etc/../usr/local/bin/python2.4 Python 2.4.1 (#1, Sep 12 2005, 19:35:02) ... >>> import os >>> os.path.normpath("etc/../usr/local/bin/python2.4") 'usr/local/bin/python2.4' >>> <control-D> $ usr/local/bin/python2.4 -bash: usr/local/bin/python2.4: No such file or directory </F>
Consider: $ mkdir -p d/d/d $ echo 1 > d/d/a $ echo 2 > a $ ln -s d/d/d x $ python -c 'import os; print open(os.path.normpath("x/../a")).read(),' 2 $ cat x/../a 1 Jeff
>> If the user doesn't need those values and can provide cheap >> substitutes, perhaps their computation can be avoided. I did that >> recently by inlining only the parts of logging.LogRecord.__init__ in >> a subclass and avoided calling logging.LogRecord.__init__ altogether. >> It generated lots of instance variables we never use and just slowed >> things down. Vinay> Well, that approach is fine if you need maximal performance. If Vinay> those values set in LogRecord.__init__ can be computed more Vinay> cheaply than they are at present, please give an idea as to where Vinay> speedups can be applied, or even supply a patch ;-) Note the prefix "If the user doesn't need those values". I don't need much of the stuff that LogRecord.__init__ computes for each LogRecord instantiation, so None would be a fine, cheap substitute for me. <wink> In general we do a lot of logging at work because stuff happens so fast (we're writing automated trading applications) that you can't tell what's happening in real-time. Your only opportunity for analysis is post-mortem. Consequently, we tend to log a bunch of stuff that is rarely needed, but is critical to have around when it is needed. In fact, the folks who write C++ apps here tossed out log4cpp altogether and wrote their own logging package because log4cpp was too slow. To give you an idea how much logging we do, I wrote a CompressedRotatingFileHandler class because our apps log so much data that we need to compress them to avoid filling up disks. Vinay> If you don't need many of the values set by LogRecord.__init__, Vinay> then your approach is fine (until someone decides they want Vinay> something which is documented and tries to access it via a Vinay> specific format string - which could be done, in the general Vinay> case, via configuration changes). This is used within a single app, so I'm not too worried about that. All we ever call the log routines with is a caller-formatted string. There is no substitution required of the logging package. Most of the stuff you provide we simply don't need. Like I said, we log a lot of stuff. When our application's main CPU user is the logging package it's time to look at the logging package with an eye to cutting out what we don't need. Vinay> The use case I'm supporting with this is one where some/most/all Vinay> of the default attributes are wanted, I would argue the stuff I've thrown out isn't used by most apps. Here's what I inlined: self.name = name self.msg = msg self.args = args self.levelname = logging.getLevelName(level) self.levelno = level self.pathname = pathname self.filename = os.path.basename(pathname) self.module = os.path.splitext(self.filename)[0] self.exc_info = exc_info self.lineno = lineno self.created is either set to time.time() (when we run live) or our internal tick stream clock time (when we do historical runs). Here's what I avoided setting: self.thread - most apps, ours included, are not multithreaded, so knowing the current thread id is superfluous try/except on self.filename/self.module - not sure why this is necessary or what the except clause is catching - I've never seen an exception self.relativeCreated - we only use absolute time self.process - we never to record the process id as each process generates its own log files. Plus I of course avoided the overhead of the __init__ function call itself. Vinay> With the proposed scheme the extra value is passed to LogRecord Vinay> constuction code so it could be used by your custom LogRecord Vinay> initialiser with the same benefit as in the general case (I Vinay> think). Sure, I'd be happy to use extra dict if it suppressed the superfluous calculations, e.g.: if 'relativeCreated' in extra: self.relativeCreated = extra['relativeCreated'] else: self.relativeCreated = (self.created - _startTime) * 1000 Maybe adding all those tests will just slow down the common case though. Perhaps extra should be used to override *all* logRecord attributes: if extra: self.__dict__.update(extra) else: ... do the usual initialization ... In that case it would be the caller's responsibility to make sure that extra contains everything they need. To insure that, a static LogRecord method could be used to return a default extras dict: extras = LogRecord.get_default_extra() while True: ... compute ... ... maybe override some values in extras ... logging.info(..., extra=extras) ... compute some more ... Skip
Skip, Thanks for the detailed post indicating what stuff you find useful, and what stuff you don't need. It seems that your use case is fairly unusual, so I completely understand that you have optimized how you use logging. I will consider how to try to incorporate your feedback without breaking backwards compatibility, but it will probably not be via the 'extra' mechanism I proposed. For example, to avoid thread and process computation, I could introduce module-level variables eg. log_threads and log_processes, both set to true to maintain backwards compatibility, but settable to false to avoid expensive thread and process computations where not needed. The 'extra' mechanism will remain to provide additional diagnostic information where e.g. the same code is executed by multiple threads and there is a need to distinguish the different threads in the logging output. Regards, Vinay Sajip
On 11/29/05, Vinay Sajip
But thinking about "Errors should never pass silently", I propose that an exception (KeyError seems most appropriate, though here it would be because a key was present rather than absent) be thrown if one of the above attribute names is supplied as a key in the user-supplied dict.
+1 -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Vinay Sajip
The existing fields which could be overwritten are ones which have been computed by the logging package itself:
name Name of the logger levelno Numeric logging level for the message (DEBUG, INFO, WARNING, ERROR, CRITICAL) [and so on].
Shouldn't this list be documented? Or is it? Thomas
Thomas Heller
by the logging package itself:
name Name of the logger levelno Numeric logging level for the message (DEBUG, INFO, WARNING, ERROR, CRITICAL) [and so on].
Shouldn't this list be documented? Or is it?
The list of values which can be used in format strings is: http://docs.python.org/lib/node357.html The list of internal values is not (e.g. msg, args, exc_text) and I suppose the documentation could be updated to include them, but they're implementation details so I don't really want to encourage changing them. If you need to, it's easy enough to pick up the info from the source code for LogRecord.__init__(), which does all the setup. Regards, Vinay Sajip
participants (9)
-
Fredrik Lundh
-
Guido van Rossum
-
jepler@unpythonic.net
-
Nir Soffer
-
Paul Moore
-
skip@pobox.com
-
Thomas Heller
-
Vinay Sajip
-
Vinay Sajip