[Twisted-Python] the good, the log, and the ugly
Right now, we have a pretty bad logging idiom. All over Twisted, you'll find stuff like this: log.msg("twistd %s (%s %s) starting up." % (copyright.version, sys.executable, runtime.shortPythonVersion())) ... log.msg("Loading %s..." % filename) ... log.msg("Received unhandled keyID: %r" % (keyID,)) In other words, log messages that take structured information, but then smash it into a not-particularly-distinctive string and spit it out into a line-delimited file. The Twisted log system has, from its very beginning, supported a better style of formatting, which is: log.msg(format="twistd %(version)s (%(executable)s %(short)s) starting up.", version=copyright.version, executable=sys.executable, short=runtime.shortPythonVersion())) ... log.msg(format="Loading %(filename)s...", filename=filename) ... log.msg(format="Received unhandled keyID: %(keyID)r", keyID=keyID) There are lots of other features which this style of logging could support, as well, with relatively minor tweaks, such as trivial enhancements like: def logger(module, audience, importance): def logIt(**kw): log.msg(module=module, audience=audience, importance=importance, **kw) return logIt ... info = logger(__name__, ADMIN, NORMAL) bugreport = logger(__name__, DEVEL, HIGH) debug = logger(__name__, DEVEL, LOW) ... info(format="twistd %(version)s (%(executable)s %(short)s) starting up.", version=copyright.version, executable=sys.executable, short=runtime.shortPythonVersion())) This would then allow us to build systems which internally filter messages, avoid expensive string formatting and copying when it's not necessary, tune logging to different levels for different parts of the package hierarchy, and have intelligent real-time log analysis that inspects the objects being logged without having to apply regexes or other static-string-inspection methods to figure out which messages are interesting. Calendar Server implements some of these features for its logging system, and they've been very useful there. Various Divmod projects have implemented run-time statistics tracking via that Twisted logging system, and that worked pretty well too. It would be good to start taking advantage of the good parts of Twisted's logging system within Twisted itself. Doing this in a consistent way would also open up the possibility of localizing our log messages, which is currently not really plausible. And also, of logging in a format which lost less information; for example, instead of actually doing textToEventDict, simply dumping a simplified form of the event dict itself to some structured format that could be reliably parsed – something like JSON – for easier parsing and analysis later. However, in order to be able to take advantage of these features when they're added, we need to start avoiding premature string formatting everywhere that we can today. If you're writing log messages within twisted (or within a twisted application or library) please adopt the format=, x=, y= style of logging messages so that log observers can actually see into the objects being logged and do interesting things with them. We should also file some tickets about improving the documentation to indicate this style, for writing a new interface to logging which mandates this style, and for adding layers on top of that to provide more useful functionality like the aforementioned per-module or per-audience filtering. I have long despaired of filing such tickets because they each seem like boil-the-ocean tasks, changing every single call-site of log.msg (there are more than 500 as of today, just in Twisted itself, and thousands more in other applications I'm sure). I'm going to try to get started on record my desired behavior for logging as tickets soon though, so Twisted can at least have some really good logging tools eventually. -glyph
On 02:11 am, glyph@twistedmatrix.com wrote:
Right now, we have a pretty bad logging idiom. All over Twisted, you'll find stuff like this:
log.msg("twistd %s (%s %s) starting up." % (copyright.version, sys.executable, runtime.shortPythonVersion())) ... log.msg("Loading %s..." % filename) ... log.msg("Received unhandled keyID: %r" % (keyID,))
In other words, log messages that take structured information, but then smash it into a not-particularly-distinctive string and spit it out into a line-delimited file.
The Twisted log system has, from its very beginning, supported a better style of formatting, which is:
log.msg(format="twistd %(version)s (%(executable)s %(short)s) starting up.", version=copyright.version, executable=sys.executable, short=runtime.shortPythonVersion())) ... log.msg(format="Loading %(filename)s...", filename=filename) ... log.msg(format="Received unhandled keyID: %(keyID)r", keyID=keyID)
There are lots of other features which this style of logging could support, as well, with relatively minor tweaks, such as trivial enhancements like:
def logger(module, audience, importance): def logIt(**kw): log.msg(module=module, audience=audience, importance=importance, **kw) return logIt ... info = logger(__name__, ADMIN, NORMAL) bugreport = logger(__name__, DEVEL, HIGH) debug = logger(__name__, DEVEL, LOW) ... info(format="twistd %(version)s (%(executable)s %(short)s) starting up.", version=copyright.version, executable=sys.executable, short=runtime.shortPythonVersion()))
This would then allow us to build systems which internally filter messages, avoid expensive string formatting and copying when it's not necessary, tune logging to different levels for different parts of the package hierarchy, and have intelligent real-time log analysis that inspects the objects being logged without having to apply regexes or other static-string-inspection methods to figure out which messages are interesting. Calendar Server implements some of these features for its logging system, and they've been very useful there. Various Divmod projects have implemented run-time statistics tracking via that Twisted logging system, and that worked pretty well too. It would be good to start taking advantage of the good parts of Twisted's logging system within Twisted itself.
Doing this in a consistent way would also open up the possibility of localizing our log messages, which is currently not really plausible. And also, of logging in a format which lost less information; for example, instead of actually doing textToEventDict, simply dumping a simplified form of the event dict itself to some structured format that could be reliably parsed – something like JSON – for easier parsing and analysis later.
However, in order to be able to take advantage of these features when they're added, we need to start avoiding premature string formatting everywhere that we can today. If you're writing log messages within twisted (or within a twisted application or library) please adopt the format=, x=, y= style of logging messages so that log observers can actually see into the objects being logged and do interesting things with them.
We should also file some tickets about improving the documentation to indicate this style, for writing a new interface to logging which mandates this style, and for adding layers on top of that to provide more useful functionality like the aforementioned per-module or per- audience filtering. I have long despaired of filing such tickets because they each seem like boil-the-ocean tasks, changing every single call-site of log.msg (there are more than 500 as of today, just in Twisted itself, and thousands more in other applications I'm sure).
I'm going to try to get started on record my desired behavior for logging as tickets soon though, so Twisted can at least have some really good logging tools eventually.
I agree with just about everything here, except I think we should be able to do better than the `format="opaque string"´ part. Logging in the style you outlined here preserves the structure of the information, but it omits any consistent way to identify what the structure is. It seems to leave only two possibilities: - Compare the format string to certain well-known format strings and use a match (if found) to make decisions based on a priori knowledge about the additional items in the log event. - Compare the set of keys in the log event to certain well-known sets and use a match there to make the same kind of decision. Neither of these sounds like a good thing. Divmod experimented with marking log events with interfaces that documented the structure and semantics of the log event carrying them. This didn't get very far, and perhaps had problems of its own, but even it seemed preferable to either of the above options. Jean-Paul
-glyph
On Mar 26, 2012, at 10:38 AM, exarkun@twistedmatrix.com wrote:
I agree with just about everything here, except I think we should be able to do better than the `format="opaque string"´ part.
In principle I agree, but I'm not sure about how to do that _exactly now_. Whereas today, immediately, all log messages could be written in the style I suggested and it would be a big improvement.
Logging in the style you outlined here preserves the structure of the information, but it omits any consistent way to identify what the structure is. It seems to leave only two possibilities:
- Compare the format string to certain well-known format strings and use a match (if found) to make decisions based on a priori knowledge about the additional items in the log event.
Previous (limited) experience with internationalization efforts lead me to believe that looking at well-known format strings to determine meaning is the state of the art.
- Compare the set of keys in the log event to certain well-known sets and use a match there to make the same kind of decision.
Neither of these sounds like a good thing.
Still, a pretty substantial improvement over running a regex over a string in a quasi-structured log file after the fact (IMHO).
Divmod experimented with marking log events with interfaces that documented the structure and semantics of the log event carrying them. This didn't get very far, and perhaps had problems of its own, but even it seemed preferable to either of the above options.
What I'd really like to do is to have some kind of log object built up like this: logger = Logger(module=__name__) class DebugFoo(logger.message()): """wouldn't it be great if such messages were explained in detail""" audience=developers; importance=medium; format="foo %(foo)d happened"; logID=UUID('7dc1c45b-4eb7-4a3f-b751-ee6b500f11ce') anythingElseInteresting="otherstuff" def onFoo(someFoo): DebugFoo.emit(foo=7) and while this isn't hard to implement, it's also not implemented already, but the format= stuff is. So my suggestion was not intended to say we should stop here, but rather, while we're figuring out better stuff to do, nobody should ever write a 'log.msg("some static " + thing)' again :). -glyph
On 3/25/12 7:11 PM, Glyph wrote:
I'm going to try to get started on record my desired behavior for logging as tickets soon though, so Twisted can at least have some really good logging tools /eventually/.
While you're thinking about this space, I'll plug the logging tools I built into Foolscap (and applications that use it, like Tahoe). I'm sure many others have built things like it, but some of the features that I've found particularly useful in Tahoe development are: * structured log messages, stored as dicts with non-serialized arguments * "umids": unique message identifiers, to find the call site * size-limited per-(priority*section) circular event buffers * parent/child event relationships, for event trees * "Incidents": when an UNUSUAL event is logged, a pickle of the event buffers is saved to disk, for later inspection, including events from post-trigger error-recovery code * "logport": Foolscap interface for subscribing to event streams * "flogtool tail" to connect to remote app and watch the event stream * web- or text- based event renderers, with expand/hide-subtree buttons * "incident gatherer": subscribes to Incident reports from multiple remote apps, classifies reports them according to cause It's inspired by twisted.python.log, of course, and the log.msg() interface should look pretty familiar. The Foolscap buy-in may be too much for most apps, but it'd be interesting to find a subset that doesn't require the networking parts. I tried to figure out how to make it fit with Python's stdlib logging module, but eventually gave up (I think the biggest issue was needing to return an event number from the log.msg() call, for the parent/child stuff). http://foolscap.lothar.com/docs/logging.html has details. cheers, -Brian
On Mar 30, 2012, at 3:23 PM, Brian Warner wrote:
On 3/25/12 7:11 PM, Glyph wrote:
I'm going to try to get started on record my desired behavior for logging as tickets soon though, so Twisted can at least have some really good logging tools /eventually/.
While you're thinking about this space, I'll plug the logging tools I built into Foolscap (and applications that use it, like Tahoe). I'm sure many others have built things like it, but some of the features that I've found particularly useful in Tahoe development are:
* structured log messages, stored as dicts with non-serialized arguments * "umids": unique message identifiers, to find the call site * size-limited per-(priority*section) circular event buffers * parent/child event relationships, for event trees * "Incidents": when an UNUSUAL event is logged, a pickle of the event buffers is saved to disk, for later inspection, including events from post-trigger error-recovery code * "logport": Foolscap interface for subscribing to event streams * "flogtool tail" to connect to remote app and watch the event stream * web- or text- based event renderers, with expand/hide-subtree buttons * "incident gatherer": subscribes to Incident reports from multiple remote apps, classifies reports them according to cause
It's inspired by twisted.python.log, of course, and the log.msg() interface should look pretty familiar. The Foolscap buy-in may be too much for most apps, but it'd be interesting to find a subset that doesn't require the networking parts. I tried to figure out how to make it fit with Python's stdlib logging module, but eventually gave up (I think the biggest issue was needing to return an event number from the log.msg() call, for the parent/child stuff).
http://foolscap.lothar.com/docs/logging.html has details.
Indeed, Foolscap has informed many of my future wishes for Twisted logging, especially its incident reporting and logport features. (I wasn't aware of UMIDs but I have independently invented the same thing.) Could we perhaps get some of foolscap logging's core features into Twisted? We could have an AMP interface for subscribing to log streams that would be slightly more low-fidelity (but, I would guess, higher-performance) than the Foolscap one, but work based on the same core mechanism which foolscap could then subscribe to. I suspect that this would be useful to foolscap as well, because then the code that foolscap calls into (for example, the web stuff for its UIs) would be using the same logging convention. -glyph
On 3/30/12 5:49 PM, Glyph wrote:
http://foolscap.lothar.com/docs/logging.html has details.
Indeed, Foolscap has informed many of my future wishes for Twisted logging, especially its incident reporting and logport features. (I wasn't aware of UMIDs but I have independently invented the same thing.)
(actually, the "UMID" support consists of ignoring extra kwargs, and an emacs macro which inserts umid="RANDOM" at point. It works remarkably well, and is a lot cheaper than having log.msg() capture the call stack so you can figure out which copy of the otherwise-anonymous log.msg("done") calls was invoked this time)
Could we perhaps get some of foolscap logging's core features into Twisted? We could have an AMP interface for subscribing to log streams that would be slightly more low-fidelity (but, I would guess, higher-performance) than the Foolscap one, but work based on the same core mechanism which foolscap could then subscribe to.
Yeah, that'd be great. An AMP interface sounds lovely. I'd have to think about potential gotchas, but I'm sure we can work them out. Foolscap's remote protocol relies upon all the log.msg() arguments being foolscap-serializable, which occasionally reveals places where we're accidentally passing object instances into the log. (it uses CopiedFailure a lot, so we'd need to come up with an AMP equivalent). The stored-on-disk format just uses pickle, which I kinda regret, but it should be pretty easy to replace that with something safer. There's also the notion of a "log port" (a foolscap object with a remote_subscribe() method). Does AMP have any notion of object references? If not, I can see how you could call in and ask (synchronously) for the current event log, but not how you'd provide a "callback object" to which future events should be streamed. How does pubsub work in AMP?
I suspect that this would be useful to foolscap as well, because then the code that foolscap calls into (for example, the web stuff for its UIs) would be using the same logging convention.
Yeah, I'd love to get that factored out and let other people hack on it. My web UI skills are laughable. cheers, -Brian
On Mar 31, 2012, at 6:10 PM, Brian Warner wrote:
On 3/30/12 5:49 PM, Glyph wrote:
http://foolscap.lothar.com/docs/logging.html has details.
Indeed, Foolscap has informed many of my future wishes for Twisted logging, especially its incident reporting and logport features. (I wasn't aware of UMIDs but I have independently invented the same thing.)
(actually, the "UMID" support consists of ignoring extra kwargs, and an emacs macro which inserts umid="RANDOM" at point. It works remarkably well, and is a lot cheaper than having log.msg() capture the call stack so you can figure out which copy of the otherwise-anonymous log.msg("done") calls was invoked this time)
I was thinking more along the lines of something that would insert the output of "python -c 'import uuid, sys; sys.stdout.write(str(uuid.uuid4()))'", but yeah, same general idea. (Why do you have to ignore extra kwargs?)
Could we perhaps get some of foolscap logging's core features into Twisted? We could have an AMP interface for subscribing to log streams that would be slightly more low-fidelity (but, I would guess, higher-performance) than the Foolscap one, but work based on the same core mechanism which foolscap could then subscribe to.
Yeah, that'd be great. An AMP interface sounds lovely. I'd have to think about potential gotchas, but I'm sure we can work them out. Foolscap's remote protocol relies upon all the log.msg() arguments being foolscap-serializable, which occasionally reveals places where we're accidentally passing object instances into the log. (it uses CopiedFailure a lot, so we'd need to come up with an AMP equivalent). The stored-on-disk format just uses pickle, which I kinda regret, but it should be pretty easy to replace that with something safer.
I was thinking that AMP would offer an AnyOf() argument type, which would be dynamic within a small set, with a fallback. Within the context of logging, we'd support unicode/bytes/float/int (and maybe some more esoteric numeric tower stuff too, like decimals and fractions), and possibly lists and dicts of those same types (but not structures of structures, so we don't have to worry about backreferences). Maybe tuples supported the same as lists. The fallback would simply be "repr() this thing", so if you log an object you still get something somewhat useful without potentially arbitrary serialization. Foolscap could take the same data and just make a
There's also the notion of a "log port" (a foolscap object with a remote_subscribe() method). Does AMP have any notion of object references?
No, but I don't think we need such a notion. If we did need it for some reason, AMP does have a notion of establishing new message streams over the same connection. This message finally prompted me to file the ticket that I always think was already filed: <http://twistedmatrix.com/trac/ticket/5587>.
If not, I can see how you could call in and ask (synchronously) for the current event log, but not how you'd provide a "callback object" to which future events should be streamed. How does pubsub work in AMP?
You just know that your peer knows how to respond to certain commands, and you send those commands. For example, from twisted.protocols.amp import Command, String, AMP class Pub(Command): arguments = [("channel", String()), ("event", String())] class Sub(Command): arguments = [("channel", String())] class Hub(AMP): @Sub.responder def sub(self, channel): self.factory.observers[channel].append(self) @Pub.responder def pub(self, channel, event): for observer in self.factory.observers[channel]: observer.callRemote(Pub, channel=channel, event=event) Then your peer just has the same definition of Pub and has a @Pub.responder that does whatever they like. (Eric Mangold was working on formalizing this process somewhat at the Twisted sprint.)
I suspect that this would be useful to foolscap as well, because then the code that foolscap calls into (for example, the web stuff for its UIs) would be using the same logging convention.
Yeah, I'd love to get that factored out and let other people hack on it. My web UI skills are laughable.
Actually I was saying that Twisted could provide the core functionality and Foolscap could still provide the web UI - but if you wanted to break things out even further, but I suppose this would also present an opportunity for some third party to plug in some web UI on the side of Twisted that was not necessarily related to Foolscap, which would be cool too. -glyph
participants (3)
-
Brian Warner
-
exarkun@twistedmatrix.com
-
Glyph