
On Fri, Jun 14, 2013 at 6:40 PM, Glyph <glyph@twistedmatrix.com> wrote:
On Jun 14, 2013, at 3:43 PM, Glyph <glyph@twistedmatrix.com> wrote:
I think that we can satisfy all of these requirements by expanding the notion of an "identifier" a bit.
In our last episode, I mentioned that an identifier should have several properties:
human readable (and potentially non-unique) for ease of writing messages and reading them to get a general sense of the behavior of a system globally unique (and potentially hard to read) for ease of filtering and monitoring long-term (and potentially unique to humans) for ease of writing tools to deal with particular messages, whose exact details may change over time
I think we can achieve all of these things by making there be multiple ways to identify a log message. Here's how that could work:
When you are first writing a log message, just do this, as you usually would; toss off some crummy english message:
log.info("hey check it out I just {verb}ed a {noun}", verb=request.method, noun=request.uri)
Asking the author to write a log message with more detail and care than this is basically just a recipe for disappointment :).
Later, someone might want to instrument these messages. We've all agreed that re.findall("hey check it out.*") is a crummy way to do this. The obvious thing would be something like this:
tapEventsLike(myObserver, publisher, log_format="hey check it out I just {verb}ed a {noun}")
That's fine to start with, of course, but it's not very resilient to changes in the code; the developer might later change this line to read:
log.info("hey check it out I just {verb}ed a {noun} and got a {result}", verb=request.method, noun=request.uri, result=request.response.status)
Now the log tap is broken, and there's no reasonable way to fix it; the format string doesn't have a place that it could go.
But, the 'log' in the above example is a logger object associated with a class or module, and so has 3 persistent pieces of information it can use to build a long-term identifier that the event tap can use. We could reflect that in an API like this:
from example.module import log tapEventsLike(myObserver, publisher, log_id=log.idForOriginalFormat("hey check it out I just {verb}ed a {noun}"))
which could be computed offline, and then specified like this, to be even more resilient to API changes:
tapEventsLike(myObserver, publisher, log_id='604e45b5-2d41-5ba5-9ae1-7205163c3e3f')
That string, of course, is computed like this:
import uuid from urllib import urlencode uri = "http://twistedmatrix.com/ns/log/?" + urlencode(dict(format="hey check it out I just {verb}ed a {noun}", module="example.module")) uuid.uuid5(uuid.NAMESPACE_URL, uri) UUID('604e45b5-2d41-5ba5-9ae1-7205163c3e3f')
(Although if we decide we like this system, we could of course go with a helpful command-line tool to generate these for you.)
As you evolve your codebase to take logging more seriously, you could change the original 'log.info' call to instead look like this:
verbANoun = log.declare(
"hey check it out I just {verb}ed a {noun} and got a {result}",
log_id='604e45b5-2d41-5ba5-9ae1-7205163c3e3f',
)
# ...
verbANoun(verb=request.method, noun=request.uri, result=request.response.status)
and then monitoring can be accomplished via public API:
from example.module import verbANoun verbANoun.tap(myObserver, theLogPublisher)
So, after some evolution, our message has 3 identifiers:
the original format string, which was easy to come up with on the spot, and need not be unique a UUID, that was automatically generated but can be carried around to allow for cleaning up the codebase and the message without breaking any code doing monitoring an API name that the logger and the logee can use to agree on the semantics of a particular message
Best of all, you can get this even for the plethora of existing log messages within Twisted that aren't even using structured values, as we evolve them to have more metadata. You also now have the ability to localize the format string, if you want to start presenting these log messages in contexts other than log files for system administrators.
Thoughts?
There's... a lot here. I'm overwhelmed by how complex this system would be, and I think that end users would be as well. I don't really want to put UUIDs into my source code, whether I type them or paste them, and I don't want to have to invoke command line tools to figure out what those UUIDs in order to be a good developer. I think there's basically no practical benefit over the hierarchical "system" + event-identifier system, where: 1. it's trivial to specify a hierarchical, easy-to-read "system" key that has a small-ish scope 2. you think of a unique event name in that small-ish scope and type it out. I'm beginning to think #1 is the most important of all, but I think I'll continue to use event-names to describe all my log statements. As I said in my response to Wilfredo, I think it's really important to stop specifying the "system" for application code in Twisted. The application's code should not be considered a part of the HTTP protocol's system. There are also other problems with the way we use system, like putting extra random data like request ID in there. -- Christopher Armstrong http://radix.twistedmatrix.com/ http://planet-if.com/