On Jun 14, 2013, at 3:43 PM, Glyph <glyph@twistedmatrix.com> wrote:

I think that we can satisfy all of these requirements by expanding the notion of an "identifier" a bit.

In our last episode, I mentioned that an identifier should have several properties:

  1. human readable (and potentially non-unique) for ease of writing messages and reading them to get a general sense of the behavior of a system
  2. globally unique (and potentially hard to read) for ease of filtering and monitoring
  3. long-term (and potentially unique to humans) for ease of writing tools to deal with particular messages, whose exact details may change over time

I think we can achieve all of these things by making there be multiple ways to identify a log message.  Here's how that could work:

When you are first writing a log message, just do this, as you usually would; toss off some crummy english message:

log.info("hey check it out I just {verb}ed a {noun}", verb=request.method, noun=request.uri)

Asking the author to write a log message with more detail and care than this is basically just a recipe for disappointment :).

Later, someone might want to instrument these messages.  We've all agreed that re.findall("hey check it out.*") is a crummy way to do this.  The obvious thing would be something like this:

tapEventsLike(myObserver, publisher, log_format="hey check it out I just {verb}ed a {noun}")

That's fine to start with, of course, but it's not very resilient to changes in the code; the developer might later change this line to read:

log.info("hey check it out I just {verb}ed a {noun} and got a {result}", verb=request.method, noun=request.uri, result=request.response.status)

Now the log tap is broken, and there's no reasonable way to fix it; the format string doesn't have a place that it could go.

But, the 'log' in the above example is a logger object associated with a class or module, and so has 3 persistent pieces of information it can use to build a long-term identifier that the event tap can use.  We could reflect that in an API like this:

from example.module import log
tapEventsLike(myObserver, publisher,
              log_id=log.idForOriginalFormat("hey check it out I just {verb}ed a {noun}"))

which could be computed offline, and then specified like this, to be even more resilient to API changes:

tapEventsLike(myObserver, publisher, log_id='604e45b5-2d41-5ba5-9ae1-7205163c3e3f')

That string, of course, is computed like this:

>>> import uuid
>>> from urllib import urlencode
>>> uri = "http://twistedmatrix.com/ns/log/?" + urlencode(dict(format="hey check it out I just {verb}ed a {noun}", module="example.module"))
>>> uuid.uuid5(uuid.NAMESPACE_URL, uri)

(Although if we decide we like this system, we could of course go with a helpful command-line tool to generate these for you.)

As you evolve your codebase to take logging more seriously, you could change the original 'log.info' call to instead look like this:

verbANoun = log.declare(
    "hey check it out I just {verb}ed a {noun} and got a {result}",
# ...
verbANoun(verb=request.method, noun=request.uri, result=request.response.status)

and then monitoring can be accomplished via public API:

from example.module import verbANoun
verbANoun.tap(myObserver, theLogPublisher)

So, after some evolution, our message has 3 identifiers:

Best of all, you can get this even for the plethora of existing log messages within Twisted that aren't even using structured values, as we evolve them to have more metadata.  You also now have the ability to localize the format string, if you want to start presenting these log messages in contexts other than log files for system administrators.