Hi Python-Ideas ML,

To resume quickly the idea: I wish to add "extra" attribute to LogMessage, to facilitate structured logs generation.
For more details with use case and example, you can read message below.

Before to push the patch on bugs.python.org, I'm interested in by your opinions: the patch seems to be too simple to be honest.

Ludovic Gasc (GMLudo)

---------- Forwarded message ----------
From: Guido van Rossum <guido@python.org>
Date: 2015-05-24 23:44 GMT+02:00
Subject: Re: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support
To: Ludovic Gasc <gmludo@gmail.com>

Ehh, python-ideas?

On Sun, May 24, 2015 at 10:22 AM, Ludovic Gasc <gmludo@gmail.com> wrote:

1. The problem

For now, when you want to write a log message, you concatenate the data from your context to generate a string: In fact, you convert your structured data to a string.
When a sysadmin needs to debug your logs when something is wrong, he must write regular expressions to extract interesting data.

Often, he must find the beginning of the interesting log and follow the path. Sometimes, you can have several requests in the same time in the log, it's harder to find interesting log.
In fact, with regular expressions, the sysadmin tries to convert the log lines strings to structured data.

2. A possible solution

You should provide a set of regular expressions to your sysadmins to help them to find the right logs, however, another approach is possible: structured logs.
Instead of to break your data structure to push in the log message, the idea is to keep the data structure, to attach that as metadata of the log message.
For now, I know at least Logstash and Journald that can handle structured logs and provide a query tool to extract easily logs.

3. A concrete example with structured logs

As most Web developers, we build HTTP daemons used by several different human clients in the same time.
In the Python source code, to support structured logs, you don't have a big change, you can use "extra" parameter for that, example:

    [handle HTTP request]
    LOG.debug('Receive a create_or_update request', extra={'request_id': request.request_id,
                                                                                                 'account_id': account_id,
                                                                                                 'aiohttp_request': request,
                                                                                                 'payload': str(payload)})
   [create data in database]
    LOG.debug('Callflow created', extra={'account_id': account_id,
                                             'request_id': request.request_id,
                                             'aiopg_cursor': cur,
                                             'results': row})

Now, if you want, you can enhance the structured log with a custom logging Handler, because the standard journald handler doesn't know how to handle aiohttp_request or aiopg_cursor.
My example is based on journald, but you can write an equivalent version with python-logstash:
from systemdream.journal.handler import JournalHandler

class Handler(JournalHandler):
    # Tip: on a system without journald, use socat to test:
    # socat UNIX-RECV:/run/systemd/journal/socket STDIN
    def emit(self, record):
        if record.extra:
            # import ipdb; ipdb.set_trace()
            if 'aiohttp_request' in record.extra:
                record.extra['http_method'] = record.extra['aiohttp_request'].method
                record.extra['http_path'] = record.extra['aiohttp_request'].path
                record.extra['http_headers'] = str(record.extra['aiohttp_request'].headers)
            if 'aiopg_cursor' in record.extra:
                record.extra['pg_query'] = record.extra['aiopg_cursor'].query.decode('utf-8')
                record.extra['pg_status_message'] = record.extra['aiopg_cursor'].statusmessage
                record.extra['pg_rows_count'] = record.extra['aiopg_cursor'].rowcount

And you can enable this custom handler in your logging config file like this:

And now, with journalctl, you can easily extract logs, some examples:
Logs messages from 'lg' account:
    journalctl ACCOUNT_ID=lg
All HTTP requests that modify the 'lg' account (PUT, POST and DELETE):
Retrieve all logs from one specific HTTP request:
    journalctl REQUEST_ID=130b8fa0-6576-43b6-a624-4a4265a2fbdd
All HTTP requests with a specific path:
    journalctl HTTP_PATH=/v1/accounts/lg/callflows
All logs of "create" function in the file "example.py"
   journalctl CODE_FUNC=create CODE_FILE=/path/example.py

If you already do a troubleshooting on a production system, you should understand the interest of this:
In fact, it's like to have SQL queries capabilities, but it's logging oriented.
We use that since a small time on one of our critical daemon that handles a lot of requests across several servers, it's already adopted from our support team.

4. The yocto issue with the Python logging module

I don't explain here a small part of my professional life for my pleasure, but to help you to understand the context and the usages, because my patch for logging is very small.
If you're an expert of Python logging, you already know that my Handler class example I provided above can't run on a classical Python logging, because LogRecord doesn't have an extra attribute.

extra parameter exists in the Logger, but, in the LogRecord, it's merged as attributes of LogRecord:

It means, that when the LogRecord is sent to the Handler, you can't retrieve the dict from the extra parameter of logger.
The only way to do that without to patch Python logging, is to rebuild by yourself the dict with a list of official attributes of LogRecord, as is done in python-logstash:
At least to me, it's a little bit dirty.

My quick'n'dirty patch I use for now on our CPython on production:

diff --git a/Lib/logging/__init__.py b/Lib/logging/__init__.py
index 104b0be..30fa6ef 100644
--- a/Lib/logging/__init__.py
+++ b/Lib/logging/__init__.py
@@ -1382,6 +1382,7 @@ class Logger(Filterer):
         rv = _logRecordFactory(name, level, fn, lno, msg, args, exc_info, func,
+        rv.extra = extra
         if extra is not None:
             for key in extra:
                 if (key in ["message", "asctime"]) or (key in rv.__dict__):

At least to me, it should be cleaner to add "extra" as parameter of _logRecordFactory, but I've no idea of side effects, I understand that logging module is critical, because it's used everywhere.
However, except with python-logstash, to my knowledge, extra parameter isn't massively used.
The only backward incompatibility I see with a new extra attribute of LogRecord, is that if you have a log like this:
    LOG.debug('message', extra={'extra': 'example'})
It will raise a KeyError("Attempt to overwrite 'extra' in LogRecord") exception, but, at least to me, the probability of this use case is near to 0.

Instead of to "maintain" this yocto patch, even it's very small, I should prefer to have a clean solution in Python directly.

Thanks for your remarks.

Ludovic Gasc (GMLudo)

Python-Dev mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

--Guido van Rossum (python.org/~guido)