[New-bugs-announce] [issue33510] [logging] add JSON log formatter

Steve R. Hastings report at bugs.python.org
Mon May 14 18:20:20 EDT 2018


New submission from Steve R. Hastings <steveha at users.sourceforge.net>:

Many logging aggregation and indexing systems require that log data must be written in an easy-to-parse format such as JSON.  A "batteries included" JSON log formatter would be a useful enhancement.

I have written code with the following features:

Log events are translated into JSON.  The JSON is formatted without indentation, so that one log event will produce one line in the output file.  Each line is a single JSON object.  Because log files contain many lines and JSON logging is usually not read by humans, the JSON separator punctuation are set to simply ':' and ',' (no leading or trailing spaces).

The standard fields of a log event turn into key/value pairs in the JSON object, using the standard field name as the output JSON key name.  For example, a log statement like this: log.info("foo initialized") would result in JSON that included: "message": "foo initialized", "levelname": "INFO"

By default, the JSON output has the timestamp as the first key/value pair; therefore if the file is sorted, or multiple log files are concatenated and then sorted, the logged events will be sorted into elapsed order.  By default, the timestamp format is RFC3339: https://tools.ietf.org/html/rfc3339  Also, by default, the seconds field will have a fractional part with three digits (i.e. millisecond resolution on log event timestamps).

The user can provide a dictionary that will translate key names.  If the user's log collection system requires the logging level to have a key name of "priority", then by specifying that "levelname" should be mapped to "priority" the log event will be output with "priority": "INFO"  This dictionary also provides a way to control which logging event fields are written to the logfile and in what order.

Taking advantage of a feature already present in the Python logging code: The user can provide a dictionary as the only argument after the logging message, and it will be handled specially.  For the JSON formatter, the special handling is that the key/value pairs from the dictionary will be included in the JSON output.  Alternatively, the user can provide a message string that includes % templating codes, and specify arguments to be templated in, as is common practice in logging.  (This is to be backwards-compatible with code written to the current formatters.)

The Python logging code feature mentioned above can be found in LogRecord.__init__()  The first thing that init code does is check whether there is exactly one argument and it is a non-empty Mapping.

In the case where there are multiple arguments provided and templated into the message string, one extra key/value pair is written in the JSON object: the key is "args" and the value is a JSON array with the values of the arguments.

Because logging is fundamental, if anything goes wrong it's important to log as much as possible.  If any exception is raised during the call to json.dumps() that creates an output line, there is fallback code that converts the arguments dictionary to a string and logs that string under an identifiable key name such as "args_text"  So if someone attempts to log arguments that json.dumps() can't handle, something will be written to the logfile even if it's just something like "<__main__.Foo object at 0x7f8fe7621da0>"


Here is a concrete example.  The JSON logging formatter is initialized with the following field name translation table:

TRANSLATE_FIELD_NAMES = {
    "asctime": "@timestamp",
    "levelname": "priority",
    "message": None,
    "process": "pid",
}

The above dictionary means that asctime, levelname, message, and process will be written to the output file, in that order.  The output JSON object will have key names of, respectively: "@timestamp", priority, message (no change), and pid.

Then the following logging call occurs:

 log.info("connecting to server", {"url": server_url, "port": sever_port})



The output written to the log file would be similar to this:

{"@timestamp":"2018-05-14T17:28:04.112-04:00","priority":"INFO","message":"connecting to server","pid":32073,"url":"http://127.0.0.1","port":8043}


As noted above, I have already written this code, and it has been in production for months at the company where I work.  However, the code can be simplified and cleaned up a bit, since it contains Python 2.7 backward compatibility code.

----------
components: Library (Lib)
messages: 316597
nosy: steveha, vinay.sajip
priority: normal
severity: normal
status: open
title: [logging] add JSON log formatter
type: enhancement
versions: Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue33510>
_______________________________________


More information about the New-bugs-announce mailing list