[Python-Dev] A proposal: configuring logging using dictionaries

Vinay Sajip vinay_sajip at yahoo.co.uk
Sat Oct 17 09:23:50 CEST 2009


A little while ago, I posted here a suggestion about a new way to configure
logging, using dictionaries. This received some positive and no negative
feedback, so I have thought some more about the details of how it might work. I
present below the results of that thinking, in a PEP-style format. I don't know
if an actual PEP is required for a change of this type, but I felt that it's
still worth going through the exercise to try and achieve a reasonable level of
rigour. (I hope I've succeeded.)

I would welcome all your feedback on this proposal. If I hear no negative
feedback, I propose to implement this feature as suggested.

I thought about posting this on comp.lang.python as well, but possibly it's a
little too much information for most of the folks there. I think it would be
useful to get feedback from the wider community, though, and welcome any
suggestions on how best to achieve this.

Thanks and regards,

Vinay Sajip
-----------
PEP: XXX
Title: Dictionary-Based Configuration For Logging
Version: $Revision$
Last-Modified: $Date$
Author: Vinay Sajip <vinay_sajip at red-dove.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Oct-2009
Python-Version: 2.7 and 3.2
Post-History:


Abstract
========

This PEP describes a new way of configuring logging using a dictionary to hold
configuration information.

Rationale
=========

At present, Python's logging package can be configured either
programmatically, using the logging API, or by means of ConfigParser-based
configuration files.

Programmatic configuration, while offering maximal control, fixes the
configuration in Python code.  This does not facilitate changing it easily at
runtime, and, as a result, the ability to flexibly turn the verbosity of
logging up and down for different parts of an application is lost.  This
limits the usability of logging as an aid to diagnosing problems - and
sometimes, logging is the only diagnostic aid available in production
environments.

The ConfigParser-based configuration system is usable, but does not allow its
users to configure all aspects of the logging package.  For example, Filters
cannot be configured using this system.  Furthermore, the ConfigParser format
appears to engender dislike (sometimes strong dislike) in some quarters.
Though it was chosen because it was the only configuration format supported in
the Python standard library at that time, many people regard it (or perhaps
just the particular schema chosen for logging's configuration) as 'crufty' or
'ugly', in some cases apparently on purely aesthetic grounds.

Recent versions of Python include JSON support in the standard library, and
this is also usable as a configuration format.  In other environments, such as
Google App Engine, YAML is used to configure applications, and usually the
configuration of logging would be considered an integral part of the
application configuration.  Although the standard library does not contain
YAML support at present, support for both JSON and YAML can be provided in a
common way because both of these serialization formats can be deserialized
into Python dictionaries.

By providing a way to configure logging by passing the configuration in a
dictionary, logging will be easier to configure not only for users of JSON
and/or YAML, but also for users of bespoke configuration methods, by providing
a common format in which to describe the desired configuration.

Another drawback of the current ConfigParser-based configuration system is
that it does not support incremental configuration: a new configuration
completely replaces the existing configuration.  Although full flexibility for
incremental configuration is difficult to provide in a multi-threaded
environment, the new configuration mechanism will allow the provision of
limited support for incremental configuration.

Specification
=============

The specification consists of two parts: the API and the format of the
dictionary used to convey configuration information (i.e. the schema to which
it must conform).

Naming
------

Historically, the logging package has not been PEP-8 conformant.  At some
future time, this will be corrected by changing method and function names in
the package in order to conform with PEP-8.  However, in the interests of
uniformity, the proposed additions to the API use a naming scheme which is
consistent with the present scheme used by logging.

API
---

The logging.config module will have the following additions:

* A class, called ``DictConfigurator``, whose constructor is passed the
  dictionary used for configuration, and which has a ``configure()`` method.

* A callable, called ``dictConfigClass``, which will (by default) be set to
  ``DictConfigurator``.  This is provided so that if desired,
  ``DictConfigurator`` can be replaced with a suitable user-defined
  implementation.

* A function, called ``dictConfig()``, which takes a single argument - the
  dictionary holding the configuration.  This function will call
  ``dictConfigClass`` passing the specified dictionary, and then call the
  ``configure()`` method on the returned object to actually put the 
  configuration into effect::
  
    def dictConfig(config):
        dictConfigClass(config).configure()
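
To illustrate the ``dictConfigClass`` hook, here is a sketch written for
Python 3 against the ``logging.config`` module as it eventually shipped in
Python 2.7 and 3.2, which provides ``DictConfigurator``, ``dictConfigClass``
and ``dictConfig()`` along these lines.  ``AuditingConfigurator`` is a
hypothetical name, and reading ``self.config`` relies on an implementation
detail of that module rather than on anything specified in this proposal.

```python
import logging
import logging.config


class AuditingConfigurator(logging.config.DictConfigurator):
    """Hypothetical replacement configurator that records the keys it saw."""

    seen_keys = []

    def configure(self):
        # Note which top-level keys were supplied, then defer to the
        # standard behaviour to actually put the configuration into effect.
        AuditingConfigurator.seen_keys = sorted(self.config.keys())
        super().configure()


# dictConfig() looks up dictConfigClass each time it is called, so
# rebinding the module attribute is enough to swap implementations.
logging.config.dictConfigClass = AuditingConfigurator
try:
    logging.config.dictConfig({'version': 1, 'root': {'level': 'WARNING'}})
finally:
    # Restore the default so later configuration calls are unaffected.
    logging.config.dictConfigClass = logging.config.DictConfigurator
```

Because ``dictConfig()`` only ever talks to whatever ``dictConfigClass`` is
bound to, a replacement can add behaviour (auditing, validation, extra keys)
without callers changing how they invoke configuration.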

Dictionary Schema - Overview
----------------------------

Before describing the schema in detail, it is worth saying a few words about
object connections, support for user-defined objects and access to external
objects.

Object connections
''''''''''''''''''

The schema is intended to describe a set of logging objects - loggers,
handlers, formatters, filters - which are connected to each other in an
object graph.  Thus, the schema needs to represent connections between the
objects.  For example, say that, once configured, a particular logger has
a particular handler attached to it.  For the purposes of this discussion,
we can say that the logger represents the source, and the handler the
destination, of a connection between the two.  Of course in the configured
objects this is represented by the logger holding a reference to the
handler.  In the configuration dict, this is done by giving each destination
object an id which identifies it unambiguously, and then using the id in the
source object's configuration to indicate that a connection exists between
the source and the destination object with that id.
  
So, for example, consider the following YAML snippet::
  
  handlers:
    h1: #This is an id
     # configuration of handler with id h1 goes here
    h2: #This is another id
     # configuration of handler with id h2 goes here
  loggers:
    foo.bar.baz:
      # other configuration for logger "foo.bar.baz"
      handlers: [h1, h2]

(Note: YAML will be used in this document as it is more readable than the
equivalent Python source form for the dictionary.)
  
The ids for loggers are the logger names which would be used
programmatically to obtain a reference to those loggers, e.g.
``foo.bar.baz``.  The ids for other objects can be any string value (such as
``h1``, ``h2`` above) and they are transient: they are meaningful only while
the configuration dictionary is being processed, where they are used to
determine connections between objects, and they are not persisted anywhere
once the configuration call is complete.

The above snippet indicates that the logger named ``foo.bar.baz`` should have
two handlers attached to it, which are described by the handler ids ``h1``
and ``h2``.
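
The id mechanism can be sketched in a few lines of Python 3.
``connect_handlers()`` is a hypothetical helper, not part of the proposed
API, and to keep it short it creates every handler as a plain
``logging.StreamHandler`` rather than honouring the handler configuration.
The point is the two-phase approach: destination objects are built first and
stored under their ids, and source objects then look those ids up, so a
handler referenced from two loggers is a single shared instance.

```python
import logging


def connect_handlers(config):
    """Hypothetical sketch: turn handler ids into shared handler objects."""
    # Phase 1: build each destination object and file it under its id.
    # (Every handler is a plain StreamHandler here, purely for brevity.)
    handlers_by_id = {
        handler_id: logging.StreamHandler()
        for handler_id in config.get('handlers', {})
    }
    # Phase 2: walk the sources and replace each id with a real reference.
    for logger_name, logger_config in config.get('loggers', {}).items():
        logger = logging.getLogger(logger_name)
        for handler_id in logger_config.get('handlers', []):
            # A KeyError here means the id has no corresponding destination.
            logger.addHandler(handlers_by_id[handler_id])
    # handlers_by_id is discarded on return, so the ids are not persisted.


connect_handlers({
    'handlers': {'h1': {}, 'h2': {}},
    'loggers': {
        'foo.bar.baz': {'handlers': ['h1', 'h2']},
        'spam.connect': {'handlers': ['h1']},
    },
})
```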

User-defined objects
''''''''''''''''''''

The schema should support user-defined objects for handlers, filters and
formatters.  (Loggers do not need to have different types for different
instances, so there is no support - in the configuration - for user-defined
logger classes.)

Objects to be configured will typically be described by dictionaries which
detail their configuration.  In some places, the logging system will be able
to infer from the context how an object is to be instantiated, but when a
user-defined object is to be instantiated, the system will not know how to do
this.  In order to provide complete flexibility for user-defined object
instantiation, the user will need to provide a 'factory' - a callable which
is called with a configuration dictionary and which returns the instantiated
object.  This will be signalled by the factory being made available under
the special key ``'()'``.  Here's a concrete example::
  
  formatters:
    brief:
      format: '%(message)s'
    default:
      format: '%(asctime)s %(levelname)-8s %(name)-15s %(message)s'
      datefmt: '%Y-%m-%d %H:%M:%S'
    custom:
      (): my.package.customFormatterFactory
      bar: baz
      spam: 99.9
      answer: 42

The above YAML snippet defines three formatters.  The first, with id
``brief``, is a standard ``logging.Formatter`` instance with the
specified format string.  The second, with id ``default``, has a longer
format and also defines the time format explicitly, and will result in a
``logging.Formatter`` initialized with those two format strings.  Shown in
Python source form, the ``brief`` and ``default`` formatters have
configuration sub-dictionaries::
  
  {
    'format' : '%(message)s'
  }
    
and::
  
  {
    'format' : '%(asctime)s %(levelname)-8s %(name)-15s %(message)s',
    'datefmt' : '%Y-%m-%d %H:%M:%S'
  }
  
respectively, and as these dictionaries do not contain the special key
``'()'``, the instantiation is inferred from the context: as a result,
standard ``logging.Formatter`` instances are created.  The configuration
sub-dictionary for the third formatter, with id ``custom``, is::

  {
    '()' : 'my.package.customFormatterFactory',
    'bar' : 'baz',
    'spam' : 99.9,
    'answer' : 42
  }
  
and this contains the special key ``'()'``, which means that user-defined
instantiation is wanted.  In this case, the specified factory callable will be
located using normal import mechanisms and called with the *remaining* items
in the configuration sub-dictionary as keyword arguments.  In the above
example, the formatter with id ``custom`` will be assumed to be returned by
the call::

  my.package.customFormatterFactory(bar="baz", spam=99.9, answer=42)  

The key ``'()'`` has been used as the special key because it is not a valid
keyword parameter name, and so will not clash with the names of the keyword
arguments used in the call.  The ``'()'`` also serves as a mnemonic that the
corresponding value is a callable.
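
The ``'()'`` convention might be implemented along the following lines.  This
is a sketch assuming Python 3; ``resolve()`` and ``make_formatter()`` are
hypothetical helpers, with ``resolve()`` standing in for the "normal import
mechanisms" mentioned above (it imports the leading module and then walks
attributes, importing submodules as needed).  The factory may be given either
as a dotted name or, in Python source form, as the callable itself.

```python
import importlib
import logging


def resolve(dotted_name):
    """Import an object given its dotted name, e.g. 'logging.Formatter'."""
    parts = dotted_name.split('.')
    obj = importlib.import_module(parts[0])
    for index, part in enumerate(parts[1:], start=1):
        try:
            obj = getattr(obj, part)
        except AttributeError:
            # The attribute may be a submodule that is not yet imported.
            importlib.import_module('.'.join(parts[:index + 1]))
            obj = getattr(obj, part)
    return obj


def make_formatter(config):
    """Hypothetical sketch of formatter instantiation from a configuring dict."""
    config = dict(config)  # work on a copy; leave the caller's dict intact
    factory = config.pop('()', None)
    if factory is None:
        # No special key: infer a standard Formatter from the context.
        return logging.Formatter(config.get('format'), config.get('datefmt'))
    if isinstance(factory, str):
        factory = resolve(factory)
    # Call the factory with the *remaining* items as keyword arguments.
    return factory(**config)


# Usage: one formatter inferred from context, one built by a named factory.
brief = make_formatter({'format': '%(message)s'})
custom = make_formatter({'()': 'logging.Formatter',
                         'fmt': '%(levelname)s:%(message)s'})
```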

Access to external objects
''''''''''''''''''''''''''

There are times when a configuration will need to refer to objects external
to the configuration, for example ``sys.stderr``.  If the configuration dict
is constructed using Python code then this is straightforward, but a problem
arises when the configuration is provided via a text file (e.g. JSON, YAML).
In a text file, there is no standard way to distinguish ``sys.stderr`` from
the literal string ``'sys.stderr'``.  To facilitate this distinction, the
configuration system will look for certain special prefixes in string values
and treat them specially. For example, if the literal string
``'ext://sys.stderr'`` is provided as a value in the configuration, then the
``ext://`` will be stripped off and the remainder of the value processed using
normal import mechanisms.

The handling of such prefixes will be done in a way analogous to protocol
handling: there will be a generic mechanism to look for prefixes which match
the regular expression ``^(?P<prefix>[a-z]+)://(?P<suffix>.*)$`` whereby, if
the ``prefix`` is recognised, the ``suffix`` is processed in a prefix-
dependent manner and the result of the processing replaces the string value.
If the prefix is not recognised, then the string value will be left as-is.

The implementation will provide for a set of standard prefixes such as
``ext://`` but it will be possible to disable the mechanism completely or
provide additional or different prefixes for special handling.
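
A minimal sketch of the prefix mechanism, assuming Python 3.  ``convert()``,
``ext_converter()`` and ``DEFAULT_CONVERTERS`` are hypothetical names; the
regular expression is the one given above, and the converter table is a
plain dict so that prefixes can be added, replaced, or (by passing an empty
dict) disabled altogether.  The ``ext`` converter here is simplified: it
imports the leading module and then follows attributes, without importing
submodules.

```python
import importlib
import re
import sys

PREFIX_PATTERN = re.compile(r'^(?P<prefix>[a-z]+)://(?P<suffix>.*)$')


def ext_converter(suffix):
    """Resolve 'sys.stderr' and similar via normal import mechanisms.

    Simplified: imports the leading module, then follows attributes.
    """
    module_name, _, attr_path = suffix.partition('.')
    obj = importlib.import_module(module_name)
    attrs = attr_path.split('.') if attr_path else []
    for attr in attrs:
        obj = getattr(obj, attr)
    return obj


# Hypothetical default table; pass a different dict to add or replace
# prefixes, or an empty dict to disable the mechanism entirely.
DEFAULT_CONVERTERS = {'ext': ext_converter}


def convert(value, converters=DEFAULT_CONVERTERS):
    """Replace a prefixed string value using its prefix's converter."""
    if not isinstance(value, str):
        return value
    match = PREFIX_PATTERN.match(value)
    if match is None:
        return value
    converter = converters.get(match.group('prefix'))
    if converter is None:
        return value  # unrecognised prefix: leave the string as-is
    return converter(match.group('suffix'))


# Usage: the prefixed value becomes the real stream object, while a plain
# string with the same text is left alone.
print(convert('ext://sys.stderr') is sys.stderr)  # True
print(convert('sys.stderr'))                      # sys.stderr
```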

Dictionary Schema - Detail
--------------------------

The dictionary passed to ``dictConfig()`` must contain the following keys:

* `version` - to be set to an integer value representing the schema
  version.  The only valid value at present is 1, but having this key allows
  the schema to evolve while still preserving backwards compatibility.

All other keys are optional, but if present they will be interpreted as described
below.  In all cases below where a 'configuring dict' is mentioned, it will be
checked for the special ``'()'`` key to see if a custom instantiation is
required.  If so, the mechanism described above is used to instantiate;
otherwise, the context is used to determine how to instantiate.

* `formatters` - the corresponding value will be a dict in which each key is
  a formatter id and each value is a dict describing how to configure the
  corresponding Formatter instance.
  
  The configuring dict is searched for keys ``format`` and ``datefmt`` (with
  defaults of ``None``) and these are used to construct a
  ``logging.Formatter`` instance.

* `filters` - the corresponding value will be a dict in which each key is
  a filter id and each value is a dict describing how to configure the
  corresponding Filter instance.

  The configuring dict is searched for key ``name`` (defaulting to the empty
  string) and this is used to construct a ``logging.Filter`` instance.

* `handlers` - the corresponding value will be a dict in which each key is
  a handler id and each value is a dict describing how to configure the
  corresponding Handler instance.

  The configuring dict is searched for the following keys:
  
  * ``class`` (mandatory).  This is the fully qualified name of the handler
    class.
    
  * ``level`` (optional).  The level of the handler.
  
  * ``formatter`` (optional).  The id of the formatter for this handler.
  
  * ``filters`` (optional).  A list of ids of the filters for this handler.

  All *other* keys are passed through as keyword arguments to the handler's
  constructor.  For example, given the snippet::

    handlers:
      console:
        class : logging.StreamHandler
        formatter: brief
        level   : INFO
        filters: [allow_foo]
        stream  : ext://sys.stdout
      file:
        class : logging.handlers.RotatingFileHandler
        formatter: precise
        filename: logconfig.log
        maxBytes: 1024
        backupCount: 3

  the handler with id ``console`` is instantiated as a
  ``logging.StreamHandler``, using ``sys.stdout`` as the underlying stream.
  The handler with id ``file`` is instantiated as a
  ``logging.handlers.RotatingFileHandler`` with the keyword arguments
  ``filename="logconfig.log", maxBytes=1024, backupCount=3``.

* `loggers` - the corresponding value will be a dict in which each key is
  a logger name and each value is a dict describing how to configure the
  corresponding Logger instance.

  The configuring dict is searched for the following keys:
  
  * ``level`` (optional).  The level of the logger.
  
  * ``propagate`` (optional).  The propagation setting of the logger.
  
  * ``filters`` (optional).  A list of ids of the filters for this logger.

  * ``handlers`` (optional).  A list of ids of the handlers for this logger.

  The specified loggers will be configured according to the level,
  propagation, filters and handlers specified.

* `root` - this will be the configuration for the root logger. Processing of
  the configuration will be as for any logger, except that the ``propagate``
  setting will not be applicable.
  
* `incremental` - whether the configuration is to be interpreted as
  incremental to the existing configuration. This value defaults to False,
  which means that the specified configuration replaces the existing
  configuration with the same semantics as used by the existing
  ``fileConfig()`` API.
  
  If the specified value is True, the configuration is processed as described
  in the section on "Incremental Configuration", below.
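
The handler and logger rules above can be sketched as two small functions.
This assumes Python 3, that each handler's ``class`` value has already been
resolved to a class object (for example via normal import mechanisms), and
that ``formatters`` and ``filters`` are dicts mapping ids to objects built
beforehand.  ``build_handler()`` and ``configure_logger()`` are hypothetical
names; for the root logger the sketch simply skips ``propagate``, which is
one reading of "not applicable".  The usage at the end writes to an
in-memory stream instead of ``sys.stdout``.

```python
import io
import logging


def build_handler(config, formatters, filters):
    """Hypothetical sketch: create a handler from its configuring dict."""
    config = dict(config)                     # leave the caller's dict intact
    handler_class = config.pop('class')       # mandatory, already resolved
    level = config.pop('level', None)
    formatter_id = config.pop('formatter', None)
    filter_ids = config.pop('filters', [])
    handler = handler_class(**config)         # all *other* keys are kwargs
    if level is not None:
        handler.setLevel(level)               # accepts names such as 'INFO'
    if formatter_id is not None:
        handler.setFormatter(formatters[formatter_id])
    for filter_id in filter_ids:
        handler.addFilter(filters[filter_id])
    return handler


def configure_logger(logger, config, handlers, filters, is_root=False):
    """Hypothetical sketch: apply a logger's configuring dict."""
    if 'level' in config:
        logger.setLevel(config['level'])
    if 'propagate' in config and not is_root:
        logger.propagate = config['propagate']
    for filter_id in config.get('filters', []):
        logger.addFilter(filters[filter_id])
    for handler_id in config.get('handlers', []):
        logger.addHandler(handlers[handler_id])


# Usage, mirroring the 'console' handler from the snippet above but writing
# to an in-memory buffer.
formatters = {'brief': logging.Formatter('%(levelname)-8s: %(message)s')}
filters = {'allow_foo': logging.Filter('foo')}
buffer = io.StringIO()
console = build_handler(
    {'class': logging.StreamHandler, 'formatter': 'brief', 'level': 'INFO',
     'filters': ['allow_foo'], 'stream': buffer},
    formatters, filters)
logger = logging.getLogger('foo.sketch')
configure_logger(logger, {'level': 'DEBUG', 'propagate': False,
                          'handlers': ['console']},
                 {'console': console}, filters)
logger.debug('dropped by the handler level')
logger.info('hello')
```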

A Working Example
-----------------

The following is an actual working configuration in YAML format (except that
the email addresses are bogus)::

    formatters:
      brief:
        format: '%(levelname)-8s: %(name)-15s: %(message)s'
      precise:
        format: '%(asctime)s %(name)-15s %(levelname)-8s %(message)s'
    filters:
      allow_foo:
        name: foo
    handlers:
      console:
        class : logging.StreamHandler
        formatter: brief
        level   : INFO
        stream  : ext://sys.stdout
        filters: [allow_foo]
      file:
        class : logging.handlers.RotatingFileHandler
        formatter: precise
        filename: logconfig.log
        maxBytes: 1024
        backupCount: 3
      debugfile:
        class : logging.FileHandler
        formatter: precise
        filename: logconfig-detail.log
        mode: a
      email:
        class: logging.handlers.SMTPHandler
        mailhost: localhost
        fromaddr: my_app at domain.tld
        toaddrs:
          - support_team at domain.tld
          - dev_team at domain.tld
        subject: Houston, we have a problem.
    loggers:
      foo:
        level : ERROR
        handlers: [debugfile]
      spam:
        level : CRITICAL
        handlers: [debugfile]
        propagate: no
      bar.baz:
        level: WARNING
    root:
      level     : DEBUG
      handlers  : [console, file]
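
For comparison, here is the same configuration in Python source form, passed
to ``logging.config.dictConfig()`` as it eventually shipped in Python 2.7 and
3.2.  This is a sketch: the ``email`` handler is omitted (it is not attached
to any logger), the two log files are written to a temporary directory rather
than the current directory, and YAML's ``no`` appears as Python's ``False``.

```python
import logging
import logging.config
import os
import tempfile

log_dir = tempfile.mkdtemp()  # keep the example's log files out of the way

config = {
    'version': 1,
    'formatters': {
        'brief': {'format': '%(levelname)-8s: %(name)-15s: %(message)s'},
        'precise': {'format': '%(asctime)s %(name)-15s %(levelname)-8s %(message)s'},
    },
    'filters': {
        'allow_foo': {'name': 'foo'},
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'brief',
            'level': 'INFO',
            'stream': 'ext://sys.stdout',
            'filters': ['allow_foo'],
        },
        'file': {
            'class': 'logging.handlers.RotatingFileHandler',
            'formatter': 'precise',
            'filename': os.path.join(log_dir, 'logconfig.log'),
            'maxBytes': 1024,
            'backupCount': 3,
        },
        'debugfile': {
            'class': 'logging.FileHandler',
            'formatter': 'precise',
            'filename': os.path.join(log_dir, 'logconfig-detail.log'),
            'mode': 'a',
        },
    },
    'loggers': {
        'foo': {'level': 'ERROR', 'handlers': ['debugfile']},
        'spam': {'level': 'CRITICAL', 'handlers': ['debugfile'],
                 'propagate': False},
        'bar.baz': {'level': 'WARNING'},
    },
    'root': {'level': 'DEBUG', 'handlers': ['console', 'file']},
}

logging.config.dictConfig(config)
```

Note that ``foo`` and ``spam`` both refer to the id ``debugfile``, so they
end up sharing a single ``FileHandler`` instance.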

Incremental Configuration
=========================

It is difficult to provide complete flexibility for incremental configuration.
For example, because objects such as handlers, filters and formatters are
anonymous, once a configuration is set up, it is not possible to refer to such
anonymous objects when augmenting a configuration. For instance, if an initial
call is made to configure the system where logger ``foo`` has a handler with
id ``console`` attached, then a subsequent call to configure a logger ``bar``
with a handler whose id is ``console`` would create a new handler instance, as
the id ``console`` from the first call isn't kept.

Furthermore, there is not a compelling case for arbitrarily altering the
object graph of loggers, handlers, filters and formatters at run-time, once
a configuration is set up; the verbosity of loggers can be controlled just by
setting levels (and perhaps propagation flags).

Thus, when the ``incremental`` key of a configuration dict is present and
is ``True``, the system will ignore the ``formatters``, ``filters``,
``handlers`` entries completely, and process only the ``level`` and
``propagate`` settings in the ``loggers`` and ``root`` entries.
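
The rule above can be sketched as follows, assuming Python 3 and assuming the
caller has already seen ``incremental`` set to ``True`` in the configuration
dict.  ``apply_incremental()`` is a hypothetical helper; note that it never
creates or removes handlers, so a handler attached by an earlier
configuration call survives untouched.

```python
import logging


def apply_incremental(config):
    """Hypothetical sketch of incremental processing as proposed above.

    The formatters, filters and handlers entries are ignored completely;
    only 'level' and 'propagate' are applied to loggers and the root.
    """
    for name, logger_config in config.get('loggers', {}).items():
        logger = logging.getLogger(name)
        if 'level' in logger_config:
            logger.setLevel(logger_config['level'])
        if 'propagate' in logger_config:
            logger.propagate = logger_config['propagate']
    root_config = config.get('root', {})
    if 'level' in root_config:
        logging.getLogger().setLevel(root_config['level'])


# An earlier configuration attached this handler.
handler = logging.StreamHandler()
app_logger = logging.getLogger('app.incremental')
app_logger.addHandler(handler)

# Turn verbosity up without touching the existing handler.
apply_incremental({
    'version': 1,
    'incremental': True,
    'handlers': {'console': {'class': 'logging.StreamHandler'}},  # ignored
    'loggers': {'app.incremental': {'level': 'DEBUG', 'propagate': False}},
})
```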

Configuration Errors
====================

If an error is encountered during configuration, the system will raise a
``ValueError`` or a ``TypeError`` with a suitably descriptive message. The
following is a (possibly incomplete) list of conditions which will raise an
error:

* A ``level`` which is not a string or which is a string not corresponding to
  an actual logging level

* A ``propagate`` value which is not a Boolean

* An id which does not have a corresponding destination

* An invalid logger name
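
Some of these checks are sketched below, assuming Python 3.  The helper names
are hypothetical, the level table covers only the standard level names
(levels added with ``logging.addLevelName()`` are not recognised), and the
choice of ``TypeError`` for values of the wrong type versus ``ValueError``
for values of the right type but wrong content is an assumption; the text
above only says that one of the two is raised.

```python
import logging

# Standard level names only; levels registered with logging.addLevelName()
# are not covered by this simplified table.
STANDARD_LEVELS = {
    'CRITICAL': logging.CRITICAL,
    'ERROR': logging.ERROR,
    'WARNING': logging.WARNING,
    'INFO': logging.INFO,
    'DEBUG': logging.DEBUG,
    'NOTSET': logging.NOTSET,
}


def check_level(value):
    """Return the numeric level for a level name, or raise."""
    if not isinstance(value, str):
        raise TypeError('level must be a string, not %r' % (value,))
    try:
        return STANDARD_LEVELS[value]
    except KeyError:
        raise ValueError('unknown logging level: %r' % (value,))


def check_propagate(value):
    """Accept only a real Boolean for the propagate setting."""
    if not isinstance(value, bool):
        raise TypeError('propagate must be a Boolean, not %r' % (value,))
    return value


def lookup_id(kind, ident, known):
    """Resolve an id, raising if no destination object has that id."""
    try:
        return known[ident]
    except KeyError:
        raise ValueError('no %s with id %r' % (kind, ident))
```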

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:




