Fri Apr 6 22:06:22 CEST 2012

There is a problem with the way logging.handlers.SysLogHandler works
when presented with Unicode messages. According to RFC 5424, Unicode
is supposed to be sent encoded as UTF-8 and preceded by a BOM.
However, the current handler implementation puts the BOM at the start
of the formatted message. This is wrong when you want to put some
structured data in front of the unstructured message part: the BOM is
supposed to go after the structured part (which, therefore, has to be
ASCII) and immediately before the unstructured part. In that scenario,
the handler's current behaviour does not strictly conform to RFC 5424.
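To make the byte layout concrete, here is a minimal Python sketch contrasting the two BOM placements. The function names rfc5424_packet and current_style_packet are hypothetical, the second is only a simplified model of the handler's current output (not its actual code), and the header and structured-data values are made up for illustration.

```python
import codecs


def rfc5424_packet(pri, header, structured_data, message):
    """Hypothetical helper: build a packet with the BOM where RFC 5424 wants it.

    pri is the numeric priority; header and structured_data are assumed to be
    ASCII-only strings; message is arbitrary Unicode text.
    """
    prefix = "<%d>%s %s " % (pri, header, structured_data)
    # ASCII header and structured data first, then the BOM, then the UTF-8 message.
    return prefix.encode("ascii") + codecs.BOM_UTF8 + message.encode("utf-8")


def current_style_packet(pri, formatted):
    """Hypothetical, simplified model of the current handler's output: the BOM
    precedes the entire formatted message, so any structured data embedded in
    that message ends up after the BOM."""
    return ("<%d>" % pri).encode("ascii") + codecs.BOM_UTF8 + formatted.encode("utf-8")


# Illustrative values only.
header = "1 2012-04-06T22:06:22+02:00 myhost myapp - ID47"
sd = '[exampleSDID@32473 iut="3"]'
good = rfc5424_packet(14, header, sd, "caf\u00e9")
bad = current_style_packet(14, header + " " + sd + " caf\u00e9")
```

In good, the BOM sits just before the free-form text; in bad, it sits right after the priority, ahead of the structured data, even though the two packets are otherwise byte-for-byte identical.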

The issue is described in [1]. The BOM was added, and its position
later changed, in response to [2] and [3].

With the current implementation of the handler, conformance can only
be achieved by subclassing the handler and overriding the whole emit()
method, which is not ideal. For 3.3, I will refactor the implementation
to expose a method that creates the byte string sent over the wire to
the syslog daemon. This method can then be overridden for specific use
cases where needed.
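One possible shape for such a refactoring, sketched in Python purely for illustration: the class names PacketBuildingHandler and StructuredDataHandler and the method name build_packet are hypothetical, not the actual 3.3 API, and bytes are collected in a list rather than sent to a syslog daemon so that the sketch runs without a socket.

```python
import codecs
import logging


class PacketBuildingHandler(logging.Handler):
    """Hypothetical sketch: emit() delegates the wire format to build_packet().

    Bytes are appended to self.sent instead of being written to a socket.
    """

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.sent = []

    def build_packet(self, record):
        # Default layout: the formatted message as UTF-8, with no BOM.
        return self.format(record).encode("utf-8")

    def emit(self, record):
        try:
            self.sent.append(self.build_packet(record))
        except Exception:
            self.handleError(record)


class StructuredDataHandler(PacketBuildingHandler):
    """Hypothetical override: ASCII structured data, then BOM, then message."""

    def __init__(self, structured_data, level=logging.NOTSET):
        super().__init__(level)
        self.structured_data = structured_data

    def build_packet(self, record):
        prefix = (self.structured_data + " ").encode("ascii")
        return prefix + codecs.BOM_UTF8 + self.format(record).encode("utf-8")


logger = logging.getLogger("demo")
logger.propagate = False
h = StructuredDataHandler('[meta sequenceId="1"]')
logger.addHandler(h)
logger.warning("caf\u00e9")
# h.sent[0] == b'[meta sequenceId="1"] \xef\xbb\xbfcaf\xc3\xa9'
```

The design point is that emit() keeps the error handling and transport, while the overridable method owns only the byte layout, so a subclass no longer has to duplicate emit() to change where the BOM goes.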

However, for 2.7 and 3.2, simply removing the BOM insertion would bring
the implementation into conformance with the RFC, though the entire
message would then have to be regarded as just a sequence of octets. A
Unicode message would still be encoded using UTF-8, but the BOM would
be left out.
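The proposed 2.7/3.2 change amounts to dropping one step from the encoding. A minimal sketch, assuming a hypothetical helper named bomless_packet and a simplified packet with only a priority prefix (the real handler computes the priority from the record):

```python
import codecs


def bomless_packet(pri, formatted):
    """Hypothetical sketch of the proposed 2.7/3.2 behaviour: Unicode is still
    encoded as UTF-8, but no BOM is inserted, so the message is just octets
    (what RFC 5424 calls MSG-ANY)."""
    return ("<%d>" % pri).encode("ascii") + formatted.encode("utf-8")


packet = bomless_packet(11, "Unicode: \u00e9")
# packet == b"<11>Unicode: \xc3\xa9"
```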

I am thinking of removing the BOM insertion in 2.7 and 3.2. Although
it is a change in behaviour, the current behaviour does seem broken
with regard to RFC 5424 conformance. However, some might disagree
with that assessment and view it as a backwards-incompatible behaviour
change, so I thought I should post this to get some opinions on
whether the change would be objectionable.


[1] http://bugs.python.org/issue14452
[2] http://bugs.python.org/issue7077
[3] http://bugs.python.org/issue8795

