logging of strings with broken encoding

Lie Ryan lie.1296 at gmail.com
Thu Jul 2 13:14:34 EDT 2009


Thomas Guettler wrote:
> My quick fix is this:
> 
> class MyFormatter(logging.Formatter):
>     def format(self, record):
>         msg = logging.Formatter.format(self, record)
>         if isinstance(msg, str):
>             msg = msg.decode('utf8', 'replace')
>         return msg
> 
> But I still think handling of non-ascii byte strings should be better.
> A broken logging message is better than none.
> 

The problem is that Python 2.x assumes the `ascii` encoding whenever
you don't explicitly specify one, and your code apparently breaks that
assumption. I haven't looked at your code, but others have suggested
that you've been feeding the logging module non-ASCII byte strings. The
logging module can only work with 1) unicode strings or 2) ASCII-encoded
byte strings.
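
To make the `ascii` assumption concrete, here is a small sketch using
the same two bytes as the session below (the UTF-8 encoding of the
Cyrillic letter 'ы'). Decoding them as ASCII fails, which is what
Python 2 does implicitly when a byte string is mixed with unicode;
decoding them explicitly as UTF-8 works, under the assumption that the
bytes really are UTF-8. It is written so it should run on Python 2.6+
as well as Python 3.

```python
# b'\xd1\x8b' is the UTF-8 encoding of U+044B (Cyrillic small letter yeru).
raw = b'\xd1\x8b'

try:
    # What Python 2 does implicitly when a byte string meets unicode.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii failed: %s' % exc)

# Explicit decoding; this assumes the bytes really are UTF-8.
text = raw.decode('utf-8')
print(repr(text))
print(len(text))  # one character, not two bytes
```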

If you want a quick fix, you may be able to get away with repr()-ing
your log texts. A proper fix, however, is to pass a unicode string to
the logging module instead.

>>> logging.warn('ы') # or logging.warn('\xd1\x8b')
Traceback (most recent call last):
  File "/usr/lib64/python2.6/logging/__init__.py", line 773, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 13: ordinal not in range(128)
>>> logging.warn(repr('ы'))
WARNING:root:'\xd1\x8b'
>>> logging.warn(u'ы')
WARNING:root:ы
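
A minimal sketch of the proper fix, assuming the byte strings come from
a source you know to be UTF-8: decode them once where they enter the
program and hand only unicode to `logging`. The helper name `to_text`
is hypothetical and not part of the logging module; the 'replace'
error handler mirrors the formatter quoted above, so undecodable bytes
become U+FFFD instead of raising. It should run on Python 2.6+ and 3.

```python
import io
import logging


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return a unicode string suitable for logging.

    Byte strings are decoded with the given encoding (assumed, not
    detected); undecodable bytes become U+FFFD via 'replace'.
    Anything that is already text is returned unchanged.
    """
    if isinstance(value, bytes):  # bytes is an alias of str on 2.6+
        return value.decode(encoding, 'replace')
    return value


# Log to an in-memory text stream so the output can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
log = logging.getLogger('encoding-demo')
log.addHandler(handler)
log.propagate = False  # keep the demo out of the root logger

raw = b'\xd1\x8b'              # UTF-8 bytes read from somewhere
log.warning(to_text(raw))      # unicode reaches the handler
log.warning(to_text(b'\xff'))  # invalid UTF-8: replaced, not raised

print(repr(stream.getvalue()))
```

Decoding at the edge keeps the encoding decision in one place, instead
of having every formatter or handler guess it later.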



More information about the Python-list mailing list