[ python-Bugs-1314107 ] Issue in unicode args in logging

SourceForge.net noreply at sourceforge.net
Fri Oct 7 01:16:56 CEST 2005


Bugs item #1314107, was opened at 2005-10-05 11:11
Message generated for change (Settings changed) made by tungwaiyip
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1314107&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: Python 2.4
>Status: Open
Resolution: Fixed
Priority: 5
Submitted By: Wai Yip Tung (tungwaiyip)
Assigned to: Vinay Sajip (vsajip)
Summary: Issue in unicode args in logging 

Initial Comment:
logging has an issue in handling unicode object 
arguments.

>>> import logging
>>>
>>> class Obj:
...     def __init__(self,name):
...         self.name = name
...     def __str__(self):
...         return self.name
...
>>> # a non-ascii string
...
>>> obj = Obj(u'\u00f6')
>>>
>>> # this will cause error
...
>>> print '%s' % obj
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>>
>>> # this will promote to unicode (and the console also happens to be able to display it)
...
>>> print u'%s' % obj
ö
>>>
>>> # this works fine
... # (other than logging makes its own decision to encode in utf8)
...
>>> logging.error(u'%s' % obj)
ERROR:root:├╢
>>>
>>> # THIS IS AN UNEXPECTED PROBLEM!!!
...
>>> logging.error(u'%s', obj)
Traceback (most recent call last):
  File "C:\Python24\lib\logging\__init__.py", line 706, in emit
    msg = self.format(record)
  File "C:\Python24\lib\logging\__init__.py", line 592, in format
    return fmt.format(record)
  File "C:\Python24\lib\logging\__init__.py", line 382, in format
    record.message = record.getMessage()
  File "C:\Python24\lib\logging\__init__.py", line 253, in getMessage
    msg = msg % self.args
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>>
>>> # workaround the str() conversion in getMessage()
...
>>> logging.error(u'%s-\u00f6', obj)
ERROR:root:├╢-├╢


The issue seems to be in LogRecord.getMessage(). It 
attempts to convert msg to a byte string:

   msg = str(self.msg)

I am not sure why it wants to do the conversion. The last 
example works around this by making sure msg is not 
convertible to a byte string.


----------------------------------------------------------------------

>Comment By: Wai Yip Tung (tungwaiyip)
Date: 2005-10-06 16:16

Message:
Logged In: YES 
user_id=561546

>>To ensure good Unicode support, ensure your messages 
are either Unicode strings or objects whose __str__() method
returns a Unicode string. Then, 

>>msg = msg % args

That's what I am doing already. 

Let me explain the subtle problem again.

1. print '%s' % obj - error
2. logging.error(u'%s' % obj) - ok
3. logging.error(u'%s', obj) - error
4. logging.error(u'%s-\u00f6', obj) - ok

I can understand how 1 fails. But I expect 2, 3 and 4 to work 
similarly. Especially contrast 3 with 4. 4 works when 3 doesn't 
because when str() is applied to u'%s-\u00f6' it fails and 
falls back to the original unicode string, which is the correct 
way in my opinion. Whereas in 3, the u'%s' gets demoted to the 
byte string '%s', so it fails like 1.
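
The contrast between cases 3 and 4 can be sketched as follows. This is only
an emulation written for Python 3, where the original Python 2.4 failure does
not reproduce directly; it is not the code in logging/__init__.py. The helper
names py2_str, percent_format and get_message_sketch are hypothetical, and
"demotion to a byte string" is modelled as ASCII encoding.

```python
# Python 3 emulation of the Python 2 behaviour described in this comment.
# py2_str, percent_format and get_message_sketch are hypothetical helpers.

class Obj:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def py2_str(value):
    """Mimic Python 2 str(): the result must fit in an ASCII byte string."""
    return str(value).encode("ascii")  # raises UnicodeEncodeError for u'\xf6'


def percent_format(fmt, arg):
    """Mimic a single '%s' substitution for a byte-string or unicode format."""
    if isinstance(fmt, bytes):
        # Byte-string format: the argument goes through str(), as in case 1.
        return fmt.replace(b"%s", py2_str(arg))
    # Unicode format: the argument is promoted to unicode, as in case 2.
    return fmt.replace("%s", str(arg))


def get_message_sketch(msg, arg):
    """Mimic the str(self.msg) step followed by msg % self.args."""
    try:
        fmt = py2_str(msg)   # u'%s' is pure ASCII, so it is demoted to '%s'
    except UnicodeEncodeError:
        fmt = msg            # u'%s-\u00f6' cannot be demoted; stays unicode
    return percent_format(fmt, arg)


obj = Obj("\u00f6")

# Case 4: the non-ASCII format string survives, so formatting succeeds.
case4 = get_message_sketch("%s-\u00f6", obj)

# Case 3: the format string is demoted, so formatting fails like case 1.
try:
    get_message_sketch("%s", obj)
    case3_failed = False
except UnicodeEncodeError:
    case3_failed = True
```

In this model the outcome depends only on whether the format string itself
happens to be pure ASCII, which is the inconsistency being reported.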

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2005-10-06 01:44

Message:
Logged In: YES 
user_id=308438

Misc. changes were backported into Python 2.4.2, please
check that you have this version.

The problem is not with

msg = str(self.msg)

but rather with

msg = msg % args

To ensure good Unicode support, ensure your messages are
either Unicode strings or objects whose __str__() method
returns a Unicode string. Then, 

msg = msg % args

should result in a Unicode object. You can pass this to a
FileHandler opened with an encoding argument, or a
StreamHandler whose stream has been opened using
codecs.open(). Ensure your default encoding is set correctly
using sitecustomize.py.
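
A minimal sketch of the FileHandler suggestion, written for a current
Python 3 interpreter rather than 2.4, so it only illustrates the encoding
argument and does not reproduce the reported failure (all strings are
Unicode in Python 3). The logger name, the temporary file path and the Obj
class are assumptions made for the example.

```python
import logging
import os
import tempfile


class Obj:
    """Object whose __str__() returns a non-ASCII string, as in the report."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


# Hypothetical log location for the demonstration.
log_path = os.path.join(tempfile.mkdtemp(), "unicode_demo.log")

logger = logging.getLogger("unicode_demo")  # hypothetical logger name
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep the record out of any root handlers

# The handler, not the message, decides how text is encoded on disk.
handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)

# Format string and argument are passed separately, as in case 3.
logger.error("%s", Obj("\u00f6"))

handler.close()
logger.removeHandler(handler)

# Read the file back with the same encoding to check what was written.
with open(log_path, encoding="utf-8") as log_file:
    logged = log_file.read()
```

Choosing the encoding on the handler keeps the formatted message as text
until it is written, which is the arrangement this comment recommends.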

The encoding additions were made in Revision 1.26 of
logging/__init__.py, dated 13/03/2005.

Marking as closed.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2005-10-05 21:00

Message:
Logged In: YES 
user_id=33168

Vinay, any suggestions?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-10-05 13:47

Message:
Logged In: YES 
user_id=38388

Unassigning the bug. I don't know anything about the logging
module.

Hint: perhaps the logging module should grow an .encoding
attribute which then allows converting Unicode to some
encoding used in the log file ?!

----------------------------------------------------------------------
