UnicodeEncodeError during repr()

gb345 gb345 at invalid.com
Mon Apr 19 03:46:46 CEST 2010




I'm getting a UnicodeEncodeError during a call to repr:

Traceback (most recent call last):
  File "bug.py", line 142, in <module>
    element = parser.parse(INPUT)
  File "bug.py", line 136, in parse 
    ps = Parser.Parse(open(filename,'r').read(), 1)
  File "bug.py", line 97, in end_item 
    r = repr(CURRENT_ENTRY)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3003' in position 0: ordinal not in range(128)

This is what CURRENT_ENTRY.__repr__ looks like:

    def __repr__(self):
        k = SEP.join(self.k)
        r = SEP.join(self.r)
        s = SEP.join(self.s)
        ret = u'\t'.join((k, r, s))
        print type(ret)  # prints "<type 'unicode'>", as expected
        return ret

If I "inline" this CURRENT_ENTRY.__repr__ code so that the call to
repr(CURRENT_ENTRY) can be bypassed altogether, then the error
disappears.

Therefore, it is clear from the above that the problem, whatever
it is, occurs during the execution of the repr() built-in *after*
it gets the value returned by CURRENT_ENTRY.__repr__.  It is also
clear that repr is trying to encode something using the ascii
codec, but I don't understand why it needs to encode anything.
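This is consistent with how Python 2's repr() handles a unicode value returned by __repr__: it converts that value to a byte string using the default encoding, which is 'ascii' unless it has been changed, with strict error handling. The sketch below emulates only that conversion step under the assumption that the default encoding is 'ascii'. `BadEntry` and `py2_repr` are hypothetical names, not part of any library; the code also runs under Python 3, whose own repr() performs no such encoding.

```python
class BadEntry(object):
    """Hypothetical stand-in for CURRENT_ENTRY: __repr__ returns non-ASCII text."""

    def __repr__(self):
        return u'\u3003\tfoo'


def py2_repr(obj):
    """Rough emulation of Python 2's repr() conversion step (assumes 'ascii' default)."""
    result = obj.__repr__()
    if isinstance(result, bytes):
        # A byte string result is passed through unchanged.
        return result
    # A unicode result is encoded strictly with the default encoding, so any
    # non-ASCII character raises UnicodeEncodeError, as in the traceback above.
    return result.encode('ascii')


try:
    py2_repr(BadEntry())
except UnicodeEncodeError as exc:
    print(exc)
```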

Do I need to do something special to get repr to work strictly
with unicode?

Or should __repr__ *always* return bytes rather than unicode?  What
about __str__ ?  If both of these are supposed to return bytes,
then what method should I use to define the unicode representation
for instances of a class?
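For comparison, a minimal sketch of the usual Python 2 arrangement: __repr__ (and __str__, if defined) return byte strings or ASCII-only text, while __unicode__, which unicode(obj) calls, returns the full unicode text. Here __repr__ escapes non-ASCII characters with the 'backslashreplace' error handler, so repr()'s implicit ASCII conversion cannot fail. `Entry`, `_text`, and the value of `SEP` are hypothetical stand-ins for the original class, whose definition of SEP is not shown.

```python
SEP = u'|'  # hypothetical separator; the post does not show SEP's value


class Entry(object):
    def __init__(self, k, r, s):
        # Sequences of unicode strings, like the original self.k, self.r, self.s
        self.k, self.r, self.s = k, r, s

    def _text(self):
        # Full unicode text, built the same way as the original __repr__
        return u'\t'.join((SEP.join(self.k), SEP.join(self.r), SEP.join(self.s)))

    def __unicode__(self):
        # Called by unicode(entry) in Python 2; Python 3 ignores this method
        return self._text()

    def __repr__(self):
        # Replace non-ASCII characters with \uXXXX escapes, leaving ASCII-only
        # text that repr() can convert without a UnicodeEncodeError
        return self._text().encode('ascii', 'backslashreplace').decode('ascii')


entry = Entry([u'\u3003'], [u'a'], [u'b'])
print(repr(entry))
```

__str__ is left undefined, so str() falls back to __repr__ and is ASCII-safe as well, while unicode(entry) under Python 2 returns the unescaped text.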

Thanks!

Gabe


