unicode bit me

Piet van Oostrum piet at cs.uu.nl
Fri May 8 17:22:26 EDT 2009


>>>>> "J. Cliff Dyer" <jcd at sdf.lonestar.org> (JCD) wrote:

>JCD> On Fri, 2009-05-08 at 07:53 -0700, anuraguniyal at yahoo.com wrote:
>>> # how can I print a list of objects which may return a unicode
>>> # representation?
>>> # -*- coding: utf-8 -*-
>>> 
>>> class A(object):
>>> 
>>>     def __unicode__(self):
>>>         return u"©au"
>>> 
>>>     __str__ = __repr__ = __unicode__
>>> 

>JCD> Your __str__ and __repr__ methods don't return strings.  You should
>JCD> encode your unicode to the encoding you want before you try to print it.

>JCD> class A(object):
>JCD>     def __unicode__(self):
>JCD>         return u"©au"

>JCD>     def get_utf8_repr(self):
>JCD>         return self.__unicode__().encode('utf-8')

>JCD>     def get_koi8_repr(self):
>JCD>         return self.__unicode__().encode('koi8-r')

>JCD>     __str__ = __repr__ = get_utf8_repr

It might be nicer to have a method that specifies the encoding to be
used in order to make switching encodings easier:

*untested code*

class A(object):
    # default encoding, used until set_encoding() is called
    _encoding = 'utf-8'

    def __unicode__(self):
        return u"©au"

    def set_encoding(self, encoding):
        self._encoding = encoding

    def __repr__(self):
        return self.__unicode__().encode(self._encoding)

    __str__ = __repr__
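The code in this thread is Python 2. For comparison, a rough Python 3 carry-over of the same per-object idea might look like the sketch below. Python 3 requires __repr__ to return a str, so here the text form stays in __str__/__repr__ and the encoded form moves into __bytes__, which the built-in bytes() calls. The 'utf-8' default and the use of __bytes__ are assumptions of this sketch, not part of the original post.

```python
class A:
    """Python 3 sketch: text from __str__/__repr__, encoded bytes from __bytes__."""

    # assumed default, used until set_encoding() is called
    _encoding = "utf-8"

    def __str__(self):
        return "©au"

    # list repr calls each element's __repr__, which now returns text
    __repr__ = __str__

    def set_encoding(self, encoding):
        self._encoding = encoding

    def __bytes__(self):
        # bytes(obj) encodes the text form with the chosen encoding
        return str(self).encode(self._encoding)


a = A()
text = repr([a])     # '[©au]'
print(bytes(a))      # b'\xc2\xa9au'
a.set_encoding("koi8_r")
print(bytes(a))
```

The encoding is still stored inside the object here, so it has the same drawback discussed next.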

Of course this feels very wrong, because the encoding should be chosen when
the string goes to the output channel, i.e. outside of the object.
Unfortunately this is one of the leftovers from Python's pre-unicode
heritage. Hopefully this will work without problems in Python 3; in any
case, the string type there is unicode, so at least __repr__ can return
unicode.
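A minimal Python 3 sketch of choosing the encoding at the output channel: the object only ever returns text, and a hypothetical helper, print_objects, wraps a binary stream in io.TextIOWrapper with whatever encoding the caller picks. The helper's name and its errors="replace" default are assumptions for illustration.

```python
import io


class A:
    def __str__(self):
        return "©au"

    __repr__ = __str__


def print_objects(objects, binary_stream, encoding="utf-8", errors="replace"):
    """Print `objects` to a byte stream, choosing the encoding at output time.

    Hypothetical helper for illustration only.
    """
    # newline="\n" keeps line endings identical on every platform
    out = io.TextIOWrapper(binary_stream, encoding=encoding,
                           errors=errors, newline="\n")
    try:
        print(objects, file=out)
    finally:
        # detach() flushes and releases binary_stream without closing it
        out.detach()


buf = io.BytesIO()
print_objects([A(), A()], buf, encoding="ascii")
print(buf.getvalue())  # → b'[?au, ?au]\n'
```

For the console one could pass sys.stdout.buffer as the binary stream (flushing sys.stdout first if it already holds text). The object itself never needs to know which encoding is in use.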
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org


