unicode bit me
Nick Craig-Wood
nick at craig-wood.com
Sun May 10 03:30:05 EDT 2009
anuraguniyal at yahoo.com <anuraguniyal at yahoo.com> wrote:
> First of all thanks everybody for putting time with my confusing post
> and I apologize for not being clear after so many efforts.
>
> here is my last try (you are free to ignore my request for free
> advice)
>
> # -*- coding: utf-8 -*-
>
> class A(object):
>
>     def __unicode__(self):
>         return u"©au"
>
>     def __repr__(self):
>         return unicode(self).encode("utf-8")
>
>     __str__ = __repr__
>
> a = A()
> u1 = unicode(a)
> u2 = unicode([a])
>
> Now, I am not using print, so it doesn't matter whether stdout can
> print unicode or not.
> My naive question is: the line u2 = unicode([a]) throws
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 1: ordinal not in range(128)
>
> Shouldn't the list class call unicode on its elements?
You mean when you call unicode(a_list) it should call unicode() on each
of the elements to build the result?
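Yes, that does seem sensible. However, list doesn't have a __unicode__
method at all, so I guess it is falling back to str() on the list. That
builds a byte string from the repr() of each element (your __repr__ and
__str__ are the same method), and the result is then decoded with the
default ASCII codec, which explains your problem exactly.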
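The fallback described above can be pictured as two steps: build the byte string for the list, then decode it with the default codec. Here is a rough Python 3 emulation of those two steps, not the interpreter's actual code path. The bytes are built by hand to match what str([a]) returns on Python 2, and the variable names are made up for illustration.

```python
# Python 3 emulation of what Python 2 appears to do for unicode([a]).

# Step 1: str(L) yields a byte string. Build the same bytes by hand
# (assumption: the element's __repr__ returns UTF-8 encoded bytes).
list_as_bytes = "[\N{COPYRIGHT SIGN}au]".encode("utf-8")  # b'[\xc2\xa9au]'

# Step 2: the implicit conversion decodes with the default codec, ASCII.
try:
    list_as_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    # The first byte that ASCII cannot represent.
    failing_byte = list_as_bytes[exc.start:exc.start + 1]

# Naming the real encoding, as in unicode(str(L), "utf-8"), succeeds.
decoded = list_as_bytes.decode("utf-8")
```

The failing byte is the first byte of the UTF-8 encoding of the copyright sign, at position 1, which matches the error message you quoted.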
If you try your example on python 3 then you don't need the
__unicode__ method at all (all strings are unicode) and, I predict, you
won't have the problem. (I haven't got a python 3 in front of me at the
moment to test.)
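For anyone who wants to try it, here is a minimal sketch of how the class might look on python 3. It assumes __str__ simply returns the text directly, with no __unicode__ and no encode step; the variable names are only for illustration.

```python
# Python 3 sketch: str is already unicode text, so no encoding is needed.
class A:
    def __str__(self):
        return "\N{COPYRIGHT SIGN}au"

    # The list's string form uses repr() of each element, so share the method.
    __repr__ = __str__


a = A()
text_single = str(a)     # text for one object
text_list = str([a])     # text for a list containing it
```

Since no byte string is ever produced, there is nothing to decode and so no place for the ASCII codec to fail.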
So I doubt you'll find the momentum to fix this since unicode and str
integration was the main focus of python 3, but you could report a
bug. If you attach a patch to fix it - so much the better!
Here is my demonstration of the problem with python 2.5.2
>>> class A(object):
...     def __unicode__(self):
...         return u"\N{COPYRIGHT SIGN}au"
...     def __repr__(self):
...         return unicode(self).encode("utf-8")
...     __str__ = __repr__
...
>>> a = A()
>>> str(a)
'\xc2\xa9au'
>>> repr(a)
'\xc2\xa9au'
>>> unicode(a)
u'\xa9au'
>>> L=[a]
>>> str(L)
'[\xc2\xa9au]'
>>> repr(L)
'[\xc2\xa9au]'
>>> unicode(L)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
>>> unicode('[\xc2\xa9au]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
>>> L.__unicode__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute '__unicode__'
>>> unicode(str(L),"utf-8")
u'[\xa9au]'
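If you would rather not decode the list's byte string afterwards, one possible workaround is to convert each element yourself and join the pieces. This is only a sketch: unicode_list and text_type are names made up here, and the class is adjusted so that the same file runs on both python 2 and python 3.

```python
try:
    text_type = unicode   # Python 2: the unicode type
except NameError:
    text_type = str       # Python 3: str is already unicode text


class A(object):
    def __unicode__(self):   # used by unicode() on Python 2
        return u"\N{COPYRIGHT SIGN}au"

    def __str__(self):       # bytes on Python 2, text on Python 3
        text = self.__unicode__()
        return text if text_type is str else text.encode("utf-8")

    __repr__ = __str__


def unicode_list(items):
    """Hypothetical helper: build the text form element by element."""
    return u"[%s]" % u", ".join(text_type(item) for item in items)


result = unicode_list([A()])
```

This never goes through the list's own str(), so the default ASCII decode is not involved. Note that the output is not a true repr of the list, just a readable text form.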
--
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick