unicode bit me
Scott David Daniels
Scott.Daniels at Acm.Org
Sat May 9 13:39:50 EDT 2009
rurpy at yahoo.com wrote:
> On May 9, 10:08 am, Steven D'Aprano <st... at REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Sat, 09 May 2009 08:37:59 -0700, anuraguni... at yahoo.com wrote:
>>> Sorry for being unclear again; hmm, I am becoming an expert at it.
>>> I pasted that code as a continuation of my old code from the start, i.e.
>>> class A(object):
>>>     def __unicode__(self):
>>>         return u"©au"
>>>     def __repr__(self):
>>>         return unicode(self).encode("utf-8")
>>>     __str__ = __repr__
>>> doesn't work means throws unicode error my question
>> What unicode error?
>>
>> Stop asking us to GUESS what the error is, and please copy and paste the
>> ENTIRE TRACEBACK that you get. When you ask for free help, make it easy
>> for the people trying to help you. If you expect them to copy and paste
>> your code and run it just to answer the smallest questions, most of them
>> won't bother.
> It took me less than 45 seconds to open a terminal window, start
> Python, and paste the OP's code to get:
> >>> class A(object):
> ...     def __unicode__(self):
> ...         return u"©au"
> ...     def __repr__(self):
> ...         return unicode(self).encode("utf-8")
> ...     __str__ = __repr__
> ...
> >>> print unicode(a)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'a' is not defined
> >>> a=A()
> >>> print unicode(a)
> ©au
> >>> print unicode([a])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
>
> Which is the same error he had already posted!
It is _not_ clear that that is what was going on.
Your 45 seconds could have been his 45 seconds.
He was describing results rather than showing them.
From your demo, I get to:
    unicode(u'\N{COPYRIGHT SIGN}au'.encode('utf-8'))
raises an exception (which it should), while
    unicode(u'\N{COPYRIGHT SIGN}au'.encode('utf-8'), 'utf-8')
does _not_ raise an exception (as it should not).
Note that his __repr__ produces a byte string containing non-ASCII
bytes. So the str or repr of a list containing such objects will also
be non-ASCII. To convert a non-ASCII byte string to unicode, you must
specify a character encoding.
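
To make that concrete, here is a minimal sketch of the two conversions.
It spells out the implicit step: with Python 2's default settings,
unicode(s) on a byte string behaves like s.decode('ascii'). The explicit
.decode() calls and the variable names are my own restatement, chosen so
the snippet also runs on Python 3; they are not from the original post.

```python
# Sketch only: explicit .decode() stands in for unicode(s[, encoding]),
# so this runs on Python 2 and on Python 3.3+.
text = u'\N{COPYRIGHT SIGN}au'
raw = text.encode('utf-8')      # byte string; begins with 0xc2 0xa9

# With no encoding named, Python 2 falls back to ASCII, which fails
# on the first byte above 0x7f.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the encoding the bytes were written in round-trips cleanly.
round_trip = raw.decode('utf-8')
```
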
The object a (created with A()) can be converted directly to
unicode (via its __unicode__ method). No problem.
The object a may also have its repr taken, which yields a (non-unicode)
byte string that is not ASCII. But you cannot take unicode(repr(a))
without naming an encoding, because repr(a) contains a character > '\x7f'.
What he was trying to do was masking the issue. Imagine:
    class B(object):
        def __unicode__(self):
            return u'one'
        def __repr__(self):
            return 'two'
        def __str__(self):
            return 'three'
    b = B()
    print b, unicode(b), [b]
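
Under Python 2 that last line should print "three one [two]": print
uses __str__, unicode() uses __unicode__, and a list formats its
elements with __repr__. The container case can be sketched on its own
as follows; B is trimmed here to the two methods that matter, and the
variable names are illustrative only.

```python
# Sketch: the container case in isolation (runs on Python 2 and 3).
class B(object):
    def __repr__(self):
        return 'two'

    def __str__(self):
        return 'three'


b = B()
direct = str(b)        # the object itself is formatted with __str__
in_list = str([b])     # a list formats its elements with __repr__
in_repr = repr([b])    # same result: '[two]'
```

That would explain why unicode([a]) fails while unicode(a) works: the
list never consults __unicode__; it builds a byte string from __repr__,
and that byte string then has to be decoded as ASCII.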
By the way, pasting code with non-ASCII characters does not mean
your recipient will get the characters you pasted.
--Scott David Daniels
Scott.Daniels at Acm.Org