unicode bit me

Sat May 9 13:39:50 EDT 2009

rurpy at yahoo.com wrote:
> On May 9, 10:08 am, Steven D'Aprano <st... at REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Sat, 09 May 2009 08:37:59 -0700, anuraguni... at yahoo.com wrote:
>>> Sorry being unclear again, hmm I am becoming an expert in it.
>>> I pasted that code as continuation of my old code at start i.e
>>>  class A(object):
>>>      def __unicode__(self):
>>>          return u"©au"
>>>      def __repr__(self):
>>>          return unicode(self).encode("utf-8")
>>>      __str__ = __repr__
>>> doesn't work means throws unicode error my question
>> What unicode error?
>>
>> Stop asking us to GUESS what the error is, and please copy and paste the
>> ENTIRE TRACEBACK that you get. When you ask for free help, make it easy
>> for the people trying to help you. If you expect them to copy and paste
>> your code and run it just to answer the smallest questions, most of them
>> won't bother.

> It took me less than 45 seconds to open a terminal window, start
> Python, and paste the OP's code to get:
>>>> class A(object):
> ...      def __unicode__(self):
> ...          return u"©au"
> ...      def __repr__(self):
> ...          return unicode(self).encode("utf-8")
> ...      __str__ = __repr__
> ...
>>>> print unicode(a)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'a' is not defined
>>>> a=A()
>>>> print unicode(a)
> ©au
>>>> print unicode([a])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 1: ordinal not in range(128)
> 
> Which is the same error he had already posted!

It was _not_ clear that this is what was going on.
Your 45 seconds could have been his 45 seconds.
He was describing results rather than showing them.

From your demo, I get to:
    unicode(u'\N{COPYRIGHT SIGN}au'.encode('utf-8'))
raises an exception (which it should), while
    unicode(u'\N{COPYRIGHT SIGN}au'.encode('utf-8'), 'utf-8')
does _not_ raise an exception (as it should not).
Note that his __repr__ returns a byte string containing characters
which are not ASCII.  So the str or repr of a list containing such
objects will also be non-ASCII.  To convert a non-ASCII byte string
to unicode, you must specify a character encoding.
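
A rough sketch of the same point, written so it runs on Python 2.6+
and on Python 3. It does not call unicode() at all: data.decode('ascii')
stands in for unicode(data) with no encoding argument, assuming the
default encoding has been left at 'ascii'. The helper try_decode and the
name listed are mine, purely for illustration, and the list is simulated
with brackets around the bytes rather than by building a real list.

```python
# -*- coding: utf-8 -*-
# Sketch only: raw.decode('ascii') stands in for Python 2's unicode(raw)
# under the default 'ascii' encoding.

data = u'\N{COPYRIGHT SIGN}au'.encode('utf-8')  # the bytes '\xc2\xa9au'


def try_decode(raw, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# ASCII cannot represent byte 0xc2, so the decode stops at position 0.
text, error = try_decode(data, 'ascii')
assert text is None and error.start == 0

# Naming the encoding that produced the bytes works.
text, error = try_decode(data, 'utf-8')
assert error is None and text == u'\N{COPYRIGHT SIGN}au'

# The str/repr of a one-element list puts '[' in front of the element's
# repr, so the offending byte moves to position 1, as in the traceback.
listed = b'[' + data + b']'
text, error = try_decode(listed, 'ascii')
assert text is None and error.start == 1
```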

The object a (created with A()) can be converted directly to
unicode (via its __unicode__ method).  No problem.
The same object may have its repr taken, which gives a (non-unicode)
string that is not ASCII.  But you cannot take unicode(repr(a)),
because repr(a) contains a character > '\x7f'.
What he was trying to do was masking the issue.  Imagine:

     class B(object):
         def __unicode__(self):
             return u'one'
         def __repr__(self):
             return 'two'
         def __str__(self):
             return 'three'

     b = B()
     print b, unicode(b), [b]
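
Under Python 2 that should print "three one [two]": print uses
__str__ for b itself, unicode(b) uses __unicode__, and the list is
formatted with the repr of each element, whatever __str__ says. The
list half of that can be checked on either Python 2 or Python 3 with
the cut-down sketch below (no __unicode__ here, since Python 3 ignores
it; the names shown_directly and shown_in_list are only illustrative):

```python
class B(object):
    def __repr__(self):
        return 'two'

    def __str__(self):
        return 'three'


b = B()
shown_directly = str(b)    # what print uses for the object itself
shown_in_list = str([b])   # list formatting calls repr() on each element

assert shown_directly == 'three'
assert shown_in_list == '[two]'
```

So with __str__ = __repr__ returning UTF-8 bytes, the list case ends up
handing non-ASCII bytes to unicode() without an encoding.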

By the way, pasting code with non-ASCII characters does not mean
your recipient will get the characters you pasted.
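
One way around that, as a small sketch: spell the character with an
escape so the source stays plain ASCII. The three forms below are
equivalent (the variable names are only for illustration):

```python
# Three ASCII-only spellings of the same one-character unicode string.
by_name = u'\N{COPYRIGHT SIGN}'
by_hex = u'\xa9'
by_code_point = u'\u00a9'

assert by_name == by_hex == by_code_point
assert len(by_name) == 1 and ord(by_name) == 0xA9
```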

--Scott David Daniels
Scott.Daniels at Acm.Org