unicode question

Mon Nov 22 18:21:08 EST 2004

wolfgang haefelinger wrote:
> Nevertheless, I regard
> 
>  print y.__str__()   ## works
>  print y             ## fails??
> 
> as a very inconsistent behaviour.

Notice that this also fails:

x = str(y)

So it is really the string conversion that fails. Roughly the same
happens with:

class X:
    def __str__(self):
        return -1

Here, instances of X cannot be printed either: str() is really supposed
to return a byte string object - not a number, not a Unicode object.
As a special exception, __str__ can return a Unicode object, as long
as that result can be converted with the system default encoding into
a byte string object. So what we really have is roughly:

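import types

def str(o):   # rough model of the built-in str()
    if isinstance(o, types.StringType): return o    # byte string: unchanged
    if isinstance(o, types.UnicodeType): return o.encode()  # default encoding
    return str(o.__str__())    # otherwise convert what __str__ returned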

This is why the first print succeeds (it calls __str__ directly and
then prints the resulting Unicode object), and the second print fails
(it tries to str()-convert its argument, and that conversion already
fails - it never gets as far as actually trying to print something).
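
To see the two code paths side by side, here is a rough sketch. It is
written for Python 3, so it only models the conversion described above
rather than reproducing it: bytes stands in for the byte string type,
str for the Unicode type, and an explicit "ascii" codec for the system
default encoding. The names model_str and Y are made up for this
example.

```python
# Rough model of the str() conversion discussed above, written for Python 3.
# Assumptions: bytes plays the role of the byte string type, str the role
# of the Unicode type, and "ascii" the role of the system default encoding.

def model_str(o):
    if isinstance(o, bytes):
        return o                      # already a byte string
    if isinstance(o, str):
        return o.encode("ascii")      # may raise UnicodeEncodeError
    return model_str(o.__str__())     # convert whatever __str__ returned


class Y:
    # Hypothetical class whose __str__ returns a non-ASCII character.
    def __str__(self):
        return "\u00e4"


y = Y()
direct = y.__str__()                  # fine: no conversion takes place

try:
    model_str(y)
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True          # fails before anything is printed
```

Calling __str__ directly just hands back the character string, while
the modelled conversion fails in the encoding step, before any output
would happen.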

> Somehow I have the feeling that Python should give up the distinction
> between unicode and str and just have a str type which is internally
> unicode.

Yes, that should happen in P3k (Python 3000). But even then, there
will be a distinction between byte (plain) strings and character
(Unicode) strings.
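
As a small illustration of that remaining distinction, this is how it
looks in Python 3 as eventually released, where str holds characters
and bytes holds bytes. The sample text and the choice of UTF-8 are
assumptions made just for this example.

```python
text = "gr\u00fc\u00df"               # character string, 4 characters
data = text.encode("utf-8")           # byte string, 6 bytes in UTF-8

round_trip = data.decode("utf-8")     # explicit decoding gets the text back

try:
    text + data                       # mixing the two types is an error
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Going from one type to the other always takes an explicit encode() or
decode() call; the two are never mixed silently.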

Regards,
Martin