[Python-Dev] unicode and __str__

"Martin v. Löwis" martin at v.loewis.de
Tue Aug 31 07:09:40 CEST 2004


Neil Schemenauer wrote:
> Forgive me if I'm being obtuse, but I'm trying to understand the
> overall Python unicode design.  This works:
> 
>     >>> sys.getdefaultencoding()
>     'utf-8'
>     >>> str(A())
>     '\xe1\x88\xb4'

Ah, ok, so you have changed the default encoding on your
installation. Doing so means that some programs will run only
on your installation, and not on others (e.g. mine).
One shouldn't change the default encoding away from ASCII
except to work around buggy applications which would fail
because of their unicode-unawareness.

> Can you be more specific about what is incorrect with the above
> class?

In the default installation, it gives a UnicodeEncodeError.
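
A minimal sketch of the difference, assuming the object's text is
the single character U+1234 (the class A itself is not shown in this
thread; the character is inferred from the UTF-8 bytes in the quoted
session). Encoding explicitly does not depend on the default
encoding, whereas the implicit ASCII conversion fails:

```python
# Assumption: the text is U+1234, inferred from the bytes
# '\xe1\x88\xb4' in the quoted session; A itself is not shown.
text = u"\u1234"

# An explicit encoding behaves the same on every installation.
utf8_bytes = text.encode("utf-8")

# With the default encoding left at ASCII, the conversion cannot succeed.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```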

>>No. In some cases, str() needs to compromise, where unicode()
>>doesn't.
> 
> 
> Sorry, I don't understand that statement.  Are you saying that we
> will eventually get rid of __str__ and only have __unicode__?

No. Eventually, when strings are Unicode objects, the string
conversion function will return such a thing. Whether this will
be called __str__, __unicode__, or __string__, I don't know.
However, this won't happen until Python 3, and it is not clear
to me how it will look. We may also need a conversion routine
into byte strings.

> If only we could. :-)  Seriously though, I'm trying to understand
> the point of __unicode__.  To me it seems to make the transition to
> unicode string needlessly more complicated.

Why do you say that? You don't *have* to implement __unicode__
if you don't need it, just as you don't have to implement
__len__ or __nonzero__: If your class is fine with the standard
"non-None is true", implement neither. If you conceptually
have a sequence type, implement __len__ for "empty is false".
If you have a more unusual class, implement __nonzero__ for
"I decide what false is".
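
The three cases above can be sketched roughly as follows. The class
names Plain, Bag and Switch are hypothetical, chosen only for
illustration. The __bool__ alias is an addition so that the sketch
also runs on interpreters that use that name for the same hook:

```python
class Plain(object):
    """Implements neither hook: every instance is true."""


class Bag(object):
    """Sequence-like: an empty Bag is false, via __len__."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class Switch(object):
    """Decides its own truth value, via __nonzero__."""

    def __init__(self, on):
        self.on = bool(on)

    def __nonzero__(self):
        return self.on

    # Same hook under its later name; not part of the original discussion.
    __bool__ = __nonzero__
```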

Likewise, if you are happy with the standard '<Foo instance>',
implement neither __str__ nor __unicode__. If your class has
a canonical byte string representation, implement __str__. If
this byte string representation is not meaningful ASCII, and
if a more meaningful string representation using other Unicode
characters would be possible, also implement __unicode__. Never
rely on the default encoding being something other than ASCII,
though. Eventually, when strings are Unicode objects, you
won't be able to change it.
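
As a rough illustration of this advice, here is a hypothetical
Temperature class (not from this thread) whose __str__ compromises on
an ASCII-only form, while __unicode__ uses a non-ASCII character.
Neither method relies on the default encoding:

```python
class Temperature(object):
    """Hypothetical example class; name and format are assumptions."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        # ASCII only, so the result is the same on every installation.
        return "%d deg C" % self.degrees

    def __unicode__(self):
        # Richer form using U+00B0 DEGREE SIGN.
        return u"%d\u00b0C" % self.degrees


t = Temperature(21)
plain = str(t)           # uses __str__
rich = t.__unicode__()   # what unicode(t) would call where that built-in exists
```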

Regards,
Martin

