[Python-Dev] unicode and __str__
Tim Peters
tim.peters at gmail.com
Mon Aug 30 22:41:10 CEST 2004
[Neil Schemenauer]
> ...
> The only thing I found in the NEWS file that seemed relevant is
> this note:
>
> u'%s' % obj will now try obj.__unicode__() first and fallback to
> obj.__str__() if no __unicode__ method can be found.
>
> I don't think that describes the behavior difference. Allowing
> __str__ to return unicode strings seems like a pretty noteworthy
> change (assuming that's what actually happened).
It's confusing. A __str__ method or tp_str type slot can return
unicode, but what happens after that depends on the caller.
PyObject_Str() and PyObject_Repr() then try to encode it as an 8-bit
string using the default encoding. But unicode.__mod__ says "oh, cool -- I'm done".
> Also, I'm a little unclear on the purpose of the __unicode__ method.
> If you can return unicode from __str__ then why would I want to
> provide a __unicode__ method?
Is the purpose clearer if you purge your mind of the belief that str()
(as opposed to __str__!) can return unicode? Here w/ current CVS:
>>> class A:
...     def __str__(self): return u'a'
>>> print A()
a
>>> type(str(A()))
<type 'str'>
>>>
>>> class A:
...     def __str__(self): return u'\u1234'
>>> print A()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in position 0: ordinal not in range(128)
>>>
>>> '%s' % A()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in position 0: ordinal not in range(128)
>>> u'%s' % A()
u'\u1234'
>>>
So unicode.__mod__ is what's special here. But I'm not sure that helps <wink>.
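To make the caller-side distinction concrete, here is a toy model written in Python 3 so it runs on a current interpreter. It assumes Python 3 bytes stands in for the 8-bit str of the 2.4-era CVS build and Python 3 str stands in for unicode. The names model_str and model_unicode_mod are hypothetical; they only sketch the rules described in this thread and are not CPython's actual implementation. Under that model, one reading of the __unicode__ question is: str() can only hand back 8-bit data, so an object that wants to produce non-ASCII text for u'%s' % obj while keeping a safe str() result would define both methods.

```python
# Toy model (Python 3) of the 2.4-era behaviour discussed above.
# Assumption: Python 3 ``bytes`` plays the role of the old 8-bit ``str``,
# and Python 3 ``str`` plays the role of ``unicode``.
# model_str and model_unicode_mod are hypothetical names, not CPython APIs.


def model_str(obj):
    """Rough analogue of str(obj) / PyObject_Str(): always yields 8-bit data."""
    result = type(obj).__str__(obj)  # call __str__ directly, no type check
    if isinstance(result, str):
        # __str__ returned "unicode": encode it with the ASCII default codec,
        # which raises UnicodeEncodeError for characters such as '\u1234'.
        result = result.encode("ascii")
    return result


def model_unicode_mod(obj):
    """Rough analogue of u'%s' % obj: prefer __unicode__, fall back to __str__."""
    method = getattr(type(obj), "__unicode__", None)
    result = method(obj) if method is not None else type(obj).__str__(obj)
    if isinstance(result, bytes):
        # An 8-bit result is decoded with the ASCII default codec.
        result = result.decode("ascii")
    # A "unicode" result is accepted as-is: "oh, cool -- I'm done".
    return result


class OnlyStr:
    def __str__(self):
        return "\u1234"  # __str__ returning "unicode"


class Both:
    def __str__(self):
        return b"a"  # 8-bit text for str()

    def __unicode__(self):
        return "\u1234"  # full text for u'%s' % obj


if __name__ == "__main__":
    print(ascii(model_unicode_mod(OnlyStr())))  # accepted as-is
    try:
        model_str(OnlyStr())
    except UnicodeEncodeError as exc:
        print("model_str failed:", exc.reason)
    print(model_str(Both()), ascii(model_unicode_mod(Both())))
```

In this sketch, OnlyStr mirrors the session above: the "unicode" result survives only the u'%s' path, while Both keeps str() working and still delivers the non-ASCII text through __unicode__.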