[Python-Dev] unicode and __str__
M.-A. Lemburg
mal at egenix.com
Tue Aug 31 10:23:33 CEST 2004
Neil Schemenauer wrote:
> With Python 2.4:
>
> >>> u = u'\N{WHITE SMILING FACE}'
> >>> class A:
> ...     def __str__(self):
> ...         return u
> ...
> >>> class B:
> ...     def __unicode__(self):
> ...         return u
> ...
> >>> u'%s' % A()
> u'\u263a'
> >>> u'%s' % B()
> u'\u263a'
>
> With Python 2.3:
>
> >>> u'%s' % A()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in
> position 0: ordinal not in range(128)
> >>> u'%s' % B()
> u'<__main__.B instance at 0x401f910c>'
>
> The only thing I found in the NEWS file that seemed relevant is
> this note:
>
> u'%s' % obj will now try obj.__unicode__() first and fallback to
> obj.__str__() if no __unicode__ method can be found.
>
> I don't think that describes the behavior difference. Allowing
> __str__ return unicode strings seems like a pretty noteworthy
> change (assuming that's what actually happened).
__str__ is indeed allowed to return Unicode objects
(and has been for quite a while).
The reason we added __unicode__ was to provide a hook for
PyObject_Unicode() to try before falling back to __str__. It is
needed because, even though returning Unicode objects from
__str__ is allowed, in most cases it is PyObject_Str() that calls
it, and this API always converts a Unicode result to a string using
the default encoding, which can easily fail.
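To make the failure mode concrete, here is a minimal sketch of the implicit conversion step. It is not the interpreter's actual code: encode_with_default is a hypothetical helper, and it assumes the default encoding is ASCII, as the 2.3 traceback quoted above suggests.

```python
# Sketch only: mimics the implicit Unicode -> byte string conversion
# that happens when a Unicode result passes through the default
# encoding. The helper name is hypothetical, not a real Python API.
text = u'\N{WHITE SMILING FACE}'


def encode_with_default(value, default_encoding='ascii'):
    """Encode a Unicode value the way an implicit conversion would."""
    return value.encode(default_encoding)


# Pure ASCII content survives the conversion.
plain = encode_with_default(u'plain')

# Non-ASCII content cannot be represented and raises an error.
try:
    encode_with_default(text)
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)
```

The error raised here is the same kind of UnicodeEncodeError that the 2.3 session shows for u'%s' % A().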
> Also, I'm a little unclear on the purpose of the __unicode__ method.
> If you can return unicode from __str__ then why would I want to
> provide a __unicode__ method? Perhaps it is meant for objects that
> can either return a unicode or a string representation depending on
> what the caller prefers. I have a hard time imagining a use for
> that.
That's indeed the use case. An object might want to return
an approximate string representation in some form if asked for
a string, but a true content representation when asked for
Unicode. Because of the default encoding problems you might
run into with __str__, we need two slots to provide this kind of
functionality.
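As a rough illustration of the two-slot idea, the sketch below emulates the lookup order described above: try __unicode__ first, then fall back to __str__. The function to_unicode and the classes Smiley and OnlyStr are hypothetical names used only for this example; the real PyObject_Unicode() does more work than this.

```python
def to_unicode(obj):
    """Hypothetical stand-in for the lookup order: prefer __unicode__,
    fall back to __str__ when no __unicode__ method exists."""
    method = getattr(obj, '__unicode__', None)
    if method is not None:
        return method()
    return obj.__str__()


class Smiley(object):
    """Offers an approximate string form and a true Unicode form."""

    def __str__(self):
        # Approximation that is safe under an ASCII default encoding.
        return ':-)'

    def __unicode__(self):
        # True content representation.
        return u'\N{WHITE SMILING FACE}'


class OnlyStr(object):
    """Has no __unicode__, so the fallback path is taken."""

    def __str__(self):
        return 'only str'


approx = str(Smiley())          # string request: approximate form
exact = to_unicode(Smiley())    # Unicode request: true content
fallback = to_unicode(OnlyStr())
```

With both slots defined, a caller asking for a string gets the ASCII-safe approximation, while a caller asking for Unicode gets the real character.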
In Py3k we will probably see __str__ and __unicode__ reunite.
Now back to your original question: the change you see
in %-formatting was actually a bug fix. Python 2.3 should
have exhibited the same behavior as 2.4 does now.
--
Marc-Andre Lemburg
eGenix.com