
over at my work copy of the python language reference, Adrian Holovaty asked about the exact semantics of the __str__ hook:
http://effbot.org/pyref/__str__
"The return value must be a string object." Does this mean it can be a *Unicode* string object? This distinction is ambiguous to me because unicode objects and string objects are both subclasses of basestring. May a __str__() return a Unicode object?
I seem to remember earlier discussions on this topic, but don't recall when and what. From what I can tell, __str__ may return a Unicode object, but only if it can be converted to an 8-bit string using the default encoding. Is this on purpose or by accident? Do we have a plan for improving the situation in future 2.X releases?
</F>

On 2006-12-06 10:26, Fredrik Lundh wrote:
over at my work copy of the python language reference, Adrian Holovaty asked about the exact semantics of the __str__ hook:
http://effbot.org/pyref/__str__
"The return value must be a string object." Does this mean it can be a *Unicode* string object? This distinction is ambiguous to me because unicode objects and string objects are both subclasses of basestring. May a __str__() return a Unicode object?
I seem to remember earlier discussions on this topic, but don't recall when and what. From what I can tell, __str__ may return a Unicode object, but only if it can be converted to an 8-bit string using the default encoding. Is this on purpose or by accident? Do we have a plan for improving the situation in future 2.X releases?
This was added to make the transition to all Unicode in 3k easier:
.__str__() may return a string or Unicode object.
.__unicode__() must return a Unicode object.
There is no restriction on the content of the Unicode string for .__str__().
-- Marc-Andre Lemburg, eGenix.com

On 2006-12-06 10:46, M.-A. Lemburg wrote:
This was added to make the transition to all Unicode in 3k easier:
.__str__() may return a string or Unicode object.
.__unicode__() must return a Unicode object.
There is no restriction on the content of the Unicode string for .__str__().
One more thing, since these two hooks are commonly used with str() and unicode():
* unicode(obj) will first try .__unicode__() and then fall back to .__str__() (possibly converting the string return value to Unicode)
* str(obj) will try .__str__() only (possibly converting the Unicode return value to a string using the default encoding)
-- Marc-Andre Lemburg, eGenix.com
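The lookup order described in this message can be modelled in a few lines. This is only a sketch of the description, not CPython's implementation. It runs on Python 3, so a Python 2 8-bit string is represented as `bytes` and a Unicode object as `str`. The hook names `py2_str_hook` and `py2_unicode_hook` stand in for `__str__` and `__unicode__`, and the helpers `py2_str` and `py2_unicode` are hypothetical names introduced for illustration. The default encoding is assumed to be ASCII, as it usually is on Python 2.

```python
DEFAULT_ENCODING = "ascii"  # assumption: the usual Python 2 default encoding


def py2_str(obj):
    """Model of str(obj): only the __str__-style hook is consulted."""
    result = obj.py2_str_hook()
    if isinstance(result, str):
        # The hook returned "Unicode": convert it with the default encoding.
        # This raises UnicodeEncodeError if the text is not encodable.
        return result.encode(DEFAULT_ENCODING)
    return result


def py2_unicode(obj):
    """Model of unicode(obj): try the __unicode__-style hook, then fall back."""
    hook = getattr(obj, "py2_unicode_hook", None)
    if hook is not None:
        return hook()
    result = obj.py2_str_hook()
    if isinstance(result, bytes):
        # The hook returned an 8-bit string: convert it to "Unicode".
        return result.decode(DEFAULT_ENCODING)
    return result


class OnlyStr:
    """Defines only the __str__-style hook and returns non-ASCII Unicode."""

    def py2_str_hook(self):
        return "caf\u00e9"


class Both:
    """Defines both hooks, so the unicode model never reaches the str hook."""

    def py2_str_hook(self):
        return b"hello"

    def py2_unicode_hook(self):
        return "hi"
```

In this model `py2_unicode(OnlyStr())` returns the text unchanged, while `py2_str(OnlyStr())` fails during the encoding step. That matches the observation at the start of the thread that a Unicode return value only survives str() if it can be converted with the default encoding.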

On 2006-12-06 10:56, Fredrik Lundh wrote:
M.-A. Lemburg wrote:
This was added to make the transition to all Unicode in 3k easier:
thanks for the clarification.
do you recall when this was added? 2.5?
Not really, only that it was definitely before 2.5.
-- Marc-Andre Lemburg, eGenix.com

M.-A. Lemburg wrote:
On 2006-12-06 10:26, Fredrik Lundh wrote:
From what I can tell, __str__ may return a Unicode object, but only if it can be converted to an 8-bit string using the default encoding. Is this on purpose or by accident? Do we have a plan for improving the situation in future 2.X releases?
It has worked that way since at least Python 2.4 (I just tried returning unicode from __str__ in 2.4.1 and it worked fine). That's the oldest version I have handy, so I don't know if it was possible in earlier versions.
This was added to make the transition to all Unicode in 3k easier:
.__str__() may return a string or Unicode object.
.__unicode__() must return a Unicode object.
There is no restriction on the content of the Unicode string for .__str__().
It's also the basis for a tweak that was made in 2.5 to permit conversion to a builtin string in a way that is idempotent for both str and unicode instances via:
    as_builtin_string = '%s' % original
To use the terms from the deferred PEP 349, that conversion mechanism is both Unicode-safe (unicode doesn't get coerced to str) and str-stable (str doesn't get coerced to unicode).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
http://www.boredomandlaziness.org

On Dec 6, 2006, at 7:07 AM, Nick Coghlan wrote:
M.-A. Lemburg wrote:
On 2006-12-06 10:26, Fredrik Lundh wrote:
From what I can tell, __str__ may return a Unicode object, but only if it can be converted to an 8-bit string using the default encoding. Is this on purpose or by accident? Do we have a plan for improving the situation in future 2.X releases?
It has worked that way since at least Python 2.4 (I just tried returning unicode from __str__ in 2.4.1 and it worked fine). That's the oldest version I have handy, so I don't know if it was possible in earlier versions.
I don't have anything older than 2.4 laying around either, but IIRC in 2.3 unicode() did not call __unicode__().
-Barry

I don't have anything older than 2.4 laying around either, but IIRC in 2.3 unicode() did not call __unicode__().
It turns out __unicode__() is called on Python 2.3.5.
% python2.3
Python 2.3.5 (#2, Oct 18 2006, 23:04:45)
[GCC 4.1.2 20061015 (prerelease) (Debian 4.1.1-16.1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo(object):
...     def __unicode__(self):
...         print "unicode"
...         return u"hi"
...     def __str__(self):
...         print "str"
...         return "hello"
...
>>> unicode(Foo())
unicode
u'hi'
-- Michael Urman http://www.tortall.net/mu/blog

On Dec 6, 2006, at 9:15 AM, Michael Urman wrote:
I don't have anything older than 2.4 laying around either, but IIRC in 2.3 unicode() did not call __unicode__().
It turns out __unicode__() is called on Python 2.3.5.
Ah cool, thanks. I must be misremembering that about some earlier Python.
-Barry
participants (5)
- Barry Warsaw
- Fredrik Lundh
- M.-A. Lemburg
- Michael Urman
- Nick Coghlan