Neil Schemenauer wrote:
On Wed, Mar 09, 2005 at 11:10:59AM +0100, M.-A. Lemburg wrote:
The patch implements the PyObject_Text() idea (an API that returns a basestring instance, i.e. string or unicode) and then uses this in '%s' (the string version) to properly propagate to u'%s' (the unicode version).
Maybe we should also expose the C API as suggested in the patch, e.g. as text(obj).
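A rough sketch of what a text(obj) builtin could look like, written in Python 3 syntax so that it runs on a current interpreter. Here bytes stands in for the 8-bit string of this thread and str stands in for unicode; the function body and the Label class are assumptions for illustration, not the code from the patch.

```python
def text(obj):
    """Return a string-like value for obj without forcing one string type.

    Hypothetical emulation: bytes plays the role of the 8-bit string and
    str plays the role of unicode (together, the old basestring).
    """
    # Already string-like: hand it back untouched, whichever kind it is.
    if isinstance(obj, (bytes, str)):
        return obj
    # Call __str__ directly so that either string kind may come back;
    # the str() builtin would reject a bytes result.
    result = type(obj).__str__(obj)
    if not isinstance(result, (bytes, str)):
        raise TypeError("__str__ returned non-string (type %s)"
                        % type(result).__name__)
    return result


class Label:
    """Toy class whose __str__ returns the 8-bit stand-in."""

    def __str__(self):
        return b"label"


print(repr(text(42)))        # converted via __str__
print(repr(text("xyz")))     # returned unchanged
print(repr(text(Label())))   # the value from __str__, not re-coerced
```

The point of the design is that the caller gets back whichever string kind the object naturally produces, rather than having it squeezed into one type.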
Perhaps the right thing to do is introduce a new format code that means insert text(obj) instead of str(obj), e.g. %t. If we do that, though, then we should make "'%s' % u'xyz'" return a string instead of a unicode object. I suspect that would break a lot of code.
It would result in lots of UnicodeErrors due to failed conversions of the Unicode string to a string. Plus it would break with the general rule of always coercing to Unicode (see below) and lose us the ability to write polymorphic code.
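To illustrate the failure mode: forcing a Unicode result down to an 8-bit string implies an encode step, which fails for any character outside the target encoding. A small sketch in Python 3 syntax, where the explicit .encode("ascii") stands in for the implicit conversion under an ASCII default encoding; to_8bit is a hypothetical helper, not an existing API.

```python
def to_8bit(value):
    """Hypothetical helper: coerce a Unicode string to an 8-bit string."""
    return value.encode("ascii")


# Pure ASCII text converts cleanly.
print(repr(to_8bit("xyz")))

# Anything outside ASCII raises a UnicodeError subclass.
try:
    to_8bit("caf\u00e9")
except UnicodeEncodeError as exc:
    print("conversion failed:", exc)
```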
OTOH, having %s mean text(obj) instead of str(obj) may work just fine. People who want it to mean str() generally don't have any unicode strings floating around so text() has the same effect. People who are using unicode probably would find text() to be more useful behavior. I think that's why someone hacked PyString_Format to sometimes return unicode strings.
That wasn't a hack: it's part of the Unicode integration logic, which always coerces to Unicode if strings and Unicode meet. In the above case, a string format string meets a Unicode object as its argument, which results in a Unicode object being returned.
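The coercion rule can be sketched as follows. This is an emulation in Python 3 syntax, not the PyString_Format code: bytes stands in for the 8-bit string, str for Unicode, and format_s is a hypothetical helper that handles a single %s argument.

```python
def format_s(template, arg):
    """Emulate '%s' formatting with always-coerce-to-Unicode semantics."""
    if isinstance(template, str) or isinstance(arg, str):
        # Unicode is involved on either side: promote everything to Unicode.
        if isinstance(template, bytes):
            template = template.decode("ascii")
        if isinstance(arg, bytes):
            arg = arg.decode("ascii")
        return template % (arg,)
    # Only 8-bit strings (or non-string objects) are involved:
    # the result stays 8-bit.
    if not isinstance(arg, bytes):
        arg = str(arg).encode("ascii")
    return template % (arg,)


# 8-bit template, 8-bit argument: result stays 8-bit.
print(repr(format_s(b"%s", b"xyz")))

# 8-bit template meets a Unicode argument: result is Unicode.
print(repr(format_s(b"%s", "xyz")))
```

Because the result type follows the argument, the same format string works for callers who pass either kind of string, which is the polymorphism mentioned above.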
Regarding the use of __str__ to return a unicode object: we could introduce a new slot (e.g. __text__) instead. However, I can't see any advantage to that. If someone really wants a str object then they call str() or PyObject_Str().