[Python-Dev] String formatting / unicode 2.5 bug?
John J Lee
jjl at pobox.com
Sun Aug 20 14:45:08 CEST 2006
On Sun, 20 Aug 2006, Nick Coghlan wrote:
> John J Lee wrote:
>> Is this a bug?
>
> I don't believe so - the string formatting documentation states that the
> result will be unicode if either the format string is unicode or any of the
> objects passed to a %s format code is unicode.
>
> That latter part has just been extended to include any object that returns
> Unicode from __str__, instead of being restricted to actual Unicode
> instances.
>
> Note that the following behaves the same way regardless of whether you use
> 2.4 or 2.5:
> "%s" % 'hi'
> "%s" % u'hi'
Given that, the following wording should be changed:
http://docs.python.org/lib/typesseq-strings.html
Conversion  Meaning                                           Notes
...
s           String (converts any python object using str()).  (4)
...
(4) If the object or format provided is a unicode string, the resulting
string will also be unicode.
The note (4) says that the result will be unicode, but it doesn't say how,
in this case, that comes about. This case is confusing because the docs
claim string formatting with %s "converts ... using str()", and yet
str(a()) returns a bytestring. Does it *really* use str, or just __str__?
Surely the latter? (given the observed behaviour, and not reading the C
source)
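To make the question concrete, here is a minimal sketch that puts the two routes side by side. The class name A is a hypothetical stand-in for the a class from earlier in the thread; all that matters is that its __str__ returns a unicode literal. Going by what has been reported in this thread, on 2.5 the first line printed names a bytestring type and the second a unicode type, which is the mismatch with the "converts ... using str()" wording. (On Python 3 the distinction no longer exists and both are plain str.)

```python
class A(object):
    """Hypothetical stand-in for the class `a` used earlier in the thread."""

    def __str__(self):
        # Deliberately return a unicode literal rather than a bytestring.
        return u"hi"


via_str = str(A())        # the route the docs describe: call str() on the object
via_format = "%s" % A()   # the route under discussion: %s formatting

# Compare the result types of the two routes on the interpreter at hand.
print(type(via_str).__name__)
print(type(via_format).__name__)
```

If the two type names differ, then %s is not simply calling str() on the object, whatever the docs say.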
FWIW, this change broke epydoc (fails with an AssertionError -- so perhaps
without the assert it would still "work", dunno).
> And once the result has been promoted to unicode, __unicode__ is used
> directly:
>
> >>> print repr("%s%s" % (a(), a()))
> __str__
> accessing <__main__.a object at 0x00AF66F0>.__unicode__
> __str__
> accessing <__main__.a object at 0x00AF6390>.__unicode__
> __str__
> u'hihi'
I don't understand this part. Why is __unicode__ called? Your example
doesn't appear to show this happening "once [i.e., because?] the result
has been promoted to unicode" -- if that were true, it would "stand to
reason" <wink> that the interpreter would then conclude it should call
__unicode__ for all remaining %s, and not bother with __str__. If OTOH
__unicode__ is called because __str__ returned a unicode object, it makes
(very slightly) more sense that it goes through the same
__str__-then-__unicode__ rigmarole for each object on the RHS of the %.
But none of that seems to make a huge amount of sense. I've now found the
September 2004 discussion of this, and I'm none the wiser.
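For anyone who wants to poke at the ordering themselves, here is a rough reconstruction of the kind of tracing class that would produce the output quoted above. It is an assumption on my part, not the original definition: the names A and trace are made up for illustration, and the calls are recorded in a list rather than printed directly so the same file runs unchanged on different interpreter versions.

```python
trace = []  # hypothetical log of calls, in the order they happen


class A(object):
    """Assumed reconstruction of the tracing class; not the original code."""

    def __getattribute__(self, name):
        # Record every attribute lookup made through the instance.
        trace.append("accessing " + name)
        return object.__getattribute__(self, name)

    def __str__(self):
        trace.append("__str__")
        return u"hi"


result = "%s%s" % (A(), A())

# Show the recorded order of calls, then the formatted result.
for entry in trace:
    print(entry)
print(repr(result))
```

Counting the "__str__" entries against the "accessing __unicode__" entries for each object is a quick way to see whether the interpreter really does go through __str__ again after it has already looked for __unicode__.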
John