[docs] [issue9196] Improve docs for string interpolation "%s" re Unicode strings

Ezio Melotti report at bugs.python.org
Fri Jan 21 04:47:32 CET 2011

Ezio Melotti <ezio.melotti at gmail.com> added the comment:

Python 3 checks the return types of __bytes__ and __str__, raising a TypeError if they don't return bytes and str respectively (here C is a class whose __str__ returns bytes and whose __bytes__ returns str):
>>> str(C())
TypeError: __str__ returned non-string (type bytes)
>>> bytes(C())
TypeError: __bytes__ returned non-bytes (type str)
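
For reference, a minimal Python 3 sketch that reproduces the two errors above. The definition of C and the helper error_message are assumptions made for illustration (the class is not shown in this message), and the exact wording of the error text may differ between Python versions:

```python
# Hypothetical class: both special methods deliberately return the wrong type.
class C:
    def __str__(self):
        return b"bytes from __str__"    # should be str

    def __bytes__(self):
        return "str from __bytes__"     # should be bytes


def error_message(func, obj):
    """Call func(obj) and return the TypeError text, or None if no error."""
    try:
        func(obj)
    except TypeError as exc:
        return str(exc)
    return None


str_error = error_message(str, C())
bytes_error = error_message(bytes, C())
print(str_error)
print(bytes_error)
```

Both calls fail with a TypeError instead of silently coercing the result, which is the strictness that Python 2 lacks.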

The Python 2 doc for unicode() says[0]:
For objects which provide a __unicode__() method, it will call this method without arguments to create a Unicode string. For all other objects, the 8-bit string version or representation is requested and then converted to a Unicode string using the codec for the default encoding in 'strict' mode.

The doc for .__unicode__() says[1]:
Called to implement unicode() built-in; should return a Unicode object. When this method is not defined, string conversion is attempted, and the result of string conversion is converted to Unicode using the system default encoding.
This is consistent with the unicode() doc (but it doesn't mention that 'strict' is used).  It also says that the method *should* return unicode, but it can also return a str, which then gets coerced by unicode().
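
To make the 'strict' part concrete, here is a rough sketch of what that conversion step amounts to. It is written so that it also runs on Python 3, with a bytes object standing in for a Python 2 8-bit string, and it assumes the default encoding is ASCII (the usual Python 2 default); the variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
# Stand-ins for Python 2 8-bit strings (assumption: default encoding is ASCII).
ascii_bytes = b"cafe"
non_ascii_bytes = u"caf\xe9".encode("utf-8")   # contains non-ASCII bytes

# Pure ASCII data converts cleanly in 'strict' mode.
converted = ascii_bytes.decode("ascii", "strict")

# Non-ASCII data makes 'strict' mode raise instead of replacing or dropping bytes.
try:
    non_ascii_bytes.decode("ascii", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(converted)
print(strict_failed)
```

So a __str__ (or a __unicode__ returning str) that produces non-ASCII bytes would make unicode() fail at this step rather than produce a garbled result.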

The doc for .__str__() says[2]:
Called by the str() built-in function and by the print statement to compute the “informal” string representation of an object. [...] The return value must be a string object.
This is wrong because the return value can be unicode too (this changed at some point; it used to be true in older versions).

That said, some of the behaviors described by Craig (e.g. a __str__ that returns unicode) are not documented, and documenting them might save some confusion. However, these "weird" behaviors are most likely errors, and the fact that no exception is raised is just because Python 2 is not strict about str/unicode.

I think a better way to solve the problem is to document clearly how these methods should be used (i.e. whether __unicode__ should be preferred over __str__, whether it's necessary to implement both, what they should return, etc.).

[0]: http://docs.python.org/library/functions.html#unicode
[1]: http://docs.python.org/reference/datamodel.html#object.__unicode__
[2]: http://docs.python.org/reference/datamodel.html#object.__str__


Python tracker <report at bugs.python.org>
