[Python-Dev] __str__ and unicode

Nick Coghlan ncoghlan at gmail.com
Wed Dec 6 13:07:43 CET 2006


M.-A. Lemburg wrote:
> On 2006-12-06 10:26, Fredrik Lundh wrote:
>> From what I can tell, __str__ may return a Unicode object, but
>> only if it can be converted to an 8-bit string using the default
>> encoding.  Is this on purpose or by accident?  Do we have a plan for
>> improving the situation in future 2.X releases?

It has worked that way since at least Python 2.4 (I just tried returning 
unicode from __str__ in 2.4.1 and it worked fine). That's the oldest version I 
have handy, so I don't know if it was possible in earlier versions.

> This was added to make the transition to all Unicode in 3k easier:
> 
> .__str__() may return a string or Unicode object.
> 
> .__unicode__() must return a Unicode object.
> 
> There is no restriction on the content of the Unicode string
> for .__str__().

It's also the basis for a tweak that was made in 2.5 to permit conversion to a 
builtin string in a way that is idempotent for both str and unicode instances via:

   as_builtin_string = '%s' % original

To use the terms from the deferred PEP 349, that conversion mechanism is both 
Unicode-safe (unicode doesn't get coerced to str) and str-stable (str doesn't 
get coerced to unicode).
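
A minimal sketch of that behaviour, assuming Python 2.5 or later in the 2.x 
series. The names text_type, as_builtin_string and Greeting are hypothetical 
and exist only for illustration; the try/except fallback is there so the 
snippet also runs on Python 3, where str is already Unicode and the 
distinction disappears.

```python
# Illustrative sketch only: text_type, as_builtin_string and Greeting are
# hypothetical names, not part of any library.
try:
    text_type = unicode   # Python 2: the separate Unicode type
except NameError:
    text_type = str       # Python 3: str is already Unicode


def as_builtin_string(original):
    # The conversion discussed above.
    return '%s' % original


class Greeting(object):
    # __str__ is allowed to return a Unicode object.
    def __str__(self):
        return text_type('hello')


from_str = as_builtin_string('abc')                 # str in, str out
from_unicode = as_builtin_string(text_type('abc'))  # unicode in, unicode out
from_object = as_builtin_string(Greeting())         # on 2.5+, keeps what __str__ returned

# Applying the conversion a second time changes nothing (idempotent).
assert as_builtin_string(from_str) == from_str
assert as_builtin_string(from_unicode) == from_unicode
```

On 2.4 the last case would presumably still be coerced to str using the 
default encoding, which is the restriction Fredrik describes.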

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

