[Python-Dev] __str__ vs. __unicode__

Walter Dörwald walter at livinglogic.de
Wed Jan 19 10:40:46 CET 2005


M.-A. Lemburg wrote:

> Walter Dörwald wrote:
> 
>> __str__ and __unicode__ seem to behave differently. A __str__
>> override in a str subclass is used when calling str(), while a __unicode__
>> override in a unicode subclass is *not* used when calling unicode():
>>
>> [...]
> 
> If you drop the base class for unicode, this already works.

That's cheating! ;)

My use case is an XML DOM API: __unicode__() should extract the
character data from the DOM. For Text nodes this is the text;
for comments and processing instructions it is u"", etc. To
reduce the memory footprint and to inherit all the unicode methods,
it would be good if Text, Comment and ProcessingInstruction could
be subclasses of unicode.
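A minimal sketch of that design, assuming Python 3, where `str` is the Unicode type and `__unicode__` no longer exists, so `__str__` takes over its role. The shared `Node` base, the class bodies and the `chardata()` helper are hypothetical illustrations, not part of any real DOM library. The sketch relies on `str()` honoring a `__str__` override in a `str` subclass, which current Python 3 versions do.

```python
class Node(str):
    """Hypothetical base for DOM leaf nodes that are themselves strings.

    Subclassing str keeps each node as small as the string it wraps
    (empty __slots__ means no per-instance __dict__) and inherits every
    string method.
    """
    __slots__ = ()


class Text(Node):
    """Text node: its character data is the stored string itself."""
    __slots__ = ()

    def __str__(self):
        # str.__str__ returns an exact str copy of the node's own data.
        return str.__str__(self)


class Comment(Node):
    """Comment node: stores its text but contributes no character data."""
    __slots__ = ()

    def __str__(self):
        return ""


class ProcessingInstruction(Node):
    """Processing instruction: also contributes no character data."""
    __slots__ = ()

    def __str__(self):
        return ""


def chardata(nodes):
    """Concatenate the character data of a sequence of nodes via str()."""
    return "".join(str(node) for node in nodes)


doc = [
    Text("Hello, "),
    Comment(" greeting "),
    Text("world"),
    ProcessingInstruction("xml-stylesheet href='a.css'"),
]
print(chardata(doc))  # Hello, world
```

Because each node is still a string, comparisons and string methods see the stored text (so `Comment("x") == "x"` holds), while `str()` sees only the character data.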

> This code in object.c:PyObject_Unicode() is responsible for
> the sub-class version not doing what you'd expect:
> 
>     if (PyUnicode_Check(v)) {
>         /* For a Unicode subtype that's not a Unicode object,
>            return a true Unicode object with the same data. */
>         return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(v),
>                          PyUnicode_GET_SIZE(v));
>     }
> 
> So the question is whether conversion of a Unicode sub-type
> to a true Unicode object should honor __unicode__ or not.
> 
> The same question can be asked for many other types, e.g.
> floats (and __float__), integers (and __int__), etc.
> 
> >>> class float2(float):
> ...     def __float__(self):
> ...         return 3.141
> ...
> >>> float(float2(1.23))
> 1.23
> >>> class int2(int):
> ...     def __int__(self):
> ...         return 42
> ...
> >>> int(int2(123))
> 123
> 
> I think we need a general consensus on what the strategy
> should be: honor these special hooks in conversions
> to base types or not?
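To make the two candidate policies concrete, here is a small model, assuming Python 3 (`str` in place of `unicode`). The names `to_base`, `HOOKS` and `str2` are hypothetical, and this is not CPython's conversion code. With `honor_hook=False` it copies a subclass instance through the base type's own method, skipping any override, much like the `PyObject_Unicode()` shortcut quoted above; with `honor_hook=True` it calls whatever hook the subclass defines. It makes no claim about what the built-in `str()`, `float()` or `int()` do in any particular release.

```python
# Hypothetical mapping from each base type to the hook that converts to it.
HOOKS = {str: "__str__", float: "__float__", int: "__int__"}


def to_base(obj, base, honor_hook):
    """Model converting obj to an exact instance of base under one policy."""
    name = HOOKS[base]
    if type(obj) is base:
        return obj  # already an exact base instance: nothing to decide
    if isinstance(obj, base) and not honor_hook:
        # Copy the data via the base type's own method, ignoring overrides.
        return getattr(base, name)(obj)
    # Look the hook up on the object's own type, so a subclass override wins.
    return getattr(type(obj), name)(obj)


class float2(float):
    def __float__(self):
        return 3.141


class int2(int):
    def __int__(self):
        return 42


class str2(str):
    def __str__(self):
        return "hooked"


for value, base in [(float2(1.23), float), (int2(123), int), (str2("data"), str)]:
    print(base.__name__, to_base(value, base, False), to_base(value, base, True))
# float 1.23 3.141
# int 123 42
# str data hooked
```

Under the "honor the hook" policy, a subclass that wants the original value simply leaves the hook undefined and inherits the base type's method.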

I'd say these hooks should be honored, because that gives
us more possibilities: if you want the original value,
simply don't implement the hook.

> Maybe the string case is the real problem ... :-)

At least it seems that the string case is the exception.

So if we fix __str__, this would be a bugfix for 2.4.1.
If we fix the rest, it would be a new feature for 2.5.

Bye,
    Walter Dörwald

