[Python-Dev] unicode inconsistency?

Tim Peters tim.peters at gmail.com
Thu Sep 9 22:59:18 CEST 2004


[Tim]
>> '%s' is documented as "String (converts any python object using
>> str())".  It's str(A()) that raises the exception you're seeing,
>> not interpolation.

[Neil]
> Shouldn't '%s' % u'\u1234' also raise an exception then?

Yes, but the existence of one undocumented extension isn't sufficient
reason to multiply them.  The "Unicode exception" here is at least
easy to explain.  To make your case work, we somehow have to explain
that although virtually all ways of invoking __str__ produce an 8-bit
encoding of a unicode return value, for some magical reason
str.__mod__ does not.  The existing "Unicode exception" consists
solely of saying "but unicode inputs don't invoke str(), and force the
interpolation to get passed to unicode.__mod__ instead".
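
To make that rule concrete, here is a rough model of it. This is a sketch only, written for a modern Python 3 interpreter, where bytes stands in for the 8-bit str and str stands in for unicode. The names py2_str and py2_percent_s are made up for illustration, and the class A is an assumed reconstruction of the kind of class under discussion, not Neil's actual code.

```python
DEFAULT_ENCODING = "ascii"  # assumed default encoding of a stock install


def py2_str(obj):
    """Hypothetical model of old-style str(obj): always yields an 8-bit string."""
    if isinstance(obj, bytes):   # already an 8-bit string
        return obj
    if isinstance(obj, str):     # a unicode object: encode with the default
        return obj.encode(DEFAULT_ENCODING)
    result = obj.__str__()
    if isinstance(result, str):  # unicode returned from __str__ gets encoded too
        return result.encode(DEFAULT_ENCODING)
    return result


def py2_percent_s(arg):
    """Hypothetical model of '%s' % arg."""
    if isinstance(arg, str):
        # The "Unicode exception": str() is never called, and the
        # interpolation is handed to unicode.__mod__, so the result is unicode.
        return arg
    return py2_str(arg)


class A:
    """Assumed example: __str__ returns unicode the default encoding can't handle."""

    def __str__(self):
        return "\u1234"
```

In this model py2_percent_s("\u1234") hands the unicode value back untouched, while py2_percent_s(A()) raises UnicodeEncodeError from inside py2_str, which is the distinction drawn above.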

> Yes.  I want something like "PyObject_UnicodeOrStr" that would
> return either a unicode object or a str object.  That would make it
> easier to write code that produces 'str' results if unicode
> characters don't appear in any of the inputs.

I think biting the Unicode bullet whole is saner, but suit yourself.

>  Having __str__ methods that can return either 'unicode' or 'str' objects
> is also very handy (I don't see how you can say that it doesn't make any
> sense).

Didn't we go thru that last week <wink>?  Yes:

    [Neil]
    [... the same class as today's class ...]

    [Martin]
    > This class is incorrect: it does not support str().

    [Neil]
    > Can you be more specific about what is incorrect with the above
    > class?

    [Martin]
    In the default installation, it gives a UnicodeEncodeError.

You didn't respond to that (at least not that I saw), so I assumed you
accepted Martin's nag.  Having a __str__ that returns a unicode object
that the default encoding can't handle is clearly (IMO) begging for
trouble.
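
As a small illustration of why this depends on the installation, the sketch below (again for a modern Python 3 interpreter, with the helper name try_encode invented for the example) encodes the same value under two candidate default encodings. Only the encoding differs between the two attempts.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or the name of the exception raised."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return type(exc).__name__


value = "\u1234"  # the character from the example in this thread
results = {enc: try_encode(value, enc) for enc in ("ascii", "utf-8")}
```

With "ascii" the attempt fails with UnicodeEncodeError, and with "utf-8" it succeeds, so whether such a __str__ works hinges on a setting the class author does not control.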

> Perhaps I am on the wrong track.  However, if I understand the /F
> bot correctly, he favours a design that does not force everything to
> unicode strings.

Saying it doesn't make sense to have a __str__ method return a Unicode
value that can't be encoded *as* a str isn't asking anyone to force
anything to Unicode.  __str__ is still trying hard to retain a
*distinction* between str and unicode.  PyObject_Unicode() no longer
plays along with that distinction, but I (mildly) wish it still did.

