[Python-Dev] Revised PEP 349: Allow str() to return unicode strings
Phillip J. Eby
pje at telecommunity.com
Tue Aug 23 19:14:24 CEST 2005
At 10:54 AM 8/23/2005 -0600, Neil Schemenauer wrote:
>On Tue, Aug 23, 2005 at 11:43:02AM -0400, Phillip J. Eby wrote:
> > At 09:21 AM 8/23/2005 -0600, Neil Schemenauer wrote:
> > >> then of course, one could change ``unicode.__str__()`` to return
> > >> ``self``, itself, which should work. but then, why so complicated?
> > >
> > >I think that may be the right fix.
> >
> > No, it isn't. Right now str(u"x") coerces the unicode object to a
> > string, so changing this will be backwards-incompatible with any
> > existing programs.
>
>I meant that for the implementation of the PEP, changing
>unicode.__str__ to return self seems to be the right fix. Whether
>you believe that str() should be allowed to return unicode instances
>is a different question.
>
> > I think the new builtin is actually the right way to go for both 2.x and
> > 3.x Pythons. i.e., text() would be a builtin in 2.x, along with a new
> > bytes() type, and in 3.x text() could replace the basestring, str and
> > unicode types.
>
>Perhaps the critical question is what will the string type in P3k be
>called? If it will be 'str' then I think the PEP makes sense. If
>it will be something else, then there should be a corresponding type
>slot (e.g. __text__). What method does your proposed text()
>built-in call?

Heck if I know.  :)  I think the P3k string type should just be called
'text', though, so we can leave the whole unicode/str mess behind.

> > I also think that the text() constructor should have a signature of
> > 'text(ob,encoding="ascii")'.
>
>I think that's a bad idea. We want to get away from ASCII and use
>Unicode instead.

It's not str-stable if it returns unicode for a string input.

> > In the default case, strings can be returned by text() as long as
> > they are pure ASCII (making the code str-stable *and*
> > unicode-safe).
>
>I think you misunderstand the PEP. Your proposed function is
>neither Unicode-safe nor str-stable, the worst of both worlds.
>Passing it a unicode string that contains non-ASCII characters would
>result in an exception (not Unicode-safe). Passing it a str results
>in a unicode return value (not str-stable).

I think you misunderstand my proposal.  :)  I'm proposing rough semantics of:

    def text(ob, encoding='ascii'):
        if isinstance(ob, unicode):
            return ob
        ob = str(ob)   # or ob.__text__, then fallback to __unicode__/__str__
        if encoding == 'ascii' and isinstance(ob, str):
            unicode(ob, encoding)   # check for purity
            return ob    # return the string if it's pure
        return unicode(ob, encoding)

This is str-stable *and* unicode-safe.
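
For readers who want to try these semantics on a current interpreter, here is a rough Python 3 analogue. It is only a sketch under stated assumptions, not part of the proposal: `bytes` stands in for the 2.x `str` type, Python 3 `str` stands in for 2.x `unicode`, and the handling of objects that are neither type is a simplification, since the `__text__` slot mentioned above is hypothetical.

```python
def text(ob, encoding="ascii"):
    """Python 3 analogue of the 2.x sketch.

    Assumption: bytes plays the role of 2.x str, and str plays the
    role of 2.x unicode.
    """
    if isinstance(ob, str):
        # Already text (2.x unicode): returned unchanged, so non-ASCII
        # text never raises here (the "unicode-safe" property).
        return ob
    if not isinstance(ob, bytes):
        # Simplification: any other object is converted with str().
        # The 2.x sketch would consult __text__/__unicode__/__str__.
        return str(ob)
    if encoding == "ascii":
        # Purity check: raises UnicodeDecodeError for non-ASCII bytes.
        ob.decode("ascii")
        # Pure ASCII byte strings come back as the same type
        # (the "str-stable" property).
        return ob
    # An explicit non-ASCII encoding means the caller wants decoded text.
    return ob.decode(encoding)


if __name__ == "__main__":
    print(type(text(b"abc")).__name__)           # byte string in, byte string out
    print(type(text("caf\u00e9")).__name__)      # text in, text out
    print(text(b"caf\xc3\xa9", "utf-8"))         # decoded with the given encoding
```

With the default encoding, a byte string containing non-ASCII bytes raises UnicodeDecodeError rather than being returned, which matches the purity check in the sketch above.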