[Python-Dev] Revised PEP 349: Allow str() to return unicode strings

Phillip J. Eby pje at telecommunity.com
Tue Aug 23 19:14:24 CEST 2005


At 10:54 AM 8/23/2005 -0600, Neil Schemenauer wrote:
>On Tue, Aug 23, 2005 at 11:43:02AM -0400, Phillip J. Eby wrote:
> > At 09:21 AM 8/23/2005 -0600, Neil Schemenauer wrote:
> > >> then of course, one could change ``unicode.__str__()`` to return
> > >> ``self``, itself, which should work. but then, why so complicated?
> > >
> > >I think that may be the right fix.
> >
> > No, it isn't.  Right now str(u"x") coerces the unicode object to a
> > string, so changing this will be backwards-incompatible with any
> > existing programs.
>
>I meant that for the implementation of the PEP, changing
>unicode.__str__ to return self seems to be the right fix.  Whether
>you believe that str() should be allowed to return unicode instances
>is a different question.
>
> > I think the new builtin is actually the right way to go for both 2.x and
> > 3.x Pythons.  i.e., text() would be a builtin in 2.x, along with a new
> > bytes() type, and in 3.x text() could replace the basestring, str and
> > unicode types.
>
>Perhaps the critical question is what will the string type in P3k be
>called?  If it will be 'str' then I think the PEP makes sense.  If
>it will be something else, then there should be a corresponding type
>slot (e.g. __text__).  What method does your proposed text()
>built-in call?

Heck if I know.  :)  I think the P3k string type should just be called 
'text', though, so we can leave the whole unicode/str mess behind.


> > I also think that the text() constructor should have a signature of
> > 'text(ob,encoding="ascii")'.
>
>I think that's a bad idea.  We want to get away from ASCII and use
>Unicode instead.

It's not str-stable if it returns unicode for a string input.


> > In the default case, strings can be returned by text() as long as
> > they are pure ASCII (making the code str-stable *and*
> > unicode-safe).
>
>I think you misunderstand the PEP.  Your proposed function is
>neither Unicode-safe nor str-stable, the worst of both worlds.
>Passing it a unicode string that contains non-ASCII characters would
>result in an exception (not Unicode-safe).  Passing it a str results
>in a unicode return value (not str-stable).

I think you misunderstand my proposal.  :)  I'm proposing rough semantics of:

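     def text(ob, encoding='ascii'):

         # unicode input passes through untouched (unicode-safe)
         if isinstance(ob, unicode):
             return ob

         ob = str(ob)  # or ob.__text__, then fallback to __unicode__/__str__

         if encoding == 'ascii' and isinstance(ob, str):
             unicode(ob, encoding)  # check for purity; raises UnicodeDecodeError if not ASCII
             return ob  # return the string itself if it's pure (str-stable)

         # otherwise decode to unicode using the requested encoding
         return unicode(ob, encoding)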

This is str-stable *and* unicode-safe.
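
For anyone trying this on an interpreter without a separate unicode type, the same rough semantics can be sketched with `bytes` standing in for the 2.x `str` and `str` standing in for `unicode`. That mapping is an assumption made purely for illustration and is not part of the proposal; the sketch also leaves out the `str(ob)` / `__text__` fallback and only accepts the two string types:

```python
def text(ob, encoding="ascii"):
    """Illustrative analogue of the rough semantics above.

    Assumption: bytes plays the role of the 2.x str type and
    str plays the role of the 2.x unicode type.
    """
    if isinstance(ob, str):
        # "unicode" input is returned unchanged (unicode-safe)
        return ob

    if not isinstance(ob, bytes):
        # the str(ob) / __text__ fallback is deliberately omitted here
        raise TypeError("this sketch only handles bytes and str")

    if encoding == "ascii":
        # purity check: raises UnicodeDecodeError on non-ASCII bytes
        ob.decode("ascii")
        # pure ASCII: hand back the byte string itself (str-stable)
        return ob

    # an explicit encoding was given: decode to text
    return ob.decode(encoding)
```

With the default encoding, `text(b"abc")` hands back the same byte string and `text("caf\u00e9")` hands back the same text object, while a non-ASCII byte string raises unless an encoding is passed explicitly.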



