[Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr()

Guido van Rossum guido@digicool.com
Thu, 18 Jan 2001 20:04:22 -0500


> Ka-Ping Yee wrote:
> > 
> > On Thu, 18 Jan 2001, Ka-Ping Yee wrote:
> > >     str() looks for __str__
> > 
> > Oops.  I forgot that
> > 
> >       str() looks for __str__, then tries __repr__
> > 
> > So, presumably,
> > 
> >       unicode() should look for __unicode__, then __str__, then __repr__
> 
> Not quite... str() does this:
> 
> 1. strings are passed back as-is
> 2. the type slot tp_str is tried
> 3. the method __str__ is tried
> 4. Unicode returns are converted to strings
> 5. anything other than a string return value is rejected
> 
> unistr() does the same, but makes sure that the return
> value is a Unicode object.
> 
> unicode() does the following:
> 
> 1. for instances, __str__ is called
> 2. Unicode objects are returned as-is
> 3. string objects or character buffers are used as basis for decoding
> 4. decoding is applied to the character buffer and the results
>    are returned
> 
> I think we should perhaps merge the two approaches into one
> which then applies all of the above in unicode() (and then
> forget about unistr()). This might hide some type errors,
> but since all other generic constructors behave more or less
> in the same way, I think unicode() should too.

Yes, I would like to see these merged.  I noticed that e.g. there is
special code to compare Unicode strings in the comparison code (I
think I *could* get rid of this now that we have rich comparisons, but
I decided to put that off), and when I looked at it, it uses the same
set of conversions as unicode().  Some of these seem questionable to
me -- why do you try so many ways to get a string out of an object?
(On the other hand, the merge of unicode() and unistr() might have this
effect anyway...)
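
To make the proposed merge concrete, here is a minimal sketch of one
possible ordering of the quoted steps. It is a model only, written for a
current Python 3 interpreter, and not the actual implementation: bytes
stands in for the 8-bit string type, str stands in for the Unicode type,
and the names merged_unicode and legacy_str are hypothetical (legacy_str
plays the role of an instance's __str__, which may return either kind of
string). The "ascii" default encoding is likewise an assumption.

```python
def merged_unicode(obj, encoding="ascii", errors="strict"):
    """Hypothetical model of a merged unicode()/unistr() conversion."""
    # Unicode objects are returned as-is.
    if isinstance(obj, str):
        return obj

    # String objects and character buffers are used as the basis
    # for decoding.
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return bytes(obj).decode(encoding, errors)

    # Otherwise ask the object for a string through its conversion
    # hook (stand-in for the tp_str slot / __str__ method).
    hook = getattr(type(obj), "legacy_str", None)
    if hook is None:
        raise TypeError(
            "cannot convert %s to Unicode" % type(obj).__name__)
    result = hook(obj)

    # The hook may return Unicode directly, or an 8-bit string that
    # still has to be decoded.
    if isinstance(result, str):
        return result
    if isinstance(result, bytes):
        return result.decode(encoding, errors)

    # Anything other than a string return value is rejected.
    raise TypeError(
        "legacy_str returned non-string (%s)" % type(result).__name__)
```

Each branch is one more way of getting a string out of an object, so a
merged function of this shape accepts everything the two originals did
and only rejects an argument after every branch has failed. That is
where the type errors mentioned in the quoted text could get hidden.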

--Guido van Rossum (home page: http://www.python.org/~guido/)