[Python-Dev] re: Unicode as argument for 8-bit strings

Fredrik Lundh fredrik@pythonware.com
Sat, 8 Apr 2000 07:47:13 +0200


Bill Tutt wrote:
> > There has been a bug report about the treatment of Unicode
> > objects together with 8-bit format strings. The current
> > implementation converts the Unicode object to UTF-8 and then
> > inserts this value in place of the %s....
> >
> > I'm inclined to change this to have '...%s...' % u'abc'
> > return u'...abc...' since this is just another case of
> > coercing data to the "bigger" type to avoid information loss.
> >
> > Thoughts ?
>
> Suddenly returning a Unicode string from an operation that was an 8-bit
> string is likely to give some code extreme fits of despondency.

why is this different from returning floating point values from
operations involving integers and floats?
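
a minimal sketch of the numeric analogy, written for illustration and not taken from the thread: mixed int/float arithmetic already returns the "bigger" type. the variable names are arbitrary.

```python
# int + float: the result is widened to float, the "bigger" type
total = 1 + 2.5
print(type(total).__name__, total)  # float 3.5

# int + int stays int; no widening is needed
count = 1 + 2
print(type(count).__name__, count)  # int 3
```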

> Converting to UTF-8 didn't give you any data loss, however it certainly
> might be unexpected to now find UTF-8 characters in what the user originally
> thought was a binary string containing whatever they had wanted it to contain.

the more I've played with this, the stronger my opinion that
the "now it's an ordinary string, now it's a UTF-8 string, now
it's an ordinary string again" approach doesn't work.  more on
this in a later post.
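
a rough illustration of why the back-and-forth is fragile, assuming modern Python 3, where `bytes` stands in for the 8-bit string and `str` for the unicode string (an assumption; the thread itself predates that split). the sample text is arbitrary.

```python
text = "caf\u00e9"                   # four characters, the last one non-ASCII

as_utf8 = text.encode("utf-8")       # what the UTF-8 conversion produces
as_latin1 = text.encode("latin-1")   # what a Latin-1 user might expect

# The non-ASCII character takes two bytes in UTF-8, one in Latin-1.
print(len(text), len(as_utf8), len(as_latin1))  # 4 5 4

# Treating the UTF-8 bytes as an "ordinary" Latin-1 string garbles the text.
garbled = as_utf8.decode("latin-1")
print(garbled == text)  # False
```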

(am I the only one here that has actually tried to write code
that handles both unicode strings and ordinary strings?  if not,
can anyone tell me what I'm doing wrong?)

> Throwing an exception would at the very least force the user to make a
> decision one way or the other about what they want to do with the data.
> They might want to do a codepage translation, or something else. (aka Hey,
> here's a bug I just found for you!)
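
the exception-raising alternative could be sketched roughly as below. `strict_format` is a hypothetical helper, not an existing API, and the sketch assumes modern Python 3 (3.5 or later), with `bytes` standing in for the 8-bit string and `str` for the unicode string.

```python
def strict_format(template, arg):
    """Hypothetical helper: refuse to mix an 8-bit template with a text argument."""
    if isinstance(template, bytes) and isinstance(arg, str):
        raise TypeError("cannot mix 8-bit and unicode strings; convert explicitly")
    return template % arg

# Mixing the two types is rejected, so the caller has to decide what to do.
try:
    strict_format(b"...%s...", "abc")
except TypeError:
    outcome = "rejected"
else:
    outcome = "accepted"
print(outcome)  # rejected

# Once the caller picks an encoding explicitly, the operation goes through.
explicit = strict_format(b"...%s...", "abc".encode("utf-8"))
print(explicit)  # b'...abc...'
```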

> In what other cases are you suddenly returning a Unicode string object from
> an operation which previously returned a string object?

if unicode is ever to be a real string type in python, and not just a
nifty extension type, it must be okay to return a unicode string from
any operation that involves a unicode argument...
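
a sketch of what "return unicode whenever a unicode argument is involved" might look like for the formatting case. `widen_format` is a hypothetical helper, not the actual implementation; it assumes modern Python 3 (`bytes` for the 8-bit string, `str` for the unicode string) and an ASCII-only 8-bit template.

```python
def widen_format(template, arg):
    """Hypothetical helper: if the argument is text, promote an 8-bit
    template to text as well, so the result is the "bigger" type."""
    if isinstance(template, bytes) and isinstance(arg, str):
        template = template.decode("ascii")  # assumption: ASCII-only template
    return template % arg

# 8-bit template, text argument: the result is text.
widened = widen_format(b"...%s...", "abc")
print(type(widened).__name__, widened)  # str ...abc...

# 8-bit template, 8-bit argument: the result stays 8-bit.
unchanged = widen_format(b"...%s...", b"abc")
print(type(unchanged).__name__, unchanged)  # bytes b'...abc...'
```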

</F>