[Python-Dev] Unicode debate

Fredrik Lundh <effbot@telia.com>
Wed, 3 May 2000 12:31:34 +0200


Ka-Ping Yee <ping@lfw.org> wrote:
> > to throw some extra gasoline on this, how about allowing
> > str() to return unicode strings?
>
> You still need to *print* them somehow.  One way or another,
> stdout is still just a stream with bytes on it, unless we
> augment file objects to understand encodings.
>
> stdout sends bytes to something -- and that something will
> interpret the stream of bytes in some encoding (could be
> Latin-1, UTF-8, ISO-2022-JP, whatever).  So either:
>
>     1.  You explicitly downconvert to bytes, and specify
>         the encoding each time you do.  Then write the
>         bytes to stdout (or your file object).
>
>     2.  The file object is smart and can be told what
>         encoding to use, and Unicode strings written to
>         the file are automatically converted to bytes.

which one's more convenient?

(no, I won't tell you what I prefer. guido doesn't want
more arguments from the old "characters are characters"
proponents, so I gotta trick someone else to spell them
out ;-)
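
for comparison, here is a rough sketch of the two quoted options.
it is written against present-day Python (str for Unicode text,
bytes for 8-bit data), not the Python of this thread, and the
helper names write_explicit and make_smart_stream are made up
for illustration only.

```python
import io

def write_explicit(stream, text, encoding):
    # Option 1: the caller converts to bytes and names the
    # encoding on every single write.
    stream.write(text.encode(encoding))

def make_smart_stream(stream, encoding):
    # Option 2: the file object is told the encoding once and
    # converts Unicode strings to bytes on its own.
    return io.TextIOWrapper(stream, encoding=encoding)

text = "caf\u00e9"

out1 = io.BytesIO()
write_explicit(out1, text, "utf-8")

out2 = io.BytesIO()
smart = make_smart_stream(out2, "utf-8")
smart.write(text)
smart.flush()  # push the encoded bytes through to out2
```

both streams end up holding the same bytes; the difference is only
in who has to remember the encoding, and how often.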

> > (extra questions: how about renaming "unicode" to "string",
> > and getting rid of "unichr"?)
>
> Would you expect chr(x) to return an 8-bit string when x < 128,
> and a Unicode string when x >= 128?

that will break too much existing code, I think.  but what
about replacing 128 with 256?
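
to make the question concrete, here is a hypothetical sketch of
such a chr(). it is not an existing function: hybrid_chr and its
cutoff parameter are invented here, and present-day bytes and str
stand in for the 8-bit and Unicode string types under discussion.

```python
def hybrid_chr(x, cutoff=256):
    """Hypothetical chr(): 8-bit string below cutoff, Unicode string otherwise.

    cutoff must be at most 256, since a single byte cannot hold more.
    """
    if not 0 <= x <= 0x10FFFF:
        raise ValueError("code point out of range")
    if x < cutoff:
        # Small values stay 8-bit, as existing code expects.
        return bytes([x])
    # Everything else becomes a one-character Unicode string.
    return chr(x)
```

with cutoff=128, hybrid_chr(200) changes type compared to the old
chr(); with cutoff=256, every value the old chr() accepted still
comes back as an 8-bit string.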

</F>