[Python-Dev] Unicode debate

Ka-Ping Yee ping@lfw.org
Wed, 3 May 2000 01:50:30 -0700 (PDT)


On Wed, 3 May 2000, Fredrik Lundh wrote:
> Guido van Rossum <guido@python.org> wrote:
> > But there must be a way to turn on Unicode-awareness on e.g. stdout
> > and then printing a Unicode object should not use str() (as it
> > currently does).
> 
> to throw some extra gasoline on this, how about allowing
> str() to return unicode strings?

You still need to *print* them somehow.  One way or another,
stdout is still just a stream with bytes on it, unless we
augment file objects to understand encodings.

stdout sends bytes to something -- and that something will
interpret the stream of bytes in some encoding (could be
Latin-1, UTF-8, ISO-2022-JP, whatever).  So either:

    1.  You explicitly downconvert to bytes, and specify
        the encoding each time you do.  Then write the
        bytes to stdout (or your file object).

    2.  The file object is smart and can be told what
        encoding to use, and Unicode strings written to
        the file are automatically converted to bytes.
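
To make the two options concrete, here is a rough sketch.  It
assumes codecs.getwriter() as one possible "smart" wrapper and
uses an in-memory io.BytesIO in place of stdout; both are
stand-ins chosen for illustration, not a proposed design.

```python
import codecs
import io

text = u"caf\u00e9"   # sample Unicode string (illustrative only)

# Option 1: downconvert explicitly, naming the encoding at each write.
raw1 = io.BytesIO()   # stands in for stdout or a file object
raw1.write(text.encode("utf-8"))

# Option 2: tell the stream its encoding once; every Unicode string
# written through the wrapper is converted to bytes automatically.
raw2 = io.BytesIO()
smart = codecs.getwriter("utf-8")(raw2)
smart.write(text)

option1_bytes = raw1.getvalue()
option2_bytes = raw2.getvalue()
```

Either way the same bytes reach the stream; the difference is
only whether the caller or the file object remembers the encoding.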

Another thread mentioned having separate read/write and
binary_read/binary_write methods on files.  I suggest
doing it the other way, actually: since read/write operate
on byte streams now, *they* are the binary operations;
the new methods should be the ones that do the extra
encoding/decoding work, and could be called uniread/uniwrite,
uread/uwrite, textread/textwrite, etc.
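
A minimal sketch of that split might look like the following.
The class name TextCapableFile and the constructor's encoding
argument are hypothetical, and textread/textwrite is just one of
the candidate name pairs above; nothing like this exists on file
objects today.

```python
import io

class TextCapableFile:
    """Hypothetical wrapper: read/write stay binary, textread/textwrite convert."""

    def __init__(self, raw, encoding):
        self.raw = raw            # underlying byte stream
        self.encoding = encoding  # encoding used by the text methods

    # The existing methods keep their meaning: plain bytes in, bytes out.
    def write(self, data):
        return self.raw.write(data)

    def read(self, size=-1):
        return self.raw.read(size)

    # The new methods do the extra encoding/decoding work.
    def textwrite(self, text):
        return self.raw.write(text.encode(self.encoding))

    def textread(self, size=-1):
        # Simplification: size counts bytes, and a multi-byte character
        # split across two reads is not handled here.
        return self.raw.read(size).decode(self.encoding)


f = TextCapableFile(io.BytesIO(), "latin-1")
f.textwrite(u"na\u00efve")   # Unicode in, Latin-1 bytes on the stream
f.write(b"\n")               # binary write, untouched
f.raw.seek(0)
raw_bytes = f.read()         # binary view of the stream
f.raw.seek(0)
text_back = f.textread()     # decoded view of the same bytes
```

Existing code that calls read/write keeps working unchanged on
bytes; only callers that want Unicode opt in to the new methods.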

> (extra questions: how about renaming "unicode" to "string",
> and getting rid of "unichr"?)

Would you expect chr(x) to return an 8-bit string when x < 128,
and a Unicode string when x >= 128?


-- ?!ng